Elasticsearch

Basic concepts

An Elasticsearch index is a collection of shards. The application/API talks to the index, and Elasticsearch routes each request to the appropriate shards.

Documents (data entries in JSON format) - each has a unique id and a type

Types (schema) - mappings shared by similar documents (log entry, article, etc.); note that mapping types are deprecated since Elasticsearch 7 and removed in 8

Indices - an index makes searching over data faster and less resource-intensive - it contains inverted indices*

Shards (= Lucene indices) - documents are hashed to a particular shard
each shard can live on a different node in the cluster
each shard is a self-contained Lucene index of its own
each shard has exactly one primary copy, which handles read and write requests
replicas handle read-only requests, and there can be many of them
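A minimal sketch of the "documents are hashed to a particular shard" idea. Assumptions: Elasticsearch hashes the routing value (the document id by default) with Murmur3; CRC32 stands in here only because it is in the Python standard library, and the function names are made up for illustration.

```python
import zlib


def route_to_shard(doc_id: str, num_primary_shards: int) -> int:
    """Pick a primary shard number for a document id.

    Stand-in hash: CRC32 instead of the Murmur3 hash Elasticsearch uses.
    """
    if num_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def distribute(doc_ids, num_primary_shards):
    """Group document ids by the shard they are routed to."""
    shards = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[route_to_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards


if __name__ == "__main__":
    print(distribute(["doc-1", "doc-2", "doc-3", "doc-4"], 2))
```

Because the shard number depends on the modulus, changing the number of primary shards would send existing ids to different shards, which is why that number is fixed once the index exists.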

Performance notes: search runs in parallel across the nodes => a larger number of smaller shards tends to be faster (up to a point, since every shard adds overhead).

Inverted indices ~ split each document into searchable words, store them in a hash table, and keep a count for each word
term frequency = how often a term appears in the document being searched
document frequency = how often a term appears across all documents
term freq. / doc. freq. = measure of the relevance of a term in a document => how unique/relevant this term is for this document
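The definitions above can be sketched in a few lines. This is a toy illustration under stated assumptions: a whitespace tokenizer, document frequency counted as total occurrences (as defined in these notes), and hypothetical function names. Real Elasticsearch scoring (BM25 by default) is more involved.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Very naive analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to {doc_id: number of occurrences in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def relevance(index, term, doc_id):
    """term frequency / document frequency, following the notes."""
    postings = index.get(term, {})
    term_freq = postings.get(doc_id, 0)
    doc_freq = sum(postings.values())  # occurrences across all documents
    return term_freq / doc_freq if doc_freq else 0.0


docs = {
    1: "space is big space is dark",
    2: "the dark side of the moon",
}
index = build_inverted_index(docs)
print(relevance(index, "space", 1))  # 1.0: "space" occurs only in document 1
print(relevance(index, "dark", 1))   # 0.5: "dark" is shared with document 2
```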

Backup to S3:

When the cluster nodes shared one storage location for plugins, the plugin did not work, so in the end each node has its own plugin storage and I install the plugin on every node.

Plugin installation (needed on every node)

docker exec es01 /bin/bash -c "bin/elasticsearch-plugin install repository-s3 -v --batch"

docker exec es01 /bin/bash -c "echo 'AWSKEYAWSKEYAWSKEY' | elasticsearch-keystore add s3.client.default.access_key -fx"
docker exec es01 /bin/bash -c "echo 'AWS_SECRET1AWS_SECRET1AWS_SECRET1' | elasticsearch-keystore add s3.client.default.secret_key -fx"
docker exec es01 /bin/bash -c "elasticsearch-keystore list"

At this point it is safer to restart the cluster.

The AWS permissions setup is straightforward and can be looked up online if needed. Now I send two requests through the console:

POST /_nodes/reload_secure_settings
PUT _snapshot/netflow_backup
{
  "type": "s3",
  "settings": {
    "bucket": "netflow-prague",
    "base_path": "backup-elk"
  }
}

Commands

GET /_search
# Elasticsearch "search lite": the query is passed in the URL
GET /_search?q=some_title

Searching by date

GET /restore_netflow/_search
{
  "query": {
    "bool" : {
      "filter" : {
        "script" : {
          "script" : {
            "source": "doc['@timestamp'].value.getYear() == 2020",
            "lang": "painless"
          }
        }
      }
    }
  }
}

Reindexing data from one index into another by year

POST /_reindex?wait_for_completion=false
{
  "source": {
    "index": "restore_netflow",
    "query": {
      "bool" : {
        "filter" : {
          "script" : {
            "script" : {
              "source": "doc['@timestamp'].value.getYear() == 2019",
              "lang": "painless"
            }
          }
        }
      }
    }
  },
  "dest": {
    "index": "netflow-2019"
  }
}
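When the same reindex has to be repeated for several years, the request body can be generated instead of edited by hand. A sketch only: `year_filter` and `reindex_body` are hypothetical helper names, and the `<prefix>-<year>` destination naming is assumed from the `netflow-2019` example above.

```python
import json


def year_filter(year: int, field: str = "@timestamp") -> dict:
    """Bool/filter query using the same Painless script as in the notes."""
    return {
        "bool": {
            "filter": {
                "script": {
                    "script": {
                        "source": f"doc['{field}'].value.getYear() == {year}",
                        "lang": "painless",
                    }
                }
            }
        }
    }


def reindex_body(source_index: str, dest_prefix: str, year: int) -> dict:
    """Body for POST /_reindex copying one year into <dest_prefix>-<year>."""
    return {
        "source": {"index": source_index, "query": year_filter(year)},
        "dest": {"index": f"{dest_prefix}-{year}"},
    }


if __name__ == "__main__":
    # Print one request body per year; paste into the console or send via HTTP.
    for year in (2019, 2020):
        print(json.dumps(reindex_body("restore_netflow", "netflow", year), indent=2))
```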

GET /_mapping

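Set the number of shards (more shards => faster reads). The number of primary shards is fixed at index creation and can NOT be changed later in place (that requires reindexing); the replica count can be changed at any time.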

PUT /indexname
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 2
  }
}

The result will be 2 primary shards plus 4 replicas (each primary has 2 replicas) => 6 shards in total.
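The arithmetic generalizes to any settings, since `number_of_replicas` applies per primary shard. A small sketch (the function name is made up for illustration):

```python
def total_shards(number_of_shards: int, number_of_replicas: int) -> dict:
    """Count shard copies for an index.

    number_of_replicas is per primary, so every primary gets that many copies.
    """
    replicas = number_of_shards * number_of_replicas
    return {
        "primaries": number_of_shards,
        "replicas": replicas,
        "total": number_of_shards + replicas,
    }


print(total_shards(2, 2))  # {'primaries': 2, 'replicas': 4, 'total': 6}
```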

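Index template with automatic timestamp

Ingest pipeline that stamps each document at ingest time:

PUT _ingest/pipeline/indexed_at
{
  "description" : "Adds indexed_at timestamp to documents",
  "processors" : [
    {
      "set" : {
        "field" : "@timestamp",
        "value" : "{{_ingest.timestamp}}"
      }
    }
  ]
}

Settings section of the index template, making the pipeline the default for new documents:

{
  "index": {
    "number_of_shards": "1",
    "default_pipeline": "indexed_at"
  }
}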
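To make the effect of the `set` processor concrete, here is a local imitation in Python. It only mimics what the pipeline does to a document and does not talk to Elasticsearch; `apply_indexed_at` is a hypothetical name and the ISO 8601 output format is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def apply_indexed_at(doc: dict, now: Optional[datetime] = None) -> dict:
    """Imitate the `set` processor: overwrite @timestamp with the ingest time."""
    ingest_time = now or datetime.now(timezone.utc)
    enriched = dict(doc)  # leave the caller's document untouched
    enriched["@timestamp"] = ingest_time.isoformat()
    return enriched


if __name__ == "__main__":
    print(apply_indexed_at({"message": "flow record"}))
```

Any `@timestamp` already present in the incoming document is replaced, which matches the default overriding behavior of the `set` processor.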

Backups

First, configure the snapshot path in the elasticsearch.yml file

path.repo: ["/mnt/elk_backup"]

Then we can register this path as a repository for storing backups

curl -X PUT localhost:9200/_snapshot/fs_backup \
  -H 'Content-Type: application/json; charset=utf-8' \
  --data-binary @- <<EOF
{
  "type": "fs",
  "settings": {
    "location": "/mnt/elk_backup",
    "compress": true
  }
}
EOF

We can check the repository settings

curl -X GET localhost:9200/_snapshot/fs_backup

Create snapshot

curl -X PUT localhost:9200/_snapshot/fs_backup/full_backup-20190321
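For scheduled backups the dated snapshot name can be generated rather than typed. A sketch assuming the `full_backup-YYYYMMDD` naming used above; the helper name and its defaults are illustrative only.

```python
from datetime import date


def snapshot_url(repository: str, day: date,
                 host: str = "localhost:9200",
                 prefix: str = "full_backup") -> str:
    """Build the URL for PUT /_snapshot/<repository>/<prefix>-<YYYYMMDD>."""
    return f"http://{host}/_snapshot/{repository}/{prefix}-{day:%Y%m%d}"


if __name__ == "__main__":
    # Today's snapshot URL, ready to be passed to curl -X PUT
    print(snapshot_url("fs_backup", date.today()))
```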

The snapshot process can be monitored

# Snapshots currently running in the specified repository
curl -X GET "localhost:9200/_snapshot/fs_backup/_current?pretty"

# Get the status of all currently running snapshots
curl -X GET "localhost:9200/_snapshot/_status?pretty"

Kopf plugin - cluster state
ElastAlert - alerting for Elasticsearch
Postman - for playing with the API
Insomnia REST client
Beats - small single-purpose shippers

Commands:

_search
_mapping
_search?q=iphone, q=title (Elasticsearch "search lite")