Elasticsearch
Basic concepts
An Elasticsearch index is a collection of shards; the application/API talks to the index and Elasticsearch routes each request to the appropriate shards
Documents (data entries in JSON format) - each has a unique id and a type
Types (schema) - mappings shared by similar documents (log entry, article, etc.); mapping types are deprecated in Elasticsearch 7 and removed in 8
Indices - an index makes searching over data faster and less demanding on computing power - contains inverted indices*
Shards (= Lucene index)
  - documents are hashed to a particular shard
  - each shard can be on a different node in a cluster
  - each shard is a self-contained Lucene index of its own
  - there is just one primary per shard and it handles read/write requests
  - replicas handle read-only requests and there can be many of them
Performance notes: search runs in parallel across shards and nodes => a larger number of smaller shards can speed up reads (each shard adds overhead, so this holds only up to a point).
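The hashing of documents to shards can be sketched as follows. This is a toy model, not Elasticsearch's implementation: Elasticsearch hashes the routing value (the document id by default) with Murmur3, while this sketch uses zlib.crc32 as a stand-in, and shard_for is a hypothetical helper name. It illustrates why the number of primary shards is fixed at index creation: a different shard count would map existing ids to different shards.

```python
import zlib

def shard_for(doc_id: str, number_of_primary_shards: int) -> int:
    """Pick a shard for a document id (crc32 is a stand-in for the real hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards

doc_ids = [f"doc-{i}" for i in range(1000)]

# Same documents, routed once with 2 primaries and once with 3 primaries.
with_2 = {d: shard_for(d, 2) for d in doc_ids}
with_3 = {d: shard_for(d, 3) for d in doc_ids}

moved = sum(1 for d in doc_ids if with_2[d] != with_3[d])
print(f"{moved} of {len(doc_ids)} documents would land on a different shard")
```

Because the placement depends on the shard count, changing it means rewriting the data (for example with _reindex into a new index).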
- Inverted indices ~ each document is split into searchable words (terms); a hash table maps each term to the documents that contain it, together with a count
- term frequency = how often a term appears in the document being searched
- document frequency = how often a term appears across all documents
- term freq. / doc. freq. = a measure of the relevance of a term in a document => how unique/relevant the term is for that document
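A minimal sketch of the inverted index and the term freq. / doc. freq. ratio described above. The sample documents and the relevance helper are made up for illustration, and document frequency is counted here as the number of documents containing the term (an assumption of this sketch). Elasticsearch's actual scoring (BM25 by default) is more involved.

```python
from collections import Counter, defaultdict

# Hypothetical sample documents: id -> text
docs = {
    1: "elastic search is fast",
    2: "search logs with elastic search",
    3: "logs are stored in shards",
}

# Inverted index: term -> {doc_id: how often the term occurs in that document}
inverted = defaultdict(dict)
for doc_id, text in docs.items():
    for term, count in Counter(text.split()).items():
        inverted[term][doc_id] = count

def relevance(term: str, doc_id: int) -> float:
    """Term frequency divided by document frequency."""
    postings = inverted.get(term, {})
    if doc_id not in postings:
        return 0.0
    term_frequency = postings[doc_id]
    document_frequency = len(postings)  # documents that contain the term
    return term_frequency / document_frequency

print(dict(inverted["search"]))   # {1: 1, 2: 2}
print(relevance("search", 1))     # 0.5
print(relevance("fast", 1))       # 1.0
```

"fast" occurs only in document 1, so it scores higher for that document than "search", which also occurs in document 2.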
Backup to S3:
When the cluster nodes shared one storage volume for plugins, the plugin did not work, so in the end each node has its own plugin storage and I install the plugin on every node.
Plugin installation (needed on all nodes)
docker exec es01 /bin/bash -c "bin/elasticsearch-plugin install repository-s3 -v --batch"
docker exec es01 /bin/bash -c "echo 'AWSKEYAWSKEYAWSKEY' | elasticsearch-keystore add s3.client.default.access_key -fx"
docker exec es01 /bin/bash -c "echo 'AWS_SECRET1AWS_SECRET1AWS_SECRET1' | elasticsearch-keystore add s3.client.default.secret_key -fx"
docker exec es01 /bin/bash -c "elasticsearch-keystore list"
At this point it is safer to restart the cluster
The AWS permission configuration is straightforward and can be found online if needed. Now I send two requests via the console:
POST /_nodes/reload_secure_settings
PUT _snapshot/netflow_backup
{
  "type": "s3",
  "settings": {
    "bucket": "netflow-prague",
    "base_path": "backup-elk"
  }
}
Commands
GET /_search
# Elasticsearch "search lite" (URI search)
GET /_search?q=some_title
Search by date
GET /restore_netflow/_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "source": "doc['@timestamp'].value.getYear() == 2020",
            "lang": "painless"
          }
        }
      }
    }
  }
}
Reindexing data from one index into another by year
POST /_reindex?wait_for_completion=false
{
  "source": {
    "index": "restore_netflow",
    "query": {
      "bool": {
        "filter": {
          "script": {
            "script": {
              "source": "doc['@timestamp'].value.getYear() == 2019",
              "lang": "painless"
            }
          }
        }
      }
    }
  },
  "dest": {
    "index": "netflow-2019"
  }
}
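The script filter above evaluates Painless for every document. A range query on the date field is a common alternative; the sketch below builds an equivalent _reindex body for a given year. reindex_body_for_year is a hypothetical helper, the index names are the ones from the example, and it assumes @timestamp is mapped as a date field that accepts ISO 8601 values.

```python
import json

def reindex_body_for_year(source_index: str, dest_index: str, year: int) -> dict:
    """Build a _reindex request body selecting documents from one calendar year (UTC)."""
    return {
        "source": {
            "index": source_index,
            "query": {
                "bool": {
                    "filter": {
                        "range": {
                            "@timestamp": {
                                # half-open interval: [Jan 1 of year, Jan 1 of next year)
                                "gte": f"{year}-01-01T00:00:00Z",
                                "lt": f"{year + 1}-01-01T00:00:00Z",
                            }
                        }
                    }
                }
            },
        },
        "dest": {"index": dest_index},
    }

body = reindex_body_for_year("restore_netflow", "netflow-2019", 2019)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of POST /_reindex?wait_for_completion=false.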
GET /_mapping
Set the number of shards when creating an index (more shards => faster reading); the number of primary shards can NOT be changed later
PUT /indexname
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 2
  }
}
The result will be 2 primary shards plus 4 replicas (each primary has 2 replicas) => 6 shards
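The shard arithmetic above as a small sketch; total_shards is a hypothetical helper name, not part of any Elasticsearch client.

```python
def total_shards(number_of_shards: int, number_of_replicas: int) -> int:
    """Primaries plus replica copies: each primary gets number_of_replicas copies."""
    return number_of_shards * (1 + number_of_replicas)

print(total_shards(2, 2))  # 6
```

Copies of the same shard are not placed on the same node, so fully allocating this example needs at least three data nodes.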
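Index template with automatic timestamp
# Ingest pipeline that stamps each document at index time
PUT _ingest/pipeline/indexed_at
{
  "description": "Adds indexed_at timestamp to documents",
  "processors": [
    {
      "set": {
        "field": "@timestamp",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
# Contents of the template's "settings" section: use the pipeline by default
{
  "index": {
    "number_of_shards": "1",
    "default_pipeline": "indexed_at"
  }
}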
Backups
- First we have to configure the snapshot path in the elasticsearch.yml file
path.repo: ["/mnt/elk_backup"]
- Then we can register this path as a repository for storing backups
curl -X PUT localhost:9200/_snapshot/fs_backup \
-H 'Content-Type: application/json; charset=utf-8' \
--data-binary @- <<EOF
{
  "type": "fs",
  "settings": {
    "location": "/mnt/elk_backup",
    "compress": true
  }
}
EOF
- We can check the settings
curl -X GET localhost:9200/_snapshot/fs_backup
- Create a snapshot
curl -X PUT localhost:9200/_snapshot/fs_backup/full_backup-20190321
- The snapshot process can be monitored
# Snapshots currently running in the given repository
curl -X GET "localhost:9200/_snapshot/fs_backup/_current?pretty"
# Status of all currently running snapshots
curl -X GET "localhost:9200/_snapshot/_status?pretty"
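For scheduled backups the snapshot name usually carries the date, as in full_backup-20190321 above. A minimal sketch that builds (but does not send) the create-snapshot request using only the standard library; snapshot_request is a hypothetical helper, and the host and repository name are the ones used in these notes.

```python
from datetime import date
from urllib.request import Request

def snapshot_request(base_url: str, repository: str, day: date) -> Request:
    """Build the PUT request that creates a snapshot named full_backup-YYYYMMDD."""
    name = f"full_backup-{day:%Y%m%d}"
    url = f"{base_url}/_snapshot/{repository}/{name}"
    return Request(url, method="PUT")

req = snapshot_request("http://localhost:9200", "fs_backup", date(2019, 3, 21))
print(req.get_method(), req.full_url)
# To actually send it against a running cluster: urllib.request.urlopen(req)
```

Passing date.today() instead of a fixed date gives one snapshot name per day.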
- Kopf plugin - cluster state
- ElastAlert - alerting for Elasticsearch
- Postman for playing with the API
- Insomnia REST client
- Beats - small single-purpose probes
Commands:
- _search
- _mapping
- _search?q=iphone, q=title - Elasticsearch "search lite" (URI search)