Elasticsearch cheat sheet
Deleting an entire index is cheaper than deleting (or updating) documents in an existing index.
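As a rough illustration of the two options, the sketch below builds both REST requests as plain tuples without sending them. The index names and the `@timestamp` field are hypothetical; the idea is that time-based indices let you drop old data with a single index delete instead of a delete-by-query.

```python
def delete_whole_index(index):
    # DELETE /<index> drops the index and all of its data in one operation.
    return ("DELETE", f"/{index}", None)


def delete_old_documents(index, field, before):
    # POST /<index>/_delete_by_query deletes every matching document individually.
    body = {"query": {"range": {field: {"lt": before}}}}
    return ("POST", f"/{index}/_delete_by_query", body)


# Hypothetical monthly index: removing January is a single index delete.
cheap = delete_whole_index("logs-2024.01")
# Same goal against one big index: every January document must be deleted.
costly = delete_old_documents("logs", "@timestamp", "2024-02-01")
```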
When searches must be limited to a certain user, route all of that user's documents to the same shard (use the same routing key, e.g. userId, whenever you index and search them).
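A minimal sketch of custom routing, assuming a hypothetical `notes` index with a `userId` field. It only builds the request paths and bodies; the `routing` query parameter is the one accepted by the index and search APIs. Several users can share a shard, so the search still filters on `userId`.

```python
from urllib.parse import quote, urlencode


def index_request(index, doc_id, user_id, source):
    # Route the document by user ID so all of a user's documents share a shard.
    params = urlencode({"routing": user_id})
    path = f"/{index}/_doc/{quote(str(doc_id), safe='')}?{params}"
    return "PUT", path, source


def search_request(index, user_id, query):
    # The same routing value limits the search to that one shard.
    params = urlencode({"routing": user_id})
    body = {
        "query": {
            "bool": {
                "must": [query],
                # Routing narrows the shard, not the documents, so filter too.
                "filter": [{"term": {"userId": user_id}}],
            }
        }
    }
    return "POST", f"/{index}/_search?{params}", body


put = index_request("notes", "n1", "user-42", {"userId": "user-42", "text": "hello"})
search = search_request("notes", "user-42", {"match": {"text": "hello"}})
```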
When batch (re-)indexing, turn off refresh and re-enable it once indexing has finished.
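One way to make sure refresh is always turned back on is a small context manager. In this sketch `put_settings` is a hypothetical callable standing in for whatever sends `PUT /<index>/_settings` in your client, and `"1s"` is assumed to be the interval you want to restore; `refresh_interval: "-1"` is the setting that disables refresh.

```python
from contextlib import contextmanager


@contextmanager
def refresh_disabled(put_settings, index, restore_to="1s"):
    # "-1" disables periodic refresh for the duration of the bulk load.
    put_settings(index, {"index": {"refresh_interval": "-1"}})
    try:
        yield
    finally:
        # Restore refresh even if the bulk load raised an exception.
        put_settings(index, {"index": {"refresh_interval": restore_to}})


# Usage with a fake sender that just records the calls.
calls = []
with refresh_disabled(lambda idx, body: calls.append((idx, body)), "products"):
    calls.append("bulk indexing happens here")
```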
When you are sure that you don’t need to sort or aggregate on a field, or access the field value from a script, you can disable doc values in order to save disk space.
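For illustration, a mapping that disables doc values could be built like this. The `session_token` field name is hypothetical; `doc_values` is the mapping parameter in question. The field remains searchable but can no longer be sorted on, aggregated on, or read from a script.

```python
def mapping_without_doc_values(field, field_type="keyword"):
    # Body for PUT /<index> (index creation) with doc values turned off.
    return {
        "mappings": {
            "properties": {
                field: {"type": field_type, "doc_values": False},
            }
        }
    }


create_body = mapping_without_doc_values("session_token")
```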
Prefer filter context over query context, and use query context only when you need scoring, since clauses in query context won't be cached.
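A sketch of how this split might look in a `bool` query: only the full-text clause sits in `must` (scored), while the yes/no conditions go into `filter` (not scored, cacheable). The field names `title`, `status` and `published_at` are made up for the example.

```python
def build_search(text, status, since):
    return {
        "query": {
            "bool": {
                # Query context: contributes to the relevance score.
                "must": [{"match": {"title": text}}],
                # Filter context: pure yes/no matching, no scoring.
                "filter": [
                    {"term": {"status": status}},
                    {"range": {"published_at": {"gte": since}}},
                ],
            }
        }
    }


search_body = build_search("elasticsearch tips", "published", "2024-01-01")
```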
Prefer IDs auto-generated by Elasticsearch for better indexing performance; when you supply your own ID, Elasticsearch has to check whether a document with that ID already exists within the shard.
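In a bulk request, letting Elasticsearch generate the ID simply means leaving `_id` out of the action line. The helper below is a sketch that builds the newline-delimited payload for `POST /_bulk`; the `events` index and the documents are placeholders.

```python
import json


def bulk_payload(index, docs):
    lines = []
    for doc in docs:
        # No "_id" in the action line, so Elasticsearch assigns one itself.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


payload = bulk_payload("events", [{"type": "click"}, {"type": "view"}])
```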
The value of the _id field is also accessible in aggregations or for sorting, but doing so is discouraged as it requires loading a lot of data into memory. If sorting or aggregating on the _id field is required, it is advised to duplicate the content of the _id field in another field that has doc_values enabled.
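A possible way to duplicate the ID, sketched with a hypothetical `doc_id` field: it is mapped as `keyword` (which has doc values enabled by default), filled in at index time, and used for sorting in place of `_id`.

```python
# Mapping for the copy of the ID; "doc_id" is a made-up field name.
ID_COPY_MAPPING = {"mappings": {"properties": {"doc_id": {"type": "keyword"}}}}


def with_id_copy(doc_id, source):
    # Store the ID inside the document so it can be sorted and aggregated on.
    return {**source, "doc_id": str(doc_id)}


# Sort on the copy rather than on _id.
SORT_BY_ID = {"query": {"match_all": {}}, "sort": [{"doc_id": "asc"}]}

doc = with_id_copy("order-17", {"total": 99})
```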
Serialize JSON request bodies in a canonical form (for example, always with the same key order); otherwise the shard request cache may not be hit, since it uses the whole JSON body as the cache key.
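In Python, one simple way to get a stable serialization is `json.dumps` with sorted keys and fixed separators. This is a sketch of the idea rather than a full canonicalization scheme; list order is left untouched because it can be meaningful.

```python
import json


def canonical_json(body):
    # Sorted keys and fixed separators give the same bytes for equal bodies.
    return json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


# Two logically identical requests whose keys were built in different orders.
a = {"size": 0, "query": {"match_all": {}}}
b = {"query": {"match_all": {}}, "size": 0}

same = canonical_json(a) == canonical_json(b)
```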
Indexing a document with 100 nested objects actually indexes 101 documents, as each nested object is indexed as a separate document.
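The arithmetic can be sketched as follows, assuming a document whose only nested field is a hypothetical top-level `comments` array (deeper nesting levels are not counted here).

```python
def internal_doc_count(doc, nested_fields):
    # One document for the parent plus one per object in each nested field.
    return 1 + sum(len(doc.get(field, [])) for field in nested_fields)


post = {"title": "hello", "comments": [{"text": f"comment {i}"} for i in range(100)]}
count = internal_doc_count(post, ["comments"])
```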
When fast retrieval is important, prefer keyword fields: term query searches on keyword fields are often faster than term searches on numeric fields.
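As an example, a numeric-looking identifier that is only ever looked up by exact value could be mapped as `keyword` and queried with a `term` query. The `product_id` field is hypothetical, and this assumes you never need range queries on it.

```python
# Map the identifier as keyword rather than as a numeric type.
PRODUCT_MAPPING = {"mappings": {"properties": {"product_id": {"type": "keyword"}}}}


def term_lookup(product_id):
    # Keyword values are strings, so convert numeric IDs before querying.
    return {"query": {"term": {"product_id": str(product_id)}}}


lookup = term_lookup(12345)
```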