ElasticSearch cheat sheet
--
Deleting an entire index is cheaper than deleting documents from (or updating documents in) an existing index.
When searches must be limited to a certain user, route all the documents for that user to the same shard (use the same routing key for every index request, e.g. userId).
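A minimal sketch of per-user routing, using only the standard library to build request paths. The index name `events` and the helper names are hypothetical; `routing` is the query-string parameter accepted by the index and search APIs.

```python
from urllib.parse import quote, urlencode


def index_path(index: str, doc_id: str, user_id: str) -> str:
    """Path for PUT <index>/_doc/<id>, routed by the user ID."""
    return f"/{quote(index)}/_doc/{quote(doc_id)}?{urlencode({'routing': user_id})}"


def search_path(index: str, user_id: str) -> str:
    """Path for POST <index>/_search, sent only to the shard chosen by the user ID."""
    return f"/{quote(index)}/_search?{urlencode({'routing': user_id})}"


print(index_path("events", "1", "user-42"))
print(search_path("events", "user-42"))
```

Several users can land on the same shard, so the search body still needs a filter on the user ID; routing only narrows which shard is asked.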
When batch (re-)indexing, turn off refresh (set refresh_interval to -1) and re-enable it once the batch has finished.
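One way to make sure refresh is always switched back on is a context manager. This is a sketch: `put_settings` is a hypothetical callable standing in for whatever sends `PUT <index>/_settings`, and restoring to `1s` assumes the index used the default interval before.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator

# Hypothetical transport: sends PUT <index>/_settings with the given JSON body.
PutSettings = Callable[[str, Dict[str, object]], None]


@contextmanager
def refresh_disabled(put_settings: PutSettings, index: str, restore: str = "1s") -> Iterator[None]:
    """Disable refresh for the duration of the block, then restore it."""
    put_settings(index, {"index": {"refresh_interval": "-1"}})
    try:
        yield
    finally:
        # Runs even if the bulk load raises, so the index is not left without refresh.
        put_settings(index, {"index": {"refresh_interval": restore}})


sent = []
with refresh_disabled(lambda index, body: sent.append((index, body)), "events"):
    pass  # run the bulk indexing requests here
print(sent)
```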
When you are sure that you don’t need to sort or aggregate on a field, or access the field value from a script, you can disable doc values in order to save disk space.
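As an illustration, a create-index body that disables doc values on one field could look like this. The field name `session_id` is hypothetical; `doc_values` is the mapping parameter.

```python
import json

# Body for PUT <index>. The field stays searchable, but can no longer be
# sorted on, aggregated on, or read from scripts through doc values.
create_index_body = {
    "mappings": {
        "properties": {
            "session_id": {"type": "keyword", "doc_values": False},
        }
    }
}

print(json.dumps(create_index_body, indent=2))
```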
Prefer filter context over query context, and use query context only when you need scoring: scoring queries are not cached, while frequently used filters can be.
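A sketch of a `bool` query that keeps the scored part in `must` and the yes/no conditions in `filter`. The field names `title` and `status` are made up for the example.

```python
def build_search(text: str, status: str) -> dict:
    """Search body: full-text match is scored, the status check is a cacheable filter."""
    return {
        "query": {
            "bool": {
                # Query context: contributes to _score.
                "must": [{"match": {"title": text}}],
                # Filter context: no scoring, eligible for caching.
                "filter": [{"term": {"status": status}}],
            }
        }
    }


print(build_search("shard routing", "published"))
```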
Prefer IDs auto-generated by ElasticSearch for better indexing performance. When you supply your own ID, ElasticSearch has to check whether a document with the same ID already exists in the shard.
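The difference shows up in the request itself: a sketch of the two forms, with a hypothetical helper name and index name.

```python
from typing import Optional, Tuple


def index_request(index: str, doc_id: Optional[str] = None) -> Tuple[str, str]:
    """Return (HTTP method, path) for indexing one document."""
    if doc_id is None:
        # POST without an ID: an ID is generated for the document.
        return ("POST", f"/{index}/_doc")
    # PUT with an explicit ID: the caller chooses the ID.
    return ("PUT", f"/{index}/_doc/{doc_id}")


print(index_request("events"))
print(index_request("events", "order-1001"))
```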
The value of the _id field is also accessible in aggregations or for sorting, but doing so is discouraged as it requires loading a lot of data into memory. If sorting or aggregating on the _id field is required, it is advised to duplicate the content of the _id field in another field that has doc_values enabled.
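A minimal sketch of the duplication, assuming the application knows the ID before indexing. The field name `doc_id_sortable` and the helper are hypothetical; `keyword` fields have doc_values enabled by default.

```python
def with_sortable_id(doc_id: str, source: dict) -> dict:
    """Return a copy of the source with the ID duplicated into a regular field."""
    enriched = dict(source)  # leave the caller's dict untouched
    enriched["doc_id_sortable"] = doc_id
    return enriched


# Mapping for the duplicate field (body for PUT <index>).
mapping = {"mappings": {"properties": {"doc_id_sortable": {"type": "keyword"}}}}

# Sort on the duplicate instead of on _id.
sort_clause = [{"doc_id_sortable": "asc"}]

print(with_sortable_id("order-1001", {"total": 12.5}))
```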
Serialize JSON request bodies canonically (for example, always with the same key order); otherwise the shard request cache will miss, since it uses the whole JSON body as the cache key.
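With the standard library, sorting keys and fixing the separators is usually enough to get byte-identical bodies for logically equal requests. A sketch, with a hypothetical helper name:

```python
import json


def canonical_body(body: dict) -> str:
    """Serialize a request body with sorted keys and no optional whitespace."""
    return json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


# Same request, keys written in a different order.
a = {"size": 0, "query": {"term": {"status": "open"}}}
b = {"query": {"term": {"status": "open"}}, "size": 0}

print(canonical_body(a) == canonical_body(b))
```

Note that this only normalizes key order and whitespace; list order (for example, the order of clauses inside a `bool` filter) still has to be kept stable by the caller.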
Indexing a document with 100 nested objects actually indexes 101 documents, because each nested object is indexed as a separate document in addition to the root document.
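The arithmetic can be sketched as a small counter. It assumes the nested fields sit at the top level of the source and are not nested inside each other; the function and field names are hypothetical.

```python
def lucene_doc_count(source: dict, nested_fields: set) -> int:
    """Count the documents created for one source: the root plus one per nested object."""
    count = 1  # the root document
    for name in nested_fields:
        value = source.get(name)
        if isinstance(value, list):
            count += len(value)
        elif isinstance(value, dict):
            count += 1  # a single nested object
    return count


order = {"id": "o-1", "items": [{"sku": f"sku-{i}"} for i in range(100)]}
print(lucene_doc_count(order, {"items"}))  # 101
```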
When fast retrieval is important, consider mapping identifier-like numeric fields as keyword: term queries on keyword fields are often faster than term queries on numeric fields.
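One way to apply this to an identifier that is numeric in the source data is to map it as `keyword` and look it up with a `term` query. A sketch, with a hypothetical field name `product_id`; it only makes sense if the field is never used in range queries.

```python
# Mapping (body for PUT <index>): the numeric identifier is stored as a keyword.
mapping = {"mappings": {"properties": {"product_id": {"type": "keyword"}}}}


def term_lookup(product_id: int) -> dict:
    """Exact-match search body; the ID is sent as a string to match the keyword field."""
    return {"query": {"term": {"product_id": str(product_id)}}}


print(term_lookup(1234))
```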
references :