In Elasticsearch and OpenSearch, caching plays a fundamental role in performance tuning.

Caches speed up queries by storing the results of frequent search requests in memory that is fast to access. When a query is repeated, it can be served directly from the cache, yielding significant speed-ups and better use of resources. Used well, caching does wonders for search performance.

But when caches are misused, performance takes a hit. Elasticsearch may end up working harder, with entries constantly entering and leaving the cache in a vicious cycle. This drives up CPU usage, puts pressure on the JVM heap, and degrades performance overall. Using Elasticsearch caches the right way isn't just about speed; it's about maintaining a robust, stable, and efficient search infrastructure.

The various caches used by Elasticsearch, such as the query cache, the request cache, and the page cache (maintained by the operating system), can dramatically reduce latency for queries that are executed time and time again. In this post we focus on writing cacheable queries, in a way that optimizes memory consumption and, ideally, search latency, and on how to use the Query Cache and Request Cache correctly, including avoiding pitfalls and monitoring them properly.

The Elasticsearch Query Cache

The Query Cache in Elasticsearch and OpenSearch is a sophisticated mechanism designed to store reusable components of queries to speed up searches. It is a resource shared across all shards on a data node, acting as a repository for cached query results.

More granular than other caching mechanisms, the query cache focuses on parts of queries that are used over and over. It’s especially adept at caching elements such as time range filters, enabling lightning-quick assembly of subsequent searches that use these cached entries. By doing so, it avoids the need to go through the entire data set each time, instead quickly piecing together the right information to answer a query.

Some queries, such as range and geo queries, are eligible for caching, while others are considered "cheap" enough to run again and will not participate in caching. This is a deliberate feature of the search engine: it keeps data retrieval optimized without taking a big toll on memory consumption.

How Query Cache Works

Elasticsearch’s query cache works by caching frequently accessed query components, allowing them to be reused across different searches, optimizing repeated data retrieval. It operates like a smart library system, where common filters like timestamp ranges are kept handy and ready to be pulled off the shelf whenever needed.

By default, the query cache has the following caching policy:

  • Can store up to 10,000 queries
  • Uses no more than 10% of the total Java heap space
  • Employs an LRU eviction policy, discarding the least recently used entries to make room for new ones
  • Automatically invalidates cached entries when the underlying data changes, for example after a refresh that follows writes, or after mapping updates, ensuring that results remain fresh. That is also why the effects of caching may not be noticeable immediately.

These cached query components are represented and stored using bit sets, which allow for efficient reuse across different queries while keeping memory usage low.
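If the defaults above don't fit your workload, the heap allocation can be tuned. As a sketch, the static `indices.queries.cache.size` node setting (set in elasticsearch.yml and applied on restart) accepts either a percentage of heap or an absolute size:

```yaml
# elasticsearch.yml -- static node-level setting, applied on restart.
# Accepts a percentage of the heap or an absolute value such as 512mb.
indices.queries.cache.size: 10%
```

Increasing this beyond the default is rarely the first lever to pull; improving query cacheability, as described below, usually pays off more.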

Avoiding Common Pitfalls

To maximize the efficiency of the query cache, it’s essential to avoid certain common mistakes that reduce cache hit rates, increase eviction rates, misuse the JVM heap, trigger elevated garbage collection, and ultimately degrade overall performance.

These are the typical mistakes to avoid in order to truly leverage caching:

  1. Always use the filter context. Queries are only eligible for caching when they run in a boolean query under a filter clause, as opposed to a must clause. Putting queries in a filter clause both disables scoring (thus speeding up the query too) and makes them cacheable. Use filter clauses instead of must or should within a bool query whenever possible, as shown below:
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "type": "event" } }
      ]
    }
  }
}
  2. Round timestamps to the nearest day, hour, or minute. Using precise timestamps, such as 12:14:55.012Z, makes it unlikely that any query will ever be repeated and cached. Instead, round the timestamp to the nearest day, hour, or minute to give it a chance to be repeated and then cached. This way, the query component becomes reusable. Rounding timestamps is equally important for queries (which use the query cache) and aggregations (which use the request cache, as we discuss below).

  3. Also round date math expressions. Evaluating now, whether in application code or in the query itself, produces a non-deterministic value, so you should not use it without rounding. For example, if you use a bare now in a range query on a timestamp field, the query is unlikely to be reused even if the same query is run over and over again. Instead, use now/h or now/d to round the generated timestamp to the nearest hour or day, as shown below:

{
  "query": {
    "bool": {
      "filter": [
        { "term": { "type": "event" } },
        { "range": { "timestamp": { "gte": "now-1d/h" } } }
      ]
    }
  }
}
  4. Avoid scripts. Scripts used for filtering or scoring often need to scan every document, which makes them both slower and non-cacheable. In many cases, a script query can be translated into a simple native query that leverages the inverted index. Our recommendation is therefore to avoid scripts entirely, and in many situations this is possible.
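As an illustration of the last point, a script filter such as "doc['price'].value >= 100" (price is a hypothetical field here) can usually be rewritten as a native range filter, which uses the index and is eligible for the query cache:

```json
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "price": { "gte": 100 } } }
      ]
    }
  }
}
```

The native version expresses the same condition, but lets the engine answer it from the inverted index instead of evaluating a script per document.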

Shard-Level Request Cache

Caching in Elasticsearch and OpenSearch doesn’t stop at the query cache; a shard-level request cache also speeds up search requests that contain aggregations - the kind most often issued by Kibana and other dashboard applications. The request cache stores complete shard-level responses and serves repeated requests of this type directly from cached data, allowing for near-instant results.

Aggregation queries, which can be particularly resource-intensive, benefit greatly from the request cache. For instance, Kibana visualizations, which often involve aggregating data from multiple indices, can see a dramatic speed boost thanks to this caching layer. Similar queries are often sent from applications drawing graphs in their analytics section.
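For example, a typical dashboard-style aggregation request with "size": 0 is eligible for the shard request cache by default; the request_cache parameter below simply makes that explicit (the index and field names are hypothetical):

```json
GET /logs-2024/_search?request_cache=true
{
  "size": 0,
  "aggs": {
    "events_over_time": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "hour"
      }
    }
  }
}
```

Note that the timestamp-rounding advice from the previous section applies here as well: a cached entry is only reused if the request body repeats exactly.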

Configuring the request cache is straightforward: by default it is allocated up to 1% of the total heap space. It’s also possible to enable or disable the cache per index, or per request, and to set an expiry window after which cached responses are discarded.
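As a sketch, the per-index toggle is the dynamic `index.requests.cache.enable` setting (the index name is hypothetical); the overall size is governed by the static `indices.requests.cache.size` node setting:

```json
PUT /logs-2024/_settings
{
  "index.requests.cache.enable": true
}
```

This is a dynamic setting, so it can be changed on a live index without a restart.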

Monitoring Cache Performance

Optimal cache usage leads to optimal performance, so you need to make sure that your cache is neither over-provisioned (wasting resources) nor plateauing with a constant stream of evictions per second. You also want to avoid cache thrashing, which occurs when the cache fills with unnecessary or seldom-used entries. To catch these issues, use an effective monitoring tool that keeps the important cache metrics under close watch, so you can adjust your cluster and index settings accordingly.

By analyzing query cache usage, eviction rate, and hit/miss ratio, it’s possible to measure the impact of the query cache on query response times and computational overhead. When you notice a decline in cache hit rates or increased memory consumption, it may be time to revisit your caching policies.
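These counters are exposed through the stats APIs. For example, the node stats and index stats endpoints both report cache memory size, hit and miss counts, and evictions (my-index is a hypothetical index name):

```json
GET /_nodes/stats/indices/query_cache,request_cache

GET /my-index/_stats/query_cache,request_cache
```

Comparing hit_count against miss_count over time gives a practical hit-ratio signal, and a steadily climbing evictions count suggests the cache is churning.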

There are several tools which you can use for monitoring Elasticsearch. One such monitoring solution is Pulse, which not only monitors cluster health but also provides precise, actionable recommendations for improvement. You can learn more about Pulse and how it can help with monitoring and cacheable queries here.

Summary

The correct use of caches, and making sure to write cacheable queries, is an essential step in the performance tuning of OpenSearch or Elasticsearch clusters. If you don’t write cacheable queries, you risk slowing down searches and degrading your clusters’ overall performance.

Effective use of Elasticsearch’s request and query caches matters: monitoring and optimizing cache performance, along with properly configuring cache settings, is crucial to reducing cost and achieving peak performance. Monitor cache usage, and avoid the pitfalls listed here, to ensure optimal use of the available caches.