What query limits exist in Elasticsearch and OpenSearch, and what should you know about them?

Elasticsearch and OpenSearch are powerful search and analytics engines, but they do impose query limits. Some limits exist to prevent performance degradation and resource exhaustion; others are inherent to how the software works. Understanding and configuring these limits is essential for maintaining efficient and stable operations.

1. Results Size Limit

By default, Elasticsearch and OpenSearch limit the number of documents returned by a search query. The size parameter controls the maximum number of results, with a default of 10 documents per query. Setting size too high can increase memory usage and require more CPU and disk work to retrieve and score the documents.

Elasticsearch has a limit of 10,000 results for a query (the index.max_result_window setting, which caps from + size), and trying to go beyond that will result in a "Result window is too large" error.

For deep paging, consider using search_after or the scroll API.
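To make the search_after approach concrete, here is a minimal sketch of how the page request bodies are built. The index field names (@timestamp, _id) and the page size are illustrative assumptions; the key idea is that each page carries the sort values of the last hit from the previous page.

```python
# Sketch of search_after pagination: each page is an ordinary search
# request whose "search_after" carries the sort values of the last hit
# of the previous page. Field names here are illustrative.

def next_page_body(last_sort_values=None, page_size=100):
    """Build the request body for one page of a search_after scan."""
    body = {
        "size": page_size,
        # A total sort order with a tiebreaker (a unique field) is
        # required for search_after pagination to be stable.
        "sort": [{"@timestamp": "asc"}, {"_id": "asc"}],
        "query": {"match_all": {}},
    }
    if last_sort_values is not None:
        body["search_after"] = last_sort_values
    return body

first = next_page_body()
# Pretend the last hit of the first page had these sort values:
second = next_page_body(last_sort_values=["2024-01-01T00:00:00Z", "doc-41"])
```

Unlike from + size, this never asks the cluster to skip over earlier results, so it works no matter how deep you page.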

2. Max Clause Count

Lucene, the engine at the heart of Elasticsearch and OpenSearch, has a built-in protection that helps prevent huge queries from running and getting the entire system stuck. Many query types from the Expensive queries list are in fact syntactic sugar that is rewritten into monster queries with many clauses.

The max_clause_count protection mitigates that risk: it automatically rejects queries with more than 4,096 clauses, which is a good thing.

If you decide to change it anyway, the configuration to look for is this:

indices.query.bool.max_clause_count: 4096
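To see how a query can silently balloon past that limit, here is a sketch of a bool query whose should array grows with the input, a shape that multi-value filters or wildcard expansions can also produce internally after rewriting. The field name and values are made up for illustration.

```python
# Sketch: a bool query whose "should" array grows with input size.
# Queries like this (or queries Lucene rewrites into this shape) are
# exactly what the max_clause_count protection guards against.

def user_filter(user_ids):
    return {
        "query": {
            "bool": {
                "should": [{"term": {"user_id": uid}} for uid in user_ids],
                "minimum_should_match": 1,
            }
        }
    }

q = user_filter([f"user-{i}" for i in range(5000)])
clause_count = len(q["query"]["bool"]["should"])
# 5,000 clauses exceeds the 4,096 default, so this query would be rejected.
```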

3. Field Data Limits

When it comes to limits on field data and aggregation memory usage, Elasticsearch and OpenSearch behave the same: both have them. Large aggregations, especially on text fields when fielddata is enabled, can cause high memory consumption. You can cap the heap used for field data through settings like indices.fielddata.cache.size, and limit how many buckets an aggregation is allowed to create.

This limitation mostly affects larger systems, queries on fields with very high cardinality, and aggregation requests for an insane number of buckets, like a minute-resolution date histogram over a year of data.
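The minute-resolution example is worth doing the arithmetic on. A sketch of such an aggregation request, plus the bucket count it implies (the index field name is an assumption):

```python
# A minute-resolution date_histogram over one year, requested as an
# aggregation-only search (size: 0). The field name is illustrative.
agg_request = {
    "size": 0,
    "aggs": {
        "per_minute": {
            "date_histogram": {
                "field": "@timestamp",
                "fixed_interval": "1m",
            }
        }
    },
}

# One bucket per minute, for a whole (non-leap) year:
buckets = 365 * 24 * 60
print(buckets)  # 525600
```

Over half a million buckets for a single request; each bucket costs memory on the coordinating node, which is why bucket limits exist.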

4. Joins

Complex queries, especially those involving nested structures or multiple joins (in the form of nested or parent-child queries), can put significant load on the cluster. Elasticsearch and OpenSearch may reject such queries if they exceed certain thresholds, like the max_clause_count, which limits the number of clauses in a query.

But oftentimes the issue is bringing a relational mindset to Elasticsearch in the first place, and trying to force it onto a document store. For example, creating a star schema in Elasticsearch is impossible. And joins, although possible to some extent using nested documents or parent-child relations (now the join field type), are not supported in the relational sense.
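For reference, this is roughly what the parent-child alternative looks like with the join field type. The index layout, relation names, and routing value below are illustrative assumptions, not a recommended schema:

```python
# Sketch of a parent-child mapping using the join field type.
# "question" documents are parents, "answer" documents are children.
mapping = {
    "mappings": {
        "properties": {
            "doc_relation": {
                "type": "join",
                "relations": {"question": "answer"},
            }
        }
    }
}

# A child document declares its parent; it must also be indexed with
# routing set to the parent's ID so both land on the same shard.
answer_doc = {
    "text": "42",
    "doc_relation": {"name": "answer", "parent": "question-1"},
}
```

Even then, parent and child must live in the same index and shard, and has_child / has_parent queries are expensive, so this is a workaround, not a relational join.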

Elasticsearch can be used as a single source of truth and your main data store, but you will have to design your data models accordingly to avoid hitting those limits.

5. Query Throughput

Each cluster has a maximum query throughput that it can handle. It's not a strict number, and it isn't easy to calculate, but it's very simple to understand.

Assuming every query takes 1 second, and the system has 3 data nodes each with 13 threads serving search requests, your system can handle up to 39 queries per second. Make the queries faster, or add more nodes (or cores), and you've increased the search throughput.
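The back-of-the-envelope formula behind that example can be written down directly; it's an upper bound, not a benchmark:

```python
def max_qps(data_nodes, search_threads_per_node, avg_query_seconds):
    """Rough ceiling on sustained search throughput: total search
    threads divided by average query latency."""
    return data_nodes * search_threads_per_node / avg_query_seconds

print(max_qps(3, 13, 1.0))  # 39.0 - the example above
print(max_qps(3, 13, 0.5))  # 78.0 - halving latency doubles throughput
print(max_qps(6, 13, 1.0))  # 78.0 - so does doubling the data nodes
```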

Once the cluster sustains more search requests than it can handle, two protection mechanisms kick in.

The first is the search queue: queries are queued up until a thread becomes available to run them. When the queue fills up, the system rejects further requests.

The second is circuit breakers. Both platforms use them to prevent excessive resource usage, such as memory or heap exhaustion. If a query or operation exceeds defined limits, a circuit breaker trips and aborts the operation to maintain system stability.

Avoid hitting those limits by monitoring your queries and optimizing them as much as possible. Also, never send a request to Elasticsearch without setting a sane timeout, and respect requests rejected with HTTP 429 by backing off instead of retrying them right away.
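The "back off on 429" advice can be sketched as a small wrapper. The do_search callable and its status_code attribute are hypothetical stand-ins for whatever HTTP client you use; the retry counts and delays are illustrative:

```python
import time

def search_with_backoff(do_search, max_retries=3, base_delay=0.5):
    """Retry a search only when it was rejected with HTTP 429,
    waiting exponentially longer between attempts.

    `do_search` is a caller-supplied function that performs the request
    and returns an object with a `status_code` attribute (hypothetical;
    adapt to your client library). Any other status is returned as-is.
    """
    for attempt in range(max_retries + 1):
        resp = do_search()
        if resp.status_code != 429:
            return resp  # success or a non-retryable error
        if attempt < max_retries:
            # Give the search queue time to drain before retrying.
            time.sleep(base_delay * (2 ** attempt))
    return resp  # still rejected after all retries
```

Retrying a 429 immediately only adds more load to a queue that is already full; waiting gives the cluster a chance to drain it.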