Nested Fields in Elasticsearch: Why and How to Avoid Them

The nested field type in Elasticsearch and OpenSearch is not as innocent as it looks. Let’s look at what it is used for, the challenges it brings, and possible alternatives.

Elasticsearch is a document-oriented data store. Trying to apply the relational modeling used in RDBMSs such as PostgreSQL and MySQL to Elasticsearch is prone to failure.

Yet, Elasticsearch allows you to model complex data structures using nested fields, which are useful for preserving relationships within arrays of objects. Unlike standard object fields, nested fields store each object in an array as a separate hidden document, enabling precise querying. However, this approach comes with significant downsides.

Let’s review the common issues and challenges with nested fields, then look at solutions and alternatives for the use cases where they typically show up.

What are Nested Fields?

Nested fields are designed to allow querying lists of objects that are embedded in a document, which acts as the parent of that list. For example, a list of key/value pairs that serve as the properties of a product on an eCommerce website:

PUT products/_doc/1
{
  "title": "Product 1",
  "attributes": [{
    "attribute": "color",
    "value": "blue"
  }, {
    "attribute": "size",
    "value": "m"
  }]
}

With the default object mapping, Elasticsearch and OpenSearch index this exactly as if it were the following, because the underlying Lucene index has no notion of inner objects or sub-documents:

{
  "title": "Product 1",
  "attributes.attribute": ["color", "size"],
  "attributes.value": ["blue", "m"]
}

So a query for the attribute color with the value blue returns this product, but so does a query for the attribute size with the value blue, which obviously doesn’t make any sense. The nested field type solves this problem by preserving the key-value pairing at the index level, though not without trade-offs.
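
As a minimal sketch of what that looks like, assuming the same products index and keyword sub-fields (the sub-field types are an assumption here, since the example above relies on dynamic mapping), the attributes field is declared as nested in the mapping before any documents are indexed:

PUT products
{
  "mappings": {
    "properties": {
      "title": {"type": "text"},
      "attributes": {
        "type": "nested",
        "properties": {
          "attribute": {"type": "keyword"},
          "value": {"type": "keyword"}
        }
      }
    }
  }
}

With this mapping, each entry of the attributes array is indexed as its own hidden document alongside the parent, so color stays paired with blue and size stays paired with m.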

Shortcomings of Nested Fields

1. Performance Overhead

Since each nested object is indexed as a separate hidden document, queries on nested fields have to join the matching nested documents back to their parent at query time, leading to increased query execution time and higher memory consumption.

2. Complex Queries

Using nested fields means you must use nested queries, which are more cumbersome to write and understand. Unlike simple object fields, which allow direct queries, nested structures demand a specialized syntax, adding unnecessary complexity.
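
To illustrate, here is a sketch of the lookup for color = blue, assuming attributes is mapped as nested with keyword sub-fields named attribute and value. Both conditions sit inside a single nested clause so that they must match the same array entry:

GET products/_search
{
  "query": {
    "nested": {
      "path": "attributes",
      "query": {
        "bool": {
          "filter": [
            {"term": {"attributes.attribute": "color"}},
            {"term": {"attributes.value": "blue"}}
          ]
        }
      }
    }
  }
}

Filtering on a second attribute as well, say size = m, needs a second nested clause of the same shape next to this one, because no single array entry is both a color and a size.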

3. Indexing Costs

Nested fields can bloat your index, as each nested object is indexed as a separate document: a product with 100 attributes becomes 101 documents in the underlying Lucene index. This leads to increased storage requirements and slower indexing performance, especially when dealing with large datasets.

4. Limited Aggregations

Aggregating data within nested fields is more complicated. Since nested documents exist separately, standard aggregations won’t work as expected unless explicitly wrapped in a nested aggregation, further increasing query complexity.
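
As a sketch of that wrapping, assuming attributes is mapped as nested with keyword sub-fields named attribute and value, counting the values seen for each attribute name looks roughly like this (the aggregation names by_attribute, names and values are arbitrary labels chosen for this example):

GET products/_search
{
  "size": 0,
  "aggs": {
    "by_attribute": {
      "nested": {"path": "attributes"},
      "aggs": {
        "names": {
          "terms": {"field": "attributes.attribute"},
          "aggs": {
            "values": {
              "terms": {"field": "attributes.value"}
            }
          }
        }
      }
    }
  }
}

The bucket counts here are counts of nested objects, not of products. Getting back to the parent documents takes a further reverse_nested aggregation inside the buckets.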

Alternatives to Nested Fields

In too many cases, nested fields are being used where they are not really necessary. You should carefully consider your use-case and whether the query requirements indeed demand the use of nested objects.

Even if they do, there may be alternatives. Instead of using nested fields, consider:

Flattening the data model where possible.
Denormalization by duplicating data in parent documents.
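
A minimal sketch of the first option, assuming the set of attribute names is small and known up front. The top-level field names color and size are illustrative, and the .keyword sub-fields assume default dynamic mapping for strings:

PUT products/_doc/1
{
  "title": "Product 1",
  "color": "blue",
  "size": "m"
}

GET products/_search
{
  "query": {
    "bool": {
      "filter": [
        {"term": {"color.keyword": "blue"}},
        {"term": {"size.keyword": "m"}}
      ]
    }
  }
}

Each attribute is now an ordinary field, so regular queries, sorting and aggregations apply. The trade-off is that every new attribute name adds a field to the mapping, which does not scale to open-ended sets of keys.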

If nested fields are used solely for key-value pair lookups, consider the following scheme to flatten your schema. Start with an index mapping like this:

PUT products
{
  "mappings": {
    "properties": {
      "title": {"type": "text"},
      "attributes": {"type": "keyword", "doc_values": false}
    }
  }
}

This mapping allows string attributes to be indexed and queried like this:

PUT products/_doc/1
{
  "title": "Product 1",
  "attributes": ["color|blue", "size|m"]
}


PUT products/_doc/2
{
  "title": "Product 2",
  "attributes": ["color|green", "size|s"]
}

GET products/_search
{
  "query": {
    "term": {
      "attributes": "color|blue"
    }
  }
}
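
Several key-value pairs can be required at once by combining term filters. A sketch, assuming the mapping and the two documents indexed above:

GET products/_search
{
  "query": {
    "bool": {
      "filter": [
        {"term": {"attributes": "color|blue"}},
        {"term": {"attributes": "size|m"}}
      ]
    }
  }
}

Of the two documents above, only document 1 carries both tokens. Because each token already encodes its own key, there is no cross-matching between pairs, provided the separator character (| here) never appears inside an attribute name.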

Admittedly, this scheme only supports exact lookups (with doc_values disabled, the field cannot be used for sorting or aggregations) and only string values (no numeric range searches, dates, or other specialized data types). But in many cases this is more than enough, and it can result in significant performance and cost improvements.