Elasticsearch aggregations let you compute metrics, build buckets, and chain pipeline calculations across your data. This guide covers all aggregation types, syntax, performance tips, and what changed in ES 8.x and 9.x.

Elasticsearch aggregations are the engine behind analytics in Elasticsearch. Every pie chart in Kibana, every metrics dashboard, every statistical summary runs on aggregations under the hood. An Elasticsearch aggregation is a computation applied to a set of documents within an index - the equivalent of SQL's GROUP BY, COUNT, AVG, and SUM, but with the ability to nest and chain operations in ways that SQL cannot express natively.


What is an Elasticsearch Aggregation?

An aggregation in Elasticsearch is a function applied to a document set that computes metrics, groups documents into buckets, or derives values from other aggregation results. Aggregations answer questions like:

  • What are the top five selling products on our website?
  • How many tweets were sent in the past hour?
  • What is the average sale price per product category?

Aggregations are used heavily in Kibana visualizations. When you create a pie chart based on a field value, Elasticsearch runs a bucket aggregation to split documents into groups, and Kibana renders the result. But aggregations are just as powerful when called directly via the Query DSL from application code.

Aggregation Syntax

Aggregations in Elasticsearch work alongside the Query DSL. You can first specify a query to limit the document set, or omit the query entirely (which acts like a match_all). The basic structure:

{
  "query": { ... },
  "aggs": {
    "my_aggregation_name": {
      "aggregation_type": {
        "field": "field_name"
      }
    }
  }
}

  • aggs (or aggregations): signals the start of aggregation definitions
  • my_aggregation_name: any valid string you choose as a label
  • aggregation_type: the specific aggregation (e.g., terms, avg, date_histogram)
  • field: the document field to aggregate on

Aggregations can be nested - you can place sub-aggregations inside bucket aggregations to compute metrics per bucket. This is one of the most powerful features and is shown in the examples below.
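For instance, a terms bucket aggregation with an avg metric sub-aggregation computes the average price per category in a single request (the index and field names here are illustrative):

```json
{
  "size": 0,
  "aggs": {
    "by_category": {
      "terms": { "field": "category" },
      "aggs": {
        "avg_price": {
          "avg": { "field": "sale_price" }
        }
      }
    }
  }
}
```

Each bucket in the response carries its own avg_price value alongside the document count.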

Types of Elasticsearch Aggregations

Elasticsearch aggregations fall into four categories:

Category | Purpose | Examples
Bucket | Group documents into buckets by criteria | terms, date_histogram, range, filters, composite, multi_terms
Metric | Calculate statistics from field values | avg, max, min, sum, cardinality, percentiles, stats
Pipeline | Compute values from other aggregation outputs | bucket_script, moving_fn, cumulative_sum, derivative
Matrix | Compute statistics across multiple fields simultaneously | matrix_stats

Bucket Aggregations

Bucket aggregations group documents based on field values, ranges, or conditions - similar to SQL's GROUP BY. Some commonly used bucket aggregations:

  • terms: Splits documents into one bucket per unique field value. For a vehicle_make field, you get buckets for "Ford", "Tesla", "Kia", etc. Works best on keyword fields with bounded cardinality.
  • date_histogram and date_range: Organize documents by time intervals. Use date_histogram with a calendar_interval or fixed_interval for regular time series, and date_range for custom date boundaries.
  • geo_distance: Groups documents by proximity to a geo point - useful for location-based analytics.
  • composite: Paginates through all bucket combinations for high-cardinality fields. This is the recommended approach for aggregations that return many buckets, as it uses an after_key cursor instead of loading all buckets into memory.
  • multi_terms: Groups documents by combinations of multiple fields in a single aggregation, avoiding the need for nested terms aggregations in some cases.

There are 30+ bucket aggregation types. While bucket aggregations are powerful on their own, they are most useful when combined with metric sub-aggregations.
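As a sketch of composite pagination (assuming a keyword field named vehicle_make), each response includes an after_key that you pass back as after to fetch the next page:

```json
{
  "size": 0,
  "aggs": {
    "makes": {
      "composite": {
        "size": 1000,
        "sources": [
          { "make": { "terms": { "field": "vehicle_make" } } }
        ]
      }
    }
  }
}
```

Subsequent requests add an "after" object inside the composite body, copied verbatim from the previous response's after_key, until no buckets are returned.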

Metric Aggregations

Metric aggregations compute statistics from field values. To calculate a metric, specify the aggregation type and the target field. For example, finding the maximum purchase price:

{
  "aggs": {
    "price_max": {
      "max": {
        "field": "purchase_price"
      }
    }
  }
}

The stats aggregation computes min, max, sum, count, and avg in a single request - a real time saver when you need multiple statistics.
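A minimal stats request over the same purchase_price field might look like this:

```json
{
  "size": 0,
  "aggs": {
    "price_stats": {
      "stats": { "field": "purchase_price" }
    }
  }
}
```

The response contains a single object with count, min, max, avg, and sum, avoiding five separate aggregations.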

One metric aggregation worth highlighting is top_hits. Used as a sub-aggregation, it returns the actual matching documents within each bucket. This is useful for "group by" style searches. For example, searching a vehicles index and grouping results by make:

POST vehicles/_search?size=0
{
  "query": {
    "multi_match": {
      "query": "new 4 door sedan",
      "fields": ["make", "model", "type", "description"]
    }
  },
  "aggs": {
    "make": {
      "terms": {
        "field": "make",
        "size": 10
      },
      "aggs": {
        "hits": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}

Note the size=0 parameter. When you only care about aggregation results and not search hits, setting size=0 reduces payload size and allows more efficient query cache utilization. This applies to most aggregation queries. See also our post on using the Elasticsearch query cache effectively.

Pipeline Aggregations

Pipeline aggregations compute values from the output of other aggregations rather than from documents directly. They use a buckets_path parameter to reference values in the aggregation hierarchy:

AGG_SEPARATOR    = `>` ;
METRIC_SEPARATOR = `.` ;
PATH             = <AGG_NAME> [ <AGG_SEPARATOR>, <AGG_NAME> ]* [ <METRIC_SEPARATOR>, <METRIC> ] ;

Here is an example using bucket_script to calculate, for each vehicle make, the percentage of total sales contributed by sedans. Note that each buckets_path entry must resolve to a single numeric value, so the inner aggregation is a single-bucket filter rather than a multi-bucket terms aggregation:

POST vehicles/_search?size=0
{
  "query": {
    "bool": {
      "filter": [{ "term": { "isSold": true } }]
    }
  },
  "aggs": {
    "make": {
      "terms": { "field": "make", "size": 10 },
      "aggs": {
        "total_sales": {
          "sum": { "field": "salePrice" }
        },
        "sedans": {
          "filter": { "term": { "type": "sedan" } },
          "aggs": {
            "sales": {
              "sum": { "field": "salePrice" }
            }
          }
        },
        "sedan_percentage": {
          "bucket_script": {
            "buckets_path": {
              "sedanSales": "sedans>sales",
              "totalSales": "total_sales"
            },
            "script": "params.sedanSales / params.totalSales * 100"
          }
        }
      }
    }
  }
}


Pipeline aggregations unlock complex analytical insights by chaining aggregation results together - derivatives, moving averages, cumulative sums, and custom calculations.
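For example, a cumulative_sum chained onto a date_histogram produces a running total of sales over time (field names here are illustrative):

```json
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": { "field": "sold_date", "calendar_interval": "month" },
      "aggs": {
        "monthly_sales": {
          "sum": { "field": "salePrice" }
        },
        "running_total": {
          "cumulative_sum": { "buckets_path": "monthly_sales" }
        }
      }
    }
  }
}
```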

Aggregations Added in Elasticsearch 8.x and 9.x

Elasticsearch 8.x and 9.x introduced several aggregation improvements:

  • random_sampler aggregation (8.2+): Samples a random subset of documents before running sub-aggregations. This dramatically reduces computation time for approximate analytics on large indices - useful when exact counts are not required and speed matters more.
  • Cardinality and percentile performance gains: The HyperLogLog++ implementation behind cardinality and the t-digest behind percentiles saw optimizations in 8.x, reducing memory usage and improving accuracy at high cardinalities.
  • Query parallelization improvements: Elasticsearch 8.x improved how aggregations are parallelized across shards and segments, benefiting large aggregation queries on multi-shard indices.
  • Elasticsearch 9.0 and Lucene 10: The upgrade to Lucene 10 brings further improvements to doc values access patterns, which directly benefit aggregation performance since most aggregations read doc values.
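As a sketch of the random_sampler aggregation, here it wraps an avg sub-aggregation and samples roughly 1% of matching documents (field name illustrative):

```json
{
  "size": 0,
  "aggs": {
    "sampled": {
      "random_sampler": { "probability": 0.01 },
      "aggs": {
        "avg_price": {
          "avg": { "field": "salePrice" }
        }
      }
    }
  }
}
```

Sub-aggregation results are automatically scaled back up, so counts and sums approximate the full-index values.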

Aggregation Performance: Tips for Production

Aggregations can be expensive. Some practical tips for keeping them fast:

  1. Use size: 0 when you don't need hits. This avoids fetching and scoring documents you won't use, and enables the query cache.

  2. Filter first, aggregate second. Use a bool filter to narrow the document set before aggregating. Aggregating over 1 million filtered documents is far cheaper than aggregating over 100 million.

  3. Watch sub-aggregation depth. Each level of nesting multiplies the work. A terms aggregation with 100 buckets, each containing another terms aggregation with 100 buckets, produces 10,000 bucket combinations. Keep nesting shallow or use composite for pagination.

  4. Use composite for high-cardinality pagination. Instead of a terms aggregation with a huge size, use composite to page through results. It uses constant memory regardless of total bucket count.

  5. Set eager_global_ordinals on frequently aggregated keyword fields. This pre-builds the ordinals data structure at refresh time rather than at query time, trading slightly slower indexing for faster aggregations.

  6. Use random_sampler for approximate analytics. If you need rough counts or averages over billions of documents, sampling 1-10% can cut aggregation time by 10-100x with acceptable accuracy.

  7. Use time-based indices for time-series data. When aggregating over time ranges, data streams with time-based backing indices allow Elasticsearch to skip entire indices outside the requested range.
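For tip 5, eager_global_ordinals is set in the field mapping; a sketch assuming a keyword field named make on a vehicles index:

```json
PUT vehicles/_mapping
{
  "properties": {
    "make": {
      "type": "keyword",
      "eager_global_ordinals": true
    }
  }
}
```

This shifts global-ordinals construction from the first aggregation after each refresh to refresh time itself, so apply it only to fields you aggregate on frequently.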

For more on identifying and fixing expensive aggregation queries, see our post on expensive queries in Elasticsearch and OpenSearch and query limits.

Key Takeaways

  • Elasticsearch aggregations fall into four categories: bucket (grouping), metric (statistics), pipeline (chaining), and matrix (multi-field).
  • Use size: 0 on aggregation-only queries for better performance and cache utilization.
  • The composite aggregation is the correct approach for paginating through high-cardinality bucket results.
  • Elasticsearch 8.x added random_sampler for fast approximate analytics and improved aggregation parallelization.
  • Watch sub-aggregation depth and cardinality - these are the most common causes of slow aggregation queries.
  • Pulse can help you identify and monitor expensive aggregation queries in production. If you need hands-on help, our Elasticsearch consulting team has deep experience optimizing aggregation-heavy workloads.