ClickHouse and Elasticsearch both handle large-scale data, but are built for fundamentally different access patterns. A technical guide to understanding the architectural differences and making the right choice.

Both ClickHouse and Elasticsearch can ingest millions of events per second, store them for months, and answer queries against the result. In the observability and log analytics space they are both routinely pitched as solutions. That overlap is superficial. The two systems are built around fundamentally different data structures, each making a different set of query patterns fast and everything else slow.

Getting this choice wrong is expensive. Organizations that run petabyte-scale log analytics on Elasticsearch fight JVM heap pressure constantly and pay an order of magnitude more in storage than necessary. Organizations that put a relevance-search workload on ClickHouse will find no BM25 scoring, no fuzzy matching, and no way to rank results by how well they match a query. This post covers what you need to know to make the right call.

Two Engines, Two Data Structures

ClickHouse is a columnar OLAP database. Its primary storage engine, MergeTree, writes data in immutable parts on disk with each column stored in a separate compressed binary file. A sparse primary index holds one entry per 8,192 rows (a "granule"), keeping the entire index small enough to fit in RAM while enabling fast range scans via granule skipping. Additional skip indexes - bloom filters, min-max indexes - can prune granules for non-primary-key filters. A query touching 3 columns out of 100 reads roughly 3% of the data on disk. The execution engine is fully vectorized: data flows through the query pipeline in column-aligned blocks, enabling SIMD instructions and CPU cache-friendly sequential reads.
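As a rough sketch of what that looks like in practice - the table, columns, and skip index here are illustrative assumptions, not a recommendation - a MergeTree table and a primary-key-prefix query might be defined like this, using the clickhouse-connect Python client:

```python
# Minimal sketch of a MergeTree table matching the description above.
# Assumes the clickhouse-connect client and a local ClickHouse server;
# all table and column names are hypothetical.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

client.command("""
    CREATE TABLE IF NOT EXISTS logs
    (
        timestamp   DateTime,
        service     LowCardinality(String),
        status_code UInt16,
        latency_ms  UInt32,
        message     String,
        -- Bloom-filter skip index: lets ClickHouse prune granules for
        -- filters on a column that is not part of the primary key.
        INDEX idx_status status_code TYPE bloom_filter GRANULARITY 4
    )
    ENGINE = MergeTree
    -- The sparse primary index is built from the ORDER BY key,
    -- one entry per index_granularity rows (8192 by default).
    ORDER BY (service, timestamp)
    SETTINGS index_granularity = 8192
""")

# A query that filters on the primary-key prefix and reads only two of the
# columns; granules outside the requested range are skipped entirely.
rows = client.query(
    "SELECT timestamp, status_code FROM logs "
    "WHERE service = 'api' AND timestamp >= now() - INTERVAL 1 HOUR"
).result_rows
```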

Elasticsearch is built on Apache Lucene, and its core data structure is the inverted index: a precomputed mapping from every unique term to the list of documents containing it. At index time, text fields are tokenized, lowercased, and stemmed through configurable analyzer chains. The inverted index makes "find documents containing this term" a near-instant lookup at query time. But each document is physically stored three times: in the inverted index for search, in stored fields for retrieval, and in doc_values (a columnar format) for aggregations and sorting. This triplication supports all three access patterns simultaneously and is why Elasticsearch's storage overhead is so high compared to purpose-built columnar stores.
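A hedged sketch of what that setup looks like from the client side - index name, fields, and analyzer chain are assumptions - using the official elasticsearch-py client:

```python
# Sketch of an index whose text field goes through an analyzer chain while a
# keyword field keeps doc_values for aggregations and sorting.
# Names are hypothetical; assumes the elasticsearch-py 8.x client.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="app-logs",
    settings={
        "analysis": {
            "analyzer": {
                # Custom chain: tokenize, lowercase, stem English terms.
                "log_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "porter_stem"],
                }
            }
        }
    },
    mappings={
        "properties": {
            # Analyzed into the inverted index for full-text search.
            "message": {"type": "text", "analyzer": "log_text"},
            # Keyword fields back aggregations and sorting via doc_values.
            "service": {"type": "keyword"},
            "timestamp": {"type": "date"},
        }
    },
)
```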

This structural divide explains nearly every downstream difference - compression ratios, aggregation performance, write amplification costs, and which queries each system handles well.

Aggregation and Log Analytics: ClickHouse's Home Ground

For workloads that aggregate billions of rows - counting events by service, computing latency percentiles, grouping by time window - ClickHouse has a decisive advantage. The vectorized aggregation engine processes column blocks sequentially using SIMD instructions that Elasticsearch's JVM-based aggregation pipeline cannot match. Independent benchmarks consistently show ClickHouse completing aggregations at a fifth of Elasticsearch's latency or better. ContentSquare reported queries running 4x faster overall and 10x faster at P99 after migrating from Elasticsearch to ClickHouse, at 11x lower infrastructure cost. Uber reduced its cluster footprint by over 50% while serving more queries at higher write throughput.
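A representative query of that shape, written against the hypothetical logs table sketched earlier, might look like this:

```python
# Sketch of a typical dashboard aggregation: per-service request counts and
# latency percentiles per minute. Table and column names are hypothetical.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

result = client.query("""
    SELECT
        toStartOfMinute(timestamp)             AS minute,
        service,
        count()                                AS requests,
        quantiles(0.5, 0.95, 0.99)(latency_ms) AS latency_quantiles
    FROM logs
    WHERE timestamp >= now() - INTERVAL 1 DAY
    GROUP BY minute, service
    ORDER BY minute, service
""")

for minute, service, requests, latency_quantiles in result.result_rows:
    print(minute, service, requests, latency_quantiles)
```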

Storage cost is the other primary factor. Columnar storage compresses same-type data far better than row-oriented formats. ClickHouse's codec stack layers algorithm-specific transforms on top of LZ4 or ZSTD: DoubleDelta for timestamps, Gorilla for slowly-changing floats, Delta for monotonically increasing integers. Cloudflare measured document storage dropping from 600 bytes in Elasticsearch to 60 bytes in ClickHouse - a 10x reduction - which let them store 100% of their logs at 35-45 million requests per second within the same budget. Typical log analytics workloads see 10:1 to 20:1 compression ratios in ClickHouse versus 1.5:1 in Elasticsearch.
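Codecs are declared per column in the table definition. The following is a minimal sketch with hypothetical columns; the codec choices illustrate the pairings described above rather than a tuned configuration:

```python
# Sketch of per-column codecs layered on top of general-purpose compression.
# Column names and codec choices are illustrative assumptions.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

client.command("""
    CREATE TABLE IF NOT EXISTS metrics
    (
        -- DoubleDelta works well on timestamps with near-constant intervals.
        timestamp  DateTime CODEC(DoubleDelta, ZSTD(1)),
        -- Delta suits monotonically increasing counters.
        bytes_sent UInt64   CODEC(Delta, ZSTD(1)),
        -- Gorilla targets slowly-changing floating-point gauges.
        cpu_load   Float64  CODEC(Gorilla, ZSTD(1)),
        service    LowCardinality(String)
    )
    ENGINE = MergeTree
    ORDER BY (service, timestamp)
""")
```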

Pre-aggregation as a First-Class Feature

ClickHouse's MergeTree engine family includes specialized variants for maintaining pre-aggregated state. A materialized view feeding an AggregatingMergeTree table fires incrementally on each new insert batch, computes partial aggregation states, and appends them to the target table. Queries hit only the aggregated rows, not the raw data. Projections extend this further: a projection defines an alternative sort order or aggregated form stored as a hidden sub-table within the same parts, and ClickHouse's query planner selects the best projection automatically without query rewrites. These mechanisms shift expensive compute from query time to insert time, keeping dashboard latency flat as raw data volume grows.
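A minimal sketch of the pattern - reusing the hypothetical logs table from earlier, with the other names invented for illustration - pairs an AggregatingMergeTree target with a materialized view that stores partial aggregation states:

```python
# Sketch of incremental pre-aggregation: a materialized view computes partial
# aggregation states on each insert; queries finalize them with -Merge
# functions. Assumes the hypothetical 'logs' table sketched earlier.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

client.command("""
    CREATE TABLE IF NOT EXISTS logs_by_minute
    (
        minute        DateTime,
        service       LowCardinality(String),
        requests      AggregateFunction(count),
        latency_state AggregateFunction(quantiles(0.5, 0.95, 0.99), UInt32)
    )
    ENGINE = AggregatingMergeTree
    ORDER BY (service, minute)
""")

client.command("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS logs_by_minute_mv
    TO logs_by_minute AS
    SELECT
        toStartOfMinute(timestamp)                  AS minute,
        service,
        countState()                                AS requests,
        quantilesState(0.5, 0.95, 0.99)(latency_ms) AS latency_state
    FROM logs
    GROUP BY minute, service
""")

# Dashboards query only the pre-aggregated rows and finalize the states.
result = client.query("""
    SELECT
        minute,
        service,
        countMerge(requests)                           AS requests,
        quantilesMerge(0.5, 0.95, 0.99)(latency_state) AS latency_quantiles
    FROM logs_by_minute
    GROUP BY minute, service
    ORDER BY minute, service
""")
```

The -State functions persist partial aggregation states at insert time; the -Merge functions finalize them at query time, so the dashboard query never touches the raw rows.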

High-cardinality GROUP BY is another area where Elasticsearch hits a structural limit. The terms aggregation has a default bucket cap (10,000), and its memory use scales with cardinality times shard count. ClickHouse handles GROUP BY on columns with millions of distinct values - user_id, session_id, URL - without special tuning, because vectorized scans process data from disk without loading all distinct values into heap.
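Concretely, a query like the following sketch (assuming hypothetical url and user_id columns on a logs table) is routine in ClickHouse even when url has millions of distinct values:

```python
# Sketch of a high-cardinality GROUP BY of the kind Elasticsearch's terms
# aggregation struggles with. Assumes hypothetical url and user_id columns.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

result = client.query("""
    SELECT url, uniqExact(user_id) AS users, count() AS hits
    FROM logs
    GROUP BY url          -- millions of distinct URLs is fine
    ORDER BY hits DESC
    LIMIT 100
""")
```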

Full-Text Search and Vector Retrieval: Elasticsearch's Strengths

Elasticsearch was built for one thing: finding documents matching a query, ranked by relevance. BM25 is the default ranking algorithm, weighting results by term frequency, inverse document frequency, and field length normalization. Text goes through configurable analyzer chains at index time - character filters, tokenizers, token filters for stemming, synonym expansion, and stopword removal - producing an inverted index tuned for a specific language and domain. The result: match, match_phrase, fuzzy (Levenshtein distance), wildcard, and span queries that all return results ordered by relevance to the query. Tuning is available at every level: per-field boost weights, function_score for custom ranking logic, and dis_max for multi-field matching.
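A sketch of a typical relevance-ranked query - index and field names are assumptions - combining per-field boosts with fuzzy matching:

```python
# Sketch of a relevance-ranked search: multi-field match with per-field
# boosts plus fuzzy matching. Index and field names are hypothetical;
# assumes the elasticsearch-py 8.x client.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

response = es.search(
    index="articles",
    query={
        "bool": {
            "should": [
                # Boost title matches over body matches in the BM25 score.
                {"multi_match": {
                    "query": "database compression",
                    "fields": ["title^3", "body"],
                }},
                # Tolerate typos via Levenshtein-distance fuzzy matching.
                {"match": {"title": {"query": "comression",
                                     "fuzziness": "AUTO"}}},
            ]
        }
    },
    size=10,
)

for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("title"))
```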

ClickHouse has been adding text search capabilities - token bloom filters, ngram bloom filters, and an experimental inverted index type - but has no relevance scoring. No BM25, no TF/IDF, no notion of how well a document matches a query. Results are unranked unless you add an explicit ORDER BY on a numeric field you compute yourself. This is not a gap that closes easily: relevance ranking requires knowing term distribution across the corpus at query time, which conflicts with columnar scan-based execution. For any application where result ranking matters - e-commerce search, knowledge base discovery, enterprise document retrieval - Elasticsearch is the right engine.
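What ClickHouse offers instead looks more like the following sketch: a token bloom filter index accelerates exact-token filtering on the hypothetical logs table, but every match is unscored and the ordering has to be supplied explicitly (index parameters here are illustrative, not tuned):

```python
# Sketch of ClickHouse-style text filtering: a token bloom filter skip index
# speeds up hasToken() checks, but results carry no relevance score.
# Names and index parameters are illustrative assumptions.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

client.command("""
    ALTER TABLE logs
    ADD INDEX IF NOT EXISTS idx_message_tokens message
    TYPE tokenbf_v1(30720, 3, 0) GRANULARITY 4
""")

# Unranked containment filter: every matching row is equal; ordering is manual.
result = client.query("""
    SELECT timestamp, message
    FROM logs
    WHERE hasToken(message, 'timeout')
    ORDER BY timestamp DESC
    LIMIT 100
""")
```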

Vector Search and Hybrid Retrieval

Since version 8.0 (February 2022), Elasticsearch supports approximate nearest-neighbor vector search via HNSW indexes. Version 8.4 added hybrid search: combining dense vector similarity with BM25 lexical scores in a single query using Reciprocal Rank Fusion. Subsequent releases added int8 and 4-bit quantization to reduce memory footprint, and the ACORN-1 algorithm for fast filtered kNN on large, highly-filtered result sets. This makes Elasticsearch the established choice for RAG pipelines and semantic search applications that need ranked relevance across both token-level and semantic dimensions. ClickHouse has no comparable production-grade approximate nearest-neighbor capability today.
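A minimal sketch of the dense-vector side - field names, vector dimension, and the query vector are placeholders; real vectors would come from an embedding model:

```python
# Sketch of approximate kNN search over an HNSW-indexed dense_vector field.
# Field names, dimension, and the query vector are placeholder assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="docs",
    mappings={
        "properties": {
            "title": {"type": "text"},
            "embedding": {
                "type": "dense_vector",
                "dims": 384,
                "index": True,          # build the HNSW graph
                "similarity": "cosine",
            },
        }
    },
)

query_vector = [0.0] * 384  # placeholder; normally from the same embedding model

response = es.search(
    index="docs",
    knn={
        "field": "embedding",
        "query_vector": query_vector,
        "k": 10,
        "num_candidates": 100,
    },
)
```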

Geospatial support follows the same pattern. Elasticsearch's geo_point and geo_shape field types support radius queries, bounding box queries, and complex geometry intersections. ES|QL includes geospatial functions for intersects, within, contains, and disjoint operations, combinable with full-text search and aggregations in a single query. ClickHouse has limited geospatial primitives by comparison.
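For example, a radius filter over a geo_point field combined with a text match - index, fields, and coordinates are hypothetical - is a single bool query:

```python
# Sketch of a geo_point radius filter combined with a text match.
# Index, field names, and coordinates are hypothetical.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

response = es.search(
    index="places",
    query={
        "bool": {
            "must": {"match": {"description": "coffee"}},
            "filter": {
                "geo_distance": {
                    "distance": "2km",
                    "location": {"lat": 52.52, "lon": 13.405},
                }
            },
        }
    },
)
```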

Operational Considerations

ClickHouse runs as a C++ binary with no JVM. Memory usage is predictable, and analytical queries that would OOM an Elasticsearch node can spill to disk via external aggregation. The main operational requirement is batch inserts: ClickHouse is designed for large bulk writes. Individual-row inserts create excessive parts and background merge pressure. Production deployments typically buffer writes through Kafka and insert batches of 100K-1M rows at a time. The async_insert setting handles cases where you cannot control client batch size, coalescing small inserts server-side before writing parts.
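Both approaches look roughly like this sketch - the table and settings values are illustrative, and the batch stands in for what a Kafka consumer would accumulate:

```python
# Sketch of the two insert strategies: a client-side batch insert, and
# async_insert for clients that cannot batch. Values are illustrative and
# the table is the hypothetical 'logs' table from earlier.
from datetime import datetime

import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

# Preferred: accumulate rows (e.g. from a Kafka consumer) and insert in bulk.
batch = [
    (datetime.now(), "api", 200, 12, "GET /health"),
    # ... tens of thousands more rows per insert
]
client.insert(
    "logs",
    batch,
    column_names=["timestamp", "service", "status_code", "latency_ms", "message"],
)

# Fallback: let the server coalesce small inserts before writing a part.
client.command(
    "INSERT INTO logs (timestamp, service, status_code, latency_ms, message) "
    "VALUES (now(), 'api', 200, 12, 'GET /health')",
    settings={"async_insert": 1, "wait_for_async_insert": 1},
)
```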

Elasticsearch's primary operational challenge is JVM heap management. The standard recommendation is to set heap to 50% of available RAM, capped around 30 GB (above 32 GB, the JVM loses compressed object pointer optimization). Stop-the-world garbage collection pauses are a persistent production risk: if a GC pause exceeds the cluster's fault detection timeout, the master can eject the node, triggering shard reallocation and potentially cascading instability. Oversharding is the other common failure mode - each shard consumes heap for segment caches and field data, and clusters with thousands of shards suffer GC pressure even at moderate query load.

Schema management differs as well. Elasticsearch's dynamic mapping accepts arbitrary JSON and auto-detects field types on first ingestion. That is flexible for evolving data shapes, but prone to mapping explosions when ingesting semi-structured data with thousands of unpredictable field names (Kubernetes labels, trace attributes) - a problem Cloudflare specifically cited as a driver for their migration. ClickHouse requires a defined schema upfront, though ALTER TABLE ADD COLUMN is fast and non-blocking.

One last consideration is licensing: ClickHouse is Apache 2.0 licensed, while Elasticsearch uses the Elastic License v2 and SSPL, which restrict running it as a managed service. OpenSearch is the Apache 2.0 fork for teams that need a search engine without those licensing restrictions.

Choosing Between Them

The access pattern is the deciding factor. If your queries are primarily aggregations, time-series analysis, and structured log analytics at scale, ClickHouse will cut your storage costs by 10-20x and your query latency by 5x or more compared to Elasticsearch. If your queries require ranked relevance, fuzzy matching, geospatial filtering, or hybrid vector + keyword retrieval, Elasticsearch is the right choice - none of those capabilities exist in ClickHouse today.

Running both is architecturally sound when you genuinely need both workloads. A Kafka fan-out writes to both systems; analytics queries hit ClickHouse, text search queries hit Elasticsearch. Quesma is a query translation middleware that accepts the Elasticsearch API - including Kibana queries - and routes them to ClickHouse, enabling gradual migration without rewriting clients. The dual stack adds operational surface area, schema coordination overhead, and two ingestion pipelines to keep in sync. Before committing to it, verify that you genuinely use ranked search in production and aren't just using Elasticsearch because it was there when the log pipeline was first built.

The clearest signal for migration is cost and query pattern. If you're running log analytics on Elasticsearch with no relevance-ranked search queries, the storage savings alone typically justify the migration effort. If you're running a user-facing search product with fuzzy matching and result ranking, stay on Elasticsearch. If you need both, plan the dual-stack carefully - and keep the two concerns separated from the start so each system can be tuned independently.