ClickHouse materialized views are important but widely misunderstood. This engineering guide covers the block-level trigger model, correct aggregation patterns, write amplification trade-offs, and when they cause more problems than they solve.

Materialized views in ClickHouse look deceptively simple. You write a SELECT, attach it to a table, and data flows into a pre-aggregated target automatically. In practice, several behaviors diverge sharply from what engineers coming from traditional databases expect - and those surprises tend to surface under production load.

This post covers how ClickHouse MVs actually work under the hood, what you need to get right when designing them, the aggregation traps that cause silent data corruption, and the operational costs that accumulate at scale.

The Insert Trigger Model

A ClickHouse materialized view is not a snapshot you refresh on a schedule. It is, in effect, an AFTER INSERT trigger attached to the leftmost table in its SELECT statement. It fires on every INSERT, reads nothing from the source table's existing data on disk, and processes only the data block that just arrived in memory. When an insert lands, ClickHouse runs the MV's SELECT against the in-memory block and writes the result to the target table as a new part.

This has a non-obvious implication: ClickHouse splits large INSERTs into blocks based on max_insert_block_size (default 1,048,576 rows). A 3-million-row INSERT arrives as three sequential blocks, and the MV fires three times, writing three separate parts to the target. This processing is synchronous - a slow MV SELECT directly increases insert latency on the source table, because ClickHouse won't acknowledge the INSERT as complete until all attached MVs have written their output.

Multiple MVs on the same source table execute sequentially in alphabetical order by default. The parallel_view_processing setting can parallelize execution at the cost of higher CPU usage. Each MV adds at least one additional write per INSERT block, which matters for high-throughput tables.
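
Both effects are observable. A hedged sketch, assuming a recent ClickHouse with system.query_views_log populated (controlled by the log_query_views setting, on by default when query logging is enabled); '<insert-query-id>' is a placeholder for the query_id of the INSERT you want to profile:

  -- Run attached MVs in parallel instead of sequentially (costs more CPU)
  SET parallel_view_processing = 1;

  -- After an INSERT, see how long each attached view took and how much it wrote
  SELECT
      view_name,
      view_duration_ms,
      written_rows,
      written_bytes
  FROM system.query_views_log
  WHERE initial_query_id = '<insert-query-id>'
  ORDER BY view_duration_ms DESC;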

Designing the Target Table

Always use the TO <target_table> syntax. When you omit it, ClickHouse creates a hidden .inner.<uuid> table that is destroyed when you drop the view - you lose all materialized data along with the view definition. The implicit inner table also complicates schema migrations and can't be shared across multiple MVs. With the explicit TO syntax, the target table persists independently, can be managed with standard ALTER TABLE operations, and multiple views can write to the same target.

  -- Create the target table explicitly
  CREATE TABLE daily_sales (
      day Date,
      product_id UInt32,
      total_revenue Float64
  ) ENGINE = SummingMergeTree()
  ORDER BY (day, product_id);
  
  -- MV using TO syntax
  CREATE MATERIALIZED VIEW daily_sales_mv
  TO daily_sales
  AS SELECT
      toDate(created_at) AS day,
      product_id,
      sum(price) AS total_revenue
  FROM sales
  GROUP BY day, product_id;
  

The choice of engine for the target table determines how partial results accumulate. Since the MV processes one INSERT block at a time, the target holds multiple partial aggregations for the same GROUP BY key until background merges combine them.

SummingMergeTree handles this for additive metrics: during background merges, ClickHouse sums numeric columns for rows sharing the same ORDER BY key. At query time you still need GROUP BY with sum() or the FINAL keyword, because un-merged parts may coexist. It only handles sums correctly - averages, distinct counts, and percentiles require a different engine.
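
Concretely, for the daily_sales example above, the read side still re-aggregates - a minimal sketch:

  -- Rows for the same (day, product_id) key may still sit in separate un-merged parts,
  -- so sum again at query time (or use FINAL and accept its cost)
  SELECT day, product_id, sum(total_revenue) AS total_revenue
  FROM daily_sales
  GROUP BY day, product_id;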

AggregatingMergeTree stores intermediate aggregation state as binary blobs and merges them using corresponding *Merge functions. This is correct for non-additive aggregates, but requires using -State and -Merge function pairs throughout - in the MV SELECT and in every query against the target:

  CREATE TABLE monthly_stats (
      month Date,
      user_id UInt64,
      unique_products AggregateFunction(uniq, UInt32),
      avg_order_value AggregateFunction(avg, Float64)
  ) ENGINE = AggregatingMergeTree()
  ORDER BY (month, user_id);
  
  CREATE MATERIALIZED VIEW monthly_stats_mv
  TO monthly_stats
  AS SELECT
      toStartOfMonth(created_at) AS month,
      user_id,
      uniqState(product_id) AS unique_products,
      avgState(order_value) AS avg_order_value
  FROM orders
  GROUP BY month, user_id;
  
  -- Query:
  SELECT month, user_id, uniqMerge(unique_products), avgMerge(avg_order_value)
  FROM monthly_stats GROUP BY month, user_id;
  

One non-negotiable alignment rule: the ORDER BY columns of the target table must match the GROUP BY columns in the MV SELECT. If they diverge, background merges won't correctly combine partial results and you'll get duplicate rows in query output.

The Aggregation Gotchas You Need to Know

Block-level processing is the root cause of a large class of correctness bugs in production MVs. Consider a view that computes uniqExact(user_id) and stores the result as a plain integer in the target. The first block of 100k events yields 8,432 unique users. The second block yields 7,891. The target now has two rows for the same day, and summing them gives 16,323 - overcounting every user who appears in both blocks. A SummingMergeTree target bakes the error in permanently, because background merges just add the two counts together; a plain MergeTree keeps both rows and leaves the query-time sum equally wrong.
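
To make the failure concrete, this is roughly what the broken setup looks like in DDL - the events table and its event_time/user_id columns are hypothetical stand-ins:

  -- Anti-pattern: per-block uniqExact results stored as a plain counter.
  -- Each INSERT block writes its own count, and SummingMergeTree merges add them up.
  CREATE TABLE daily_uniques_wrong (
      day Date,
      unique_users UInt64
  ) ENGINE = SummingMergeTree()
  ORDER BY day;

  CREATE MATERIALIZED VIEW daily_uniques_wrong_mv
  TO daily_uniques_wrong
  AS SELECT
      toDate(event_time) AS day,
      uniqExact(user_id) AS unique_users
  FROM events
  GROUP BY day;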

The same problem applies to every non-additive aggregate: all uniq variants, all quantile functions, and avg (the average of two averages is not the global average). Using these functions in an MV without the State/Merge pattern gives silently wrong results, and the errors grow proportionally with insert volume.

The fix is to always use AggregatingMergeTree with the State/Merge pattern for anything beyond simple sums and counts. The uniqState function stores a compact cardinality sketch rather than a final number; when ClickHouse merges two AggregatingMergeTree parts, it merges the sketches and computes the cardinality of the union correctly. avgState carries both sum and count, so combined states produce the correct weighted average.

JOINs in MV selects carry a related trap: the trigger only fires on INSERTs to the leftmost table in the JOIN. If a lookup or dimension table on the right side changes, the MV does not re-fire. The derived data reflects the dimension state at insert time. For bidirectional sensitivity you need two separate MVs, each with a different table on the left.
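
A minimal sketch of that one-sided behavior, reusing the sales table from earlier; the products dimension table and the sales_enriched target are hypothetical:

  -- Fires only on INSERTs into sales (the leftmost table).
  -- Later changes to products are never reflected in sales_enriched.
  CREATE MATERIALIZED VIEW sales_enriched_mv
  TO sales_enriched
  AS SELECT
      s.product_id,
      p.category,
      s.price,
      s.created_at
  FROM sales AS s
  LEFT JOIN products AS p ON s.product_id = p.product_id;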

Operational Costs: Write Amplification, Schema Changes, Backfilling

Write amplification compounds with scale. Altinity's benchmarks show a single MV causes roughly 55% insert throughput reduction with 100-row batches; five chained MVs reach ~80%; ten approach ~90%. The degradation is less severe with larger batches - a single MV on 1-million-row batches costs ~38% - because fixed overhead is amortized across more rows. Each INSERT creates 1 + N new parts (one for the source table, one per MV target). These small parts must be merged by background threads. At high INSERT rates with many MVs, part creation outpaces merge capacity and ClickHouse first delays inserts, then rejects them outright with a "Too many parts" error - typically at around 300 active parts per partition.
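
One way to watch that pressure build is system.parts - this lists the partitions carrying the most active parts:

  -- Partitions closest to the "Too many parts" threshold
  SELECT database, table, partition, count() AS active_parts
  FROM system.parts
  WHERE active
  GROUP BY database, table, partition
  ORDER BY active_parts DESC
  LIMIT 10;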

Schema migrations are operationally painful. Adding a column the MV should pick up, or dropping one it reads, requires dropping and recreating the MV, which creates a data gap during the migration window. Any rows inserted while the MV is absent are not captured - you have to backfill them manually afterward. Attempting to drop a column referenced by an MV will fail with CANNOT_DROP_COLUMN. The migration sequence is always: drop the MV, alter the source and target tables, recreate the MV, then backfill missing data.
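
A hedged sketch of that sequence for the daily_sales example, adding a hypothetical quantity metric to the rollup:

  -- 1. Drop the MV - from this moment on, inserts into sales are no longer captured
  DROP VIEW daily_sales_mv;

  -- 2. Alter source and target (quantity / total_quantity are hypothetical columns)
  ALTER TABLE sales ADD COLUMN quantity UInt32 DEFAULT 1;
  ALTER TABLE daily_sales ADD COLUMN total_quantity UInt64 DEFAULT 0;

  -- 3. Recreate the MV with the new column
  CREATE MATERIALIZED VIEW daily_sales_mv
  TO daily_sales
  AS SELECT
      toDate(created_at) AS day,
      product_id,
      sum(price) AS total_revenue,
      sum(quantity) AS total_quantity
  FROM sales
  GROUP BY day, product_id;

  -- 4. Backfill the rows inserted during steps 1-3 (see the next section)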

Backfilling correctly

For backfilling, avoid the POPULATE keyword. It's not available with the TO syntax at all, and even on implicit inner tables it creates a data loss window: rows inserted during the populate operation are invisible to it, and the MV trigger isn't active yet. The correct approach: create the MV without POPULATE so the trigger is immediately live for new data, then backfill historical data in partition-sized chunks:

  -- Backfill one month at a time
  INSERT INTO daily_sales
  SELECT
      toDate(created_at) AS day,
      product_id,
      sum(price) AS total_revenue
  FROM sales
  WHERE toYYYYMM(created_at) = 202401
  GROUP BY day, product_id;
  

For large tables, stage the backfill through a scratch table with the same schema and attach finished partitions to the real target atomically. Chunking keeps each GROUP BY's memory footprint bounded, and if a chunk fails you drop that partition from the staging table and retry only that chunk, without touching data already live in the target.
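
A hedged sketch of that pattern, assuming the target is partitioned by month (PARTITION BY toYYYYMM(day) - which the minimal daily_sales DDL above omits); daily_sales_staging is a scratch table:

  -- Stage one month in a table with the same schema, engine, and partition key
  CREATE TABLE daily_sales_staging AS daily_sales;

  INSERT INTO daily_sales_staging
  SELECT
      toDate(created_at) AS day,
      product_id,
      sum(price) AS total_revenue
  FROM sales
  WHERE toYYYYMM(created_at) = 202401
  GROUP BY day, product_id;

  -- Copy the finished partition into the live target in one cheap metadata-level step;
  -- if the chunk failed, DROP PARTITION on the staging table and retry just that chunk
  ALTER TABLE daily_sales ATTACH PARTITION 202401 FROM daily_sales_staging;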

When to Skip Them

MVs are the wrong tool in several common situations.

When source data is updated or deleted - even with ReplacingMergeTree or CollapsingMergeTree - MVs don't react. They trigger on INSERT only. The MV target retains stale data with no built-in mechanism to propagate source mutations.

When the aggregation expands data rather than compressing it. If your GROUP BY has high cardinality relative to source rows, the MV target can grow larger than the source. A real production case had a raw log table at 20 GB and an MV target at 190 GB - a 10x expansion - because GROUP BY (user_id, attribute_name) across ten attributes per source row produced ten output rows. Querying the raw table directly would have been cheaper.
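
A cheap sanity check before (and after) shipping an MV is to compare the on-disk footprint of source and target in system.parts:

  -- On-disk size of the raw table vs. the MV target
  SELECT table, formatReadableSize(sum(bytes_on_disk)) AS size, sum(rows) AS rows
  FROM system.parts
  WHERE active AND table IN ('sales', 'daily_sales')
  GROUP BY table;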

When insert latency is a strict SLA. MV processing is synchronous with each INSERT block. A moderately complex MV SELECT can take a raw 10ms insert to 100ms+. There's no way to defer the cost without architectural changes.

For rolling time-window aggregates (last 7 days), a query on properly partitioned raw data with a pruning WHERE clause frequently completes in milliseconds - no MV needed.
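
For example, something along these lines against the raw sales table - assuming created_at drives the partition or sorting key so pruning applies - is often fast enough on its own:

  -- Last-7-days rollup straight off the raw table; partition and primary-key
  -- pruning on created_at does the heavy lifting
  SELECT
      toDate(created_at) AS day,
      product_id,
      sum(price) AS revenue
  FROM sales
  WHERE created_at >= now() - INTERVAL 7 DAY
  GROUP BY day, product_id
  ORDER BY day;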

Alternatives worth knowing

ClickHouse Projections pre-sort or pre-aggregate data with a different primary key and are maintained automatically. They have simpler semantics than MVs for the read-optimization use case - no separate target table, no manual backfill, no trigger model to reason about.
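
A sketch of the same daily rollup expressed as a projection on the sales table (the projection name is arbitrary; parts written before the projection existed need an explicit MATERIALIZE PROJECTION):

  ALTER TABLE sales ADD PROJECTION daily_sales_proj (
      SELECT
          toDate(created_at),
          product_id,
          sum(price)
      GROUP BY toDate(created_at), product_id
  );

  -- Rewrite existing parts so the projection covers historical data too
  ALTER TABLE sales MATERIALIZE PROJECTION daily_sales_proj;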

Refreshable Materialized Views (added in ClickHouse 23.12) run a full SELECT over the source data on a schedule and atomically swap the result into the target. They support complex queries - including joins between frequently-updated tables - that incremental MVs can't handle correctly. The trade-off is real-time freshness: the data reflects the last scheduled refresh, not the latest insert.
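
A hedged sketch of the refreshable variant for the same rollup; daily_sales_snapshot is a hypothetical target table, and older releases gate the feature behind allow_experimental_refreshable_materialized_view:

  -- Rebuilds the whole rollup every hour and swaps the result in atomically
  CREATE MATERIALIZED VIEW daily_sales_refreshable
  REFRESH EVERY 1 HOUR
  TO daily_sales_snapshot
  AS SELECT
      toDate(created_at) AS day,
      product_id,
      sum(price) AS total_revenue
  FROM sales
  GROUP BY day, product_id;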

Pre-aggregation in the ingestion pipeline (Flink, Kafka Streams, custom consumers) moves the transformation logic upstream. It removes MV complexity from the database layer entirely and handles stateful aggregations with proper late-data semantics.


The sweet spot for ClickHouse MVs is high-throughput append-only pipelines where you pre-aggregate into a SummingMergeTree or AggregatingMergeTree target, the aggregation meaningfully compresses data, and you fully understand block-level processing semantics. Get those conditions right and MVs deliver substantial query acceleration without any change to how data is written - producers keep inserting into the source table exactly as before.

Key points:

  • Always use TO <target_table> syntax - the implicit inner table is dropped with the view.
  • Match ORDER BY in the target to GROUP BY in the MV SELECT, or merges produce duplicates.
  • Use the State/Merge pattern for any non-additive aggregate (uniq, avg, quantile).
  • Backfill manually in partition-sized chunks; avoid POPULATE on large tables.
  • Write amplification is real: each MV adds I/O pressure proportional to insert volume, and synchronous processing means MV cost directly adds to insert latency.
  • Schema migrations require dropping the MV and create a data gap - plan and script them before execution.