Identify memory-hungry queries in ClickHouse using normalized query hashes and system.query_log analysis. Learn to spot common culprits like groupArray, uniqExact, and inefficient joins.
When ClickHouse queries start getting killed unexpectedly, memory pressure is almost always the reason. ClickHouse enforces memory limits at the server level, and when those limits are breached, a circuit breaker kicks in and terminates the most expensive running queries. This is actually a good thing - it protects cluster stability. But it also means you have queries that need optimization.
The challenge is not just finding out that memory is under pressure. It is identifying which query patterns are responsible, understanding why they consume so much memory, and knowing what to do about them. The answer to all three lives in system.query_log.
How ClickHouse Enforces Memory Limits
In recent ClickHouse versions, memory limits are enforced at the server level. When the total memory usage approaches the configured ceiling, ClickHouse identifies the most expensive currently running query and kills it. This acts as a circuit breaker - it sacrifices one query to keep the system as a whole responsive.
The killed query receives an error, and that error is logged in system.query_log. But not all memory issues manifest as killed queries. You can also see degraded performance, increased query latency, and general system instability when memory is consistently under pressure.
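To see which queries were actually killed, you can pull memory-related failures directly from the log. A minimal sketch, assuming the exception text contains the usual "Memory limit ... exceeded" wording; adjust the filter to match your version:
SELECT
event_time,
formatReadableSize(memory_usage) AS memory_at_failure,
exception_code,
normalized_query_hash,
query
FROM system.query_log
WHERE event_date >= today() - 7
AND type = 'ExceptionWhileProcessing'
AND exception ILIKE '%memory limit%'
ORDER BY event_time DESC
LIMIT 20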
It is also worth noting that memory pressure is not always caused by queries. Background merges can consume a significant chunk of memory, especially on smaller installations with limited RAM. If you are running a low-memory environment, you need to pay attention to how much background work is happening and consider limiting merge concurrency.
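To see how much memory in-flight merges are holding at any moment, system.merges exposes a per-merge memory_usage column. A quick check:
SELECT
database,
table,
round(elapsed) AS elapsed_seconds,
round(progress * 100, 1) AS progress_pct,
formatReadableSize(memory_usage) AS merge_memory
FROM system.merges
ORDER BY memory_usage DESC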
Finding Memory-Hungry Queries
The most effective way to find problematic query patterns is to query system.query_log grouped by normalized_query_hash. A normalized query hash is a hash of your query with all literal values stripped out - so SELECT * FROM events WHERE user_id = 123 and SELECT * FROM events WHERE user_id = 456 produce the same hash. This lets you analyze query patterns rather than individual executions.
SELECT
normalized_query_hash,
count() AS query_count,
formatReadableSize(avg(memory_usage)) AS avg_memory,
formatReadableSize(max(memory_usage)) AS peak_memory,
formatReadableSize(quantile(0.95)(memory_usage)) AS p95_memory,
any(query) AS sample_query
FROM system.query_log
WHERE event_date >= today() - 7
AND type = 'QueryFinish'
GROUP BY normalized_query_hash
ORDER BY max(memory_usage) DESC
LIMIT 20
This query surfaces the top 20 query patterns by peak memory consumption over the past week, along with how often they run and a sample query for investigation.
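If your ClickHouse version ships the normalizedQueryHash function, you can see the normalization behavior directly - two queries that differ only in literals hash to the same value:
SELECT
normalizedQueryHash('SELECT * FROM events WHERE user_id = 123') AS h1,
normalizedQueryHash('SELECT * FROM events WHERE user_id = 456') AS h2,
h1 = h2 AS same_pattern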
What to Look For
The most important signal is volatility - query patterns where the peak memory and average memory differ significantly. These are queries that behave fine most of the time but spike under certain conditions, typically when they hit a large tenant, a wide time range, or an unusually high-cardinality data segment. These volatile patterns are the ones most likely to trigger the circuit breaker unpredictably.
In practice, it is often just one or two query patterns consuming most of the memory. Identifying those and optimizing them can dramatically improve system stability.
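One way to surface those volatile patterns directly is to rank by the ratio of peak to average memory rather than by peak alone. A sketch in the same spirit as the query above:
SELECT
normalized_query_hash,
count() AS query_count,
formatReadableSize(avg(memory_usage)) AS avg_memory,
formatReadableSize(max(memory_usage)) AS peak_memory,
round(max(memory_usage) / avg(memory_usage), 1) AS peak_to_avg_ratio,
any(query) AS sample_query
FROM system.query_log
WHERE event_date >= today() - 7
AND type = 'QueryFinish'
AND memory_usage > 0
GROUP BY normalized_query_hash
HAVING query_count >= 10
ORDER BY peak_to_avg_ratio DESC
LIMIT 20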
Common Memory Culprits and Fixes
Certain SQL patterns are known to be memory-intensive in ClickHouse. If your top memory consumers include any of these, you have a clear optimization target.
groupArray over large result sets. This function accumulates all matching values into an array in memory. Running it over a billion rows means holding a billion-element array in memory. If you need this behavior, consider limiting the output with groupArray(N)(column) to cap the array size, or restructure the query to reduce the input set.
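As a hedged illustration (the events table and page_url column here are hypothetical), capping the array bounds the memory held per group:
-- Keeps at most 100 values per user (the first 100 encountered)
-- instead of accumulating every matching value in memory
SELECT
user_id,
groupArray(100)(page_url) AS sampled_pages
FROM events
GROUP BY user_id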
uniqExact on high-cardinality columns. Counting exact distinct values requires storing every unique value in memory. On a column with hundreds of millions of unique values, this becomes very expensive. The approximate alternative uniq uses an adaptive sampling algorithm and is dramatically cheaper in memory while providing accuracy within a few percent - sufficient for most analytical use cases. For explicit HyperLogLog-based counting, ClickHouse offers uniqHLL12 and uniqCombined, though uniq is generally the recommended default.
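If you want to confirm the approximation is acceptable for your data before switching, compare the two side by side on a bounded slice (table and column names are illustrative; note that this comparison still pays the uniqExact cost once):
SELECT
uniqExact(user_id) AS exact_count,
uniq(user_id) AS approx_count,
round(abs(uniq(user_id) - uniqExact(user_id)) / uniqExact(user_id) * 100, 2) AS error_pct
FROM events
WHERE event_date = yesterday()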
Inefficient joins. Joins that materialize large intermediate result sets or that use a suboptimal join algorithm can consume enormous amounts of memory. A critical but often overlooked factor is join ordering: ClickHouse builds a hash table in memory from the right-hand side of the join, then streams the left-hand side to probe it. This means placing a large table on the right side forces the entire large side into memory. Simply swapping the join order so the smaller table is on the right can reduce memory usage by orders of magnitude. ClickHouse does not automatically reorder tables, so this is always the developer's responsibility. Beyond ordering, ClickHouse offers multiple join algorithms, including ones that spill to disk when memory is tight. If you must run memory-intensive joins, consider using the partial_merge or auto join algorithm, or restructure the query to reduce the join's input size.
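A sketch of both fixes together, assuming a large events table joined to a much smaller users dimension (the table names are hypothetical): put the small table on the right so only it is hashed into memory, and let the join algorithm spill to disk as a safety net.
SELECT
u.country,
count() AS event_count
FROM events AS e
INNER JOIN users AS u ON e.user_id = u.user_id -- smaller side on the right: only users is built into the hash table
GROUP BY u.country
SETTINGS join_algorithm = 'auto' -- falls back to the disk-spilling partial_merge algorithm if the hash table exceeds memory limits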
For all of these, the goal is to avoid simply raising memory limits indefinitely. Increasing limits without fixing the underlying query patterns just delays the problem and risks wider system instability.
Mining the Query Log
Beyond memory-specific analysis, system.query_log is the single most valuable table in the system catalog for optimization work. Every query is logged with duration, memory usage, rows read, rows written, error status, and profile events. Here is a general-purpose workload analysis query:
SELECT
normalized_query_hash,
count() AS calls,
round(sum(query_duration_ms) / 1000, 2) AS total_seconds,
quantile(0.5)(query_duration_ms) AS p50_ms,
quantile(0.95)(query_duration_ms) AS p95_ms,
quantile(0.99)(query_duration_ms) AS p99_ms,
formatReadableSize(avg(memory_usage)) AS avg_memory,
any(query) AS sample_query
FROM system.query_log
WHERE event_date >= today() - 1
AND type = 'QueryFinish'
GROUP BY normalized_query_hash
HAVING calls >= 10
ORDER BY total_seconds DESC
LIMIT 20
This shows you the query patterns that consume the most total execution time. This is a critical distinction: a query that runs for 300ms but gets called thousands of times per hour may consume far more total cluster time than a single query that takes 10 seconds. When choosing your top 10 queries to optimize, total time spent often matters more than single-query latency.
Key Dimensions to Track
Query duration percentiles. Always use percentiles (P50, P95, P99) rather than averages. Averages lie - especially in multi-tenant systems where most queries are fast but a handful on large tenants are extremely slow. Percentiles reveal the true distribution.
Parallel query pressure. Are queries being queued? Are you over-parallelizing and causing excessive thread context switches? The query log can help you understand concurrency patterns.
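The log does not record concurrency directly, but bucketing query starts per minute gives a rough proxy for load spikes. A sketch:
SELECT
toStartOfMinute(query_start_time) AS minute,
count() AS queries_started,
round(quantile(0.95)(query_duration_ms)) AS p95_duration_ms
FROM system.query_log
WHERE event_date >= today() - 1
AND type = 'QueryFinish'
GROUP BY minute
ORDER BY queries_started DESC
LIMIT 20
For a live view, system.metrics exposes the number of currently executing queries under the Query metric.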
Rows and bytes read. A query that reads far more rows than it returns likely has an inefficient filter that is not aligned with the table's ordering key. The ProfileEvents column in the query log contains details like how many marks (data granules) were selected, which directly tells you how much data ClickHouse had to scan. This helps you identify candidates for ordering key adjustments, skipping indices, or projections.
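A sketch that compares rows read against rows returned per pattern and pulls the selected-marks counter out of ProfileEvents (exposed as a map in recent versions):
SELECT
normalized_query_hash,
count() AS calls,
sum(read_rows) AS total_read_rows,
sum(result_rows) AS total_result_rows,
round(sum(read_rows) / greatest(sum(result_rows), 1)) AS read_to_result_ratio,
sum(ProfileEvents['SelectedMarks']) AS total_selected_marks,
any(query) AS sample_query
FROM system.query_log
WHERE event_date >= today() - 1
AND type = 'QueryFinish'
GROUP BY normalized_query_hash
ORDER BY total_read_rows DESC
LIMIT 20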
Query errors. Errors are not always obvious in aggregate metrics. Querying for failed queries by pattern helps you find issues that are silently affecting users.
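A sketch for grouping failures by pattern over the last day:
SELECT
normalized_query_hash,
count() AS failures,
any(exception_code) AS sample_exception_code,
any(exception) AS sample_exception,
any(query) AS sample_query
FROM system.query_log
WHERE event_date >= today() - 1
AND type IN ('ExceptionBeforeStart', 'ExceptionWhileProcessing')
GROUP BY normalized_query_hash
ORDER BY failures DESC
LIMIT 20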
Key Takeaways
- ClickHouse enforces memory limits at the server level and will kill expensive queries to protect stability. This is a feature, but it also signals queries that need optimization.
- Use normalized_query_hash to group and analyze query patterns. Look for high volatility between average and peak memory usage.
- The usual suspects for memory pressure are groupArray, uniqExact, and inefficient joins. For joins, always place the smaller table on the right side. Fix the query pattern rather than raising limits.
- system.query_log tells you about 80% of what you need to know. Prioritize optimization by total time spent, not just single-query latency, and always use percentiles instead of averages.