Speed up slow ClickHouse queries with Projections. Learn how this powerful feature optimizes queries on non-primary key columns and pre-aggregates data automatically.

Imagine your MergeTree table is perfectly ordered by event_date for time-series analysis. But now, your team needs to run frequent queries filtering by user_id. Since user_id isn't in the primary key, ClickHouse is forced into a full table scan every time, reading millions or billions of rows just to find a few. These queries become slow and expensive. This is the exact problem Projections are designed to solve.

What are ClickHouse Projections?

At its core, a Projection is a hidden, query-optimized version of your table's data stored alongside the main table data parts. Think of it as an automatically managed secondary index. While a MergeTree table can only be physically sorted by its primary key, Projections allow you to create additional, optimized data layouts sorted by different columns or even pre-aggregated data.

The best part? Projections are transparent. You continue to query your original table, and ClickHouse automatically decides if using a Projection will be more efficient. If it is, ClickHouse uses the Projection behind the scenes without you needing to change a single line of your query.

Benefits of Using Projections

  • Accelerate Queries: By creating a version of the data sorted by frequently filtered columns, Projections allow ClickHouse to skip scanning massive amounts of data, leading to significant speed improvements.
  • Pre-aggregate Data: You can define a Projection to store pre-computed aggregations like sum(), avg(), or count(). This is extremely effective for speeding up dashboard queries and recurring analytical reports.
  • Query Transparency: Unlike a Materialized View, which requires you to query a separate target table, Projections are used automatically by the query optimizer.

How to Use Projections

Getting started with Projections involves two main steps: adding the Projection definition and then building it for any existing data.

Step 1: Add the Projection

You can add a Projection to an existing MergeTree table using the ALTER TABLE ... ADD PROJECTION command. The definition is a SELECT query that specifies the columns and the desired order or aggregation.

Example: Creating a Projection for faster filtering on a town column

ALTER TABLE uk_price_paid
ADD PROJECTION town_sort_projection (
    SELECT
        *
    ORDER BY town
);

In this example, the Projection stores a copy of the table's data sorted by the town column, making WHERE clauses on that column much faster.

Step 2: Materialize the Projection for Existing Data

When you add a Projection to a table that already contains data, it will only be created for new data inserted after its creation. To build the Projection for all existing data, you must run the MATERIALIZE PROJECTION command.

ALTER TABLE uk_price_paid
MATERIALIZE PROJECTION town_sort_projection;

This is an asynchronous background process. You can monitor its progress by checking the system.mutations table.

Common Gotchas and Mistakes

While powerful, Projections come with trade-offs. Keep these points in mind to avoid common pitfalls.

  • Forgetting to Materialize: A frequent mistake is adding a Projection and wondering why queries on old data aren't faster. You must materialize the Projection for existing data parts to see a performance benefit on them.
  • Increased Storage and Slower Inserts: Projections create a copy of data, which consumes additional disk space and requires more I/O. They also add overhead to INSERT operations because the projection data must also be created. Choose Projections wisely for queries that have a significant negative impact on the system.
  • Incompatible Queries: A Projection might not be used for several reasons:
    • Queries using FINAL are not compatible with projections.
    • By default, projections are not used on tables with DELETED rows (lightweight deletes). Newer ClickHouse versions have a lightweight_mutation_projection_mode setting to change this behavior.
    • The query must use ONLY the columns defined in the Projection.
  • Verification is Key: To confirm if your query is using a Projection, prefix your query with EXPLAIN. The query plan output will list the projection's name if it was used. You can also inspect a projection's disk usage by querying the system.projection_parts table.