Projections
Projections store data in a format that optimizes query execution, this feature is useful for:
- Running queries on a column that is not a part of the primary key
- Pre-aggregating columns, it will reduce both computation and IO
You can define one or more projections for a table, and during the query analysis the projection with the least data to scan will be selected by ClickHouse without modifying the query provided by the user.
Projections will create internally a new hidden table, this means that more IO and space on disk will be required. Example, If the projection has defined a different primary key, all the data from the original table will be duplicated.
You can see more technical details about how projections work internally on this page.
Example filtering without using primary keys
Creating the table:
CREATE TABLE visits_order
(
`user_id` UInt64,
`user_name` String,
`pages_visited` Nullable(Float64),
`user_agent` String
)
ENGINE = MergeTree()
PRIMARY KEY user_agent
Using ALTER TABLE
, we could add the Projection to an existing table:
ALTER TABLE visits_order ADD PROJECTION user_name_projection (
SELECT
*
ORDER BY user_name
)
ALTER TABLE visits_order MATERIALIZE PROJECTION user_name_projection
Inserting the data:
INSERT INTO visits_order SELECT
number,
'test',
1.5 * (number / 2),
'Android'
FROM numbers(1, 100);
The Projection will allow us to filter by user_name
fast even if in the original Table user_name
was not defined as a PRIMARY_KEY
.
At query time ClickHouse determined that less data will be processed if the projection is used, as the data is ordered by user_name
.
SELECT
*
FROM visits_order
WHERE user_name='test'
LIMIT 2
To verify that a query is using the projection, we could review the system.query_log
table. On the projections
field we have the name of the projection used or empty if none has been used:
SELECT query, projections FROM system.query_log WHERE query_id='<query_id>'
Example pre-aggregation query
Creating the table with the Projection:
CREATE TABLE visits
(
`user_id` UInt64,
`user_name` String,
`pages_visited` Nullable(Float64),
`user_agent` String,
PROJECTION projection_visits_by_user
(
SELECT
user_agent,
sum(pages_visited)
GROUP BY user_id, user_agent
)
)
ENGINE = MergeTree()
ORDER BY user_agent
Inserting the data:
INSERT INTO visits SELECT
number,
'test',
1.5 * (number / 2),
'Android'
FROM numbers(1, 100);
INSERT INTO visits SELECT
number,
'test',
1. * (number / 2),
'IOS'
FROM numbers(100, 500);
We will execute a first query using GROUP BY
using the field user_agent
, this query will not use the projection defined as the pre-aggregation does not match.
SELECT
user_agent,
count(DISTINCT user_id)
FROM visits
GROUP BY user_agent
To use the projection we could execute queries that select part of, or all of the pre-aggregation and GROUP BY
fields.
SELECT
user_agent
FROM visits
WHERE user_id > 50 AND user_id < 150
GROUP BY user_agent
SELECT
user_agent,
sum(pages_visited)
FROM visits
GROUP BY user_agent
As mentioned before, we could review the system.query_log
table. On the projections
field we have the name of the projection used or empty if none has been used:
SELECT query, projections FROM system.query_log WHERE query_id='<query_id>'
Manipulating Projections
The following operations with projections are available:
ADD PROJECTION
ALTER TABLE [db.]name [ON CLUSTER cluster] ADD PROJECTION [IF NOT EXISTS] name ( SELECT <COLUMN LIST EXPR> [GROUP BY] [ORDER BY] )
- Adds projection description to tables metadata.
DROP PROJECTION
ALTER TABLE [db.]name [ON CLUSTER cluster] DROP PROJECTION [IF EXISTS] name
- Removes projection description from tables metadata and deletes projection files from disk. Implemented as a mutation.
MATERIALIZE PROJECTION
ALTER TABLE [db.]table [ON CLUSTER cluster] MATERIALIZE PROJECTION [IF EXISTS] name [IN PARTITION partition_name]
- The query rebuilds the projection name
in the partition partition_name
. Implemented as a mutation.
CLEAR PROJECTION
ALTER TABLE [db.]table [ON CLUSTER cluster] CLEAR PROJECTION [IF EXISTS] name [IN PARTITION partition_name]
- Deletes projection files from disk without removing description. Implemented as a mutation.
The commands ADD
, DROP
and CLEAR
are lightweight in a sense that they only change metadata or remove files.
Also, they are replicated, syncing projections metadata via ClickHouse Keeper or ZooKeeper.
Projection manipulation is supported only for tables with *MergeTree
engine (including replicated variants).