v24.6 Changelog for Cloud

Relevant changes for ClickHouse Cloud services based on the v24.6 release.

Backward Incompatible Change

Rework parallel processing in Ordered mode of storage S3Queue. This change is backward incompatible for Ordered mode if you used the settings s3queue_processing_threads_num or s3queue_total_shards_num. The setting s3queue_total_shards_num is removed; previously it could be used only under s3queue_allow_experimental_sharded_mode, which is now deprecated. A new setting, s3queue_buckets, is added. #64349 (Kseniia Sumarokova).
New functions snowflakeIDToDateTime, snowflakeIDToDateTime64, dateTimeToSnowflakeID, and dateTime64ToSnowflakeID were added. Unlike the existing functions snowflakeToDateTime, snowflakeToDateTime64, dateTimeToSnowflake, and dateTime64ToSnowflake, the new functions are compatible with function generateSnowflakeID, i.e. they accept the snowflake IDs generated by generateSnowflakeID and produce snowflake IDs of the same type as generateSnowflakeID (i.e. UInt64). Furthermore, the new functions default to the UNIX epoch (aka. 1970-01-01), just like generateSnowflakeID. If necessary, a different epoch, e.g. Twitter's/X's epoch 2010-11-04 aka. 1288834974657 msec since UNIX epoch, can be passed. The old conversion functions are deprecated and will be removed after a transition period: to use them regardless, enable setting allow_deprecated_snowflake_conversion_functions. #64948 (Robert Schulze).
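
To make the conversion arithmetic in the Snowflake ID entry above concrete, here is a minimal Python sketch. It assumes the common Snowflake layout (a 41-bit millisecond timestamp stored above 10 machine-ID bits and 12 sequence bits, so the timestamp starts at bit 22). The Python function names are hypothetical stand-ins for illustration and are not the ClickHouse implementation.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the timestamp sits above 10 machine-ID bits and 12 sequence bits.
TIMESTAMP_SHIFT = 22
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Twitter's/X's epoch in milliseconds since the UNIX epoch (value from the entry above).
TWITTER_EPOCH_MS = 1288834974657


def datetime_to_snowflake_id(dt, epoch_ms=0):
    """Return the smallest Snowflake ID whose timestamp equals dt (hypothetical helper)."""
    unix_ms = (dt - UNIX_EPOCH) // timedelta(milliseconds=1)
    return (unix_ms - epoch_ms) << TIMESTAMP_SHIFT


def snowflake_id_to_datetime(snowflake_id, epoch_ms=0):
    """Extract the timestamp part of a Snowflake ID as a UTC datetime (hypothetical helper)."""
    unix_ms = (snowflake_id >> TIMESTAMP_SHIFT) + epoch_ms
    return UNIX_EPOCH + timedelta(milliseconds=unix_ms)


moment = datetime(2024, 6, 6, 10, 59, 58, tzinfo=timezone.utc)
unix_based_id = datetime_to_snowflake_id(moment)
twitter_based_id = datetime_to_snowflake_id(moment, TWITTER_EPOCH_MS)
```

The same moment yields different IDs under different epochs, which is why an ID must be decoded with the epoch it was generated with.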

New Feature

Support empty tuples. #55061 (Amos Bird).
Add Hilbert Curve encode and decode functions. #60156 (Artem Mustafin).
Add support for index analysis over hilbertEncode. #64662 (Artem Mustafin).
Added support for reading LINESTRING geometry in the WKT format using function readWKTLineString. #62519 (Nikita Mikhaylov).
Added a new SQL function generateSnowflakeID for generating Twitter-style Snowflake IDs. #63577 (Danila Puzov).
Add support for comparing IPv4 and IPv6 types using the = operator. #64292 (Francisco J. Jurado Moreno).
Support decimal arguments in binary math functions (pow, atan2, max2, min2, hypot). #64582 (Mikhail Gorshkov).
Added SQL functions parseReadableSize (along with OrNull and OrZero variants). #64742 (Francisco J. Jurado Moreno).
Add _time virtual column to file-like storages (s3/file/hdfs/url/azureBlobStorage). #64947 (Ilya Golshtein).
Introduced new functions base64URLEncode, base64URLDecode and tryBase64URLDecode. #64991 (Mikhail Gorshkov).
Add new function editDistanceUTF8, which calculates the edit distance between two UTF8 strings. #65269 (LiuNeng).
Add http_response_headers configuration to support custom response headers in custom HTTP handlers. #63562 (Grigorii).
Added a new table function loop to support returning query results in an infinite loop. This is useful for testing. #63452 (Sariel).
Introduced two additional columns in the system.query_log: used_privileges and missing_privileges. used_privileges is populated with the privileges that were checked during query execution, and missing_privileges contains required privileges that are missing. #64597 (Alexey Katsman).
Added a setting output_format_pretty_display_footer_column_names which, when enabled, displays column names at the end of the table for long tables (50 rows by default). The minimum number of rows is controlled by output_format_pretty_display_footer_column_names_min_rows. #65144 (Shaun Struwig).
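
As an illustration of the editDistanceUTF8 entry in this section, the sketch below computes a Levenshtein edit distance once over Unicode code points and once over raw UTF-8 bytes. The helper names are hypothetical, and the byte-wise variant is included only as an assumed point of comparison; neither is the ClickHouse implementation.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance over two sequences."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (item_a != item_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def edit_distance_code_points(left, right):
    """Compare strings character by character (Unicode code points)."""
    return levenshtein(left, right)


def edit_distance_bytes(left, right):
    """Compare the UTF-8 encodings byte by byte."""
    return levenshtein(left.encode("utf-8"), right.encode("utf-8"))


# "ü" is one code point but two bytes in UTF-8, so the two distances differ.
by_code_points = edit_distance_code_points("über", "uber")
by_bytes = edit_distance_bytes("über", "uber")
```

For text containing multi-byte characters, a code-point-aware distance is usually the one that matches a reader's intuition about how many characters changed.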

Performance Improvement

Fix performance regression in cross join introduced in #60459 (24.5). #65243 (Nikita Taranov).
Improve io_uring resubmits visibility. Rename profile event IOUringSQEsResubmits -> IOUringSQEsResubmitsAsync and add a new one IOUringSQEsResubmitsSync. #63699 (Tomer Shafir).
Introduce assertions to verify all functions are called with columns of the right size. #63723 (Raúl Marín).
Add the ability to reshuffle rows during insert to optimize for size without violating the order set by PRIMARY KEY. It's controlled by the setting optimize_row_order (off by default). #63578 (Igor Markelov).
Add a native parquet reader, which can read parquet binary to ClickHouse Columns directly. It's controlled by the setting input_format_parquet_use_native_reader (disabled by default). #60361 (ZhiHong Zhang).
Support partial trivial count optimization when the query filter is able to select exact ranges from merge tree tables. #60463 (Amos Bird).
Reduce max memory usage of multithreaded INSERTs by collecting chunks of multiple threads in a single transform. #61047 (Yarik Briukhovetskyi).
Reduce the memory usage when using Azure object storage by using fixed memory allocation, avoiding the allocation of an extra buffer. #63160 (SmitaRKulkarni).
Reduce the number of virtual function calls in ColumnNullable::size. #60556 (HappenLee).
Speed up splitByRegexp when the regular expression argument is a single character. #62696 (Robert Schulze).
Speed up aggregation by 8-bit and 16-bit keys by keeping track of the min and max keys used. This reduces the number of cells that need to be verified. #62746 (Jiebin Sun).
Optimize operator IN when the left hand side is LowCardinality and the right is a set of constants. #64060 (Zhiguo Zhou).
Use a thread pool to initialize and destroy hash tables inside ConcurrentHashJoin. #64241 (Nikita Taranov).
Optimized vertical merges in tables with sparse columns. #64311 (Anton Popov).
Enabled prefetches of data from remote filesystem during vertical merges. It improves latency of vertical merges in tables with data stored on remote filesystem. #64314 (Anton Popov).
Reduce redundant calls to isDefault of ColumnSparse::filter to improve performance. #64426 (Jiebin Sun).
Speed up find_super_nodes and find_big_family keeper-client commands by making multiple asynchronous getChildren requests. #64628 (Alexander Gololobov).
Improve function least/greatest for nullable numeric type arguments. #64668 (KevinyhZou).
Allow merging two consecutive filtering steps of a query plan. This improves filter-push-down optimization if the filter condition can be pushed down from the parent step. #64760 (Nikolai Kochetov).
Remove bad optimization in the vertical final implementation and re-enable vertical final algorithm by default. #64783 (Duc Canh Le).
Remove ALIAS nodes from the filter expression. This slightly improves performance for queries with PREWHERE (with the new analyzer). #64793 (Nikolai Kochetov).
Re-enable OpenSSL session caching. #65111 (Robert Schulze).
Added settings to disable materialization of skip indexes and statistics on inserts (materialize_skip_indexes_on_insert and materialize_statistics_on_insert). #64391 (Anton Popov).
Use the allocated memory size to calculate the row group size and reduce the peak memory of the parquet writer in the single-threaded mode. #64424 (LiuNeng).
Improve the iterator of sparse columns to reduce calls to size. #64497 (Jiebin Sun).
Update condition to use server-side copy for backups to Azure blob storage. #64518 (SmitaRKulkarni).
Optimized memory usage of vertical merges for tables with high number of skip indexes. #64580 (Anton Popov).
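
The idea behind the 8-bit and 16-bit key aggregation entry in this section can be sketched as follows. This is a simplified Python illustration under the assumption that aggregation states live in a fixed array indexed by key; the function name is hypothetical and the real implementation is not shown in this changelog.

```python
def count_by_uint8_key(keys):
    """Count occurrences of 8-bit keys using a fixed 256-cell lookup table."""
    counts = [0] * 256
    min_key, max_key = 255, 0
    for key in keys:
        counts[key] += 1
        # Track the range of keys actually seen.
        if key < min_key:
            min_key = key
        if key > max_key:
            max_key = key
    # Only cells inside [min_key, max_key] can be non-empty, so the final pass
    # inspects that range instead of all 256 cells.
    return [(key, counts[key]) for key in range(min_key, max_key + 1) if counts[key]]


result = count_by_uint8_key([200, 198, 200, 199, 198, 200])
```

In this example the final pass inspects 3 cells rather than 256; the saving grows with 16-bit keys, where the full table has 65536 cells.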

Improvement

Restored the previous behaviour of how ClickHouse interprets Tuples in CSV format. This change effectively reverts ClickHouse/ClickHouse#60994 and makes it available only under a few settings: output_format_csv_serialize_tuple_into_separate_columns, input_format_csv_deserialize_separate_columns_into_tuple and input_format_csv_try_infer_strings_from_quoted_tuples. #65170 (Nikita Mikhaylov).
SHOW CREATE TABLE executed on top of system tables will now show the super handy comment unique for each table which will explain why this table is needed. #63788 (Nikita Mikhaylov).
The second argument (scale) of functions round(), roundBankers(), floor(), ceil() and trunc() can now be non-const. #64798 (Mikhail Gorshkov).
Avoid possible deadlock during MergeTree index analysis when scheduling threads in a saturated service. #59427 (Sean Haynes).
Several minor corner case fixes to S3 proxy support & tunneling. #63427 (Arthur Passos).
Add metrics to track the number of directories created and removed by the plain_rewritable metadata storage, and the number of entries in the local-to-remote in-memory map. #64175 (Julia Kartseva).
The query cache now considers identical queries with different settings as different. This increases robustness in cases where different settings (e.g. limit or additional_table_filters) would affect the query result. #64205 (Robert Schulze).
Support the non standard error code QpsLimitExceeded in object storage as a retryable error. #64225 (Sema Checherinda).
Added a new setting input_format_parquet_prefer_block_bytes to control the average output block bytes, and modified the default value of input_format_parquet_max_block_size to 65409. #64427 (LiuNeng).
Settings from the user's config don't affect merges and mutations for MergeTree on top of object storage. #64456 (alesapin).
Support the non standard error code TotalQpsLimitExceeded in object storage as a retryable error. #64520 (Sema Checherinda).
Updated Advanced Dashboard for both open-source and ClickHouse Cloud versions to include a chart for 'Maximum concurrent network connections'. #64610 (Thom O'Connor).
Improve progress report on zeros_mt and generateRandom. #64804 (Raúl Marín).
Add an asynchronous metric jemalloc.profile.active to show whether sampling is currently active. This is an activation mechanism in addition to prof.active; both must be active for the calling thread to sample. #64842 (Unalian).
Remove mark of allow_experimental_join_condition as important. This mark may have prevented distributed queries in a mixed versions cluster from being executed successfully. #65008 (Nikita Mikhaylov).
Added server asynchronous metrics DiskGetObjectThrottler* and DiskPutObjectThrottler* reflecting the requests-per-second rate limit defined by the s3_max_get_rps and s3_max_put_rps disk settings and the number of requests that can currently be sent without hitting the throttling limit on the disk. Metrics are defined for every disk that has a configured limit. #65050 (Sergei Trifonov).
Add a validation when creating a user with bcrypt_hash. #65242 (Raúl Marín).
Add profile events for number of rows read during/after PREWHERE. #64198 (Nikita Taranov).
Print query in EXPLAIN PLAN with parallel replicas. #64298 (vdimir).
Rename allow_deprecated_functions to allow_deprecated_error_prone_window_functions. #64358 (Raúl Marín).
Respect max_read_buffer_size setting for file descriptors as well in the file table function. #64532 (Azat Khuzhin).
Disable transactions for unsupported storages even for materialized views. #64918 (alesapin).
Forbid QUALIFY clause in the old analyzer. The old analyzer ignored QUALIFY, so it could lead to unexpected data removal in mutations. #65356 (Dmitry Novik).
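
The query cache entry in this section says that identical queries with different settings are now treated as different. One way to picture this is a cache key derived from both the query text and the settings, as in the hedged Python sketch below. The function name and the key format are assumptions made for illustration and do not describe how ClickHouse actually computes its cache key.

```python
import hashlib
import json


def query_cache_key(query, settings):
    """Build a cache key from the query text and its settings (hypothetical scheme)."""
    # sort_keys makes the key independent of the order in which settings are given.
    payload = json.dumps({"query": query, "settings": settings}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


query = "SELECT * FROM events"
key_limit_10 = query_cache_key(query, {"limit": 10})
key_limit_100 = query_cache_key(query, {"limit": 100})
```

Because the settings are part of the key, a result cached under one value of limit cannot be returned for a query that runs with another.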

Bug Fix (user-visible misbehavior in an official stable release)

Fixed 'set' skip index not working with IN and indexHint(). #62083 (Michael Kolupaev).
Fix queries with FINAL giving wrong results when the table does not use adaptive granularity. #62432 (Duc Canh Le).
Support executing function during assignment of parameterized view value. #63502 (SmitaRKulkarni).
Fixed parquet memory tracking. #63584 (Michael Kolupaev).
Fix rare case with missing data in the result of distributed query. #63691 (vdimir).
Fixed reading of columns of type Tuple(Map(LowCardinality(String), String), ...). #63956 (Anton Popov).
Fix resolve of unqualified COLUMNS matcher. Preserve the input columns order and forbid usage of unknown identifiers. #63962 (Dmitry Novik).
Fix a Cyclic aliases error for cyclic aliases of different type (expression and function). #63993 (Nikolai Kochetov).
Use a proper redefined context with the correct definer for each individual view in the query pipeline. #64079 (pufit).
Fix analyzer: "Not found column" error is fixed when using INTERPOLATE. #64096 (Yakov Olkhovskiy).
Prevent LOGICAL_ERROR on CREATE TABLE as MaterializedView. #64174 (Raúl Marín).
The query cache now considers two identical queries against different databases as different. The previous behavior could be used to bypass missing privileges to read from a table. #64199 (Robert Schulze).
Fix possible abort on uncaught exception in ~WriteBufferFromFileDescriptor in StatusFile. #64206 (Kruglov Pavel).
Fix duplicate alias error for distributed queries with ARRAY JOIN. #64226 (Nikolai Kochetov).
Fix unexpected accurateCast from string to integer. #64255 (wudidapaopao).
Fixed CNF simplification, in case any OR group contains mutually exclusive atoms. #64256 (Eduard Karacharov).
Fix Query Tree size validation. #64377 (Dmitry Novik).
Fix Logical error: Bad cast for Buffer table with PREWHERE. #64388 (Nikolai Kochetov).
Fixed CREATE TABLE AS queries for tables with default expressions. #64455 (Anton Popov).
Fixed optimize_read_in_order behaviour for ORDER BY ... NULLS FIRST / LAST on tables with nullable keys. #64483 (Eduard Karacharov).
Fix the Expression nodes list expected 1 projection names and Unknown expression or identifier errors for queries with aliases to GLOBAL IN. #64517 (Nikolai Kochetov).
Fix an error Cannot find column in distributed queries with constant CTE in the GROUP BY key. #64519 (Nikolai Kochetov).
Fix the output of function formatDateTimeInJodaSyntax when a formatter generates an uneven number of characters and the last character is 0. For example, SELECT formatDateTimeInJodaSyntax(toDate('2012-05-29'), 'D') now correctly returns 150 instead of 15 as before. #64614 (LiuNeng).
Do not rewrite aggregation if -If combinator is already used. #64638 (Dmitry Novik).
Fix type inference for float (in case of small buffer, i.e. --max_read_buffer_size 1). #64641 (Azat Khuzhin).
Fix bug which could lead to non-working TTLs with expressions. #64694 (alesapin).
Fix removing the WHERE and PREWHERE expressions, which are always true (for the new analyzer). #64695 (Nikolai Kochetov).
Fixed excessive part elimination by token-based text indexes (ngrambf, full_text) when filtering by result of startsWith, endsWith, match, multiSearchAny. #64720 (Eduard Karacharov).
Fixes incorrect behaviour of ANSI CSI escaping in the UTF8::computeWidth function. #64756 (Shaun Struwig).
Fix a case of incorrect removal of ORDER BY / LIMIT BY across subqueries. #64766 (Raúl Marín).
Fix (experimental) unequal join with subqueries for sets which are in the mixed join conditions. #64775 (lgbo).
Fix crash in a local cache over plain_rewritable disk. #64778 (Julia Kartseva).
Fix Cannot find column in distributed query with ARRAY JOIN by Nested column. Fixes #64755. #64801 (Nikolai Kochetov).
Fix memory leak in slru cache policy. #64803 (Kseniia Sumarokova).
Fixed possible incorrect memory tracking in several kinds of queries: queries that read any data from S3, queries via http protocol, asynchronous inserts. #64844 (Anton Popov).
Fix the Block structure mismatch error for queries reading with PREWHERE from the materialized view when the materialized view has columns of different types than the source table. Fixes #64611. #64855 (Nikolai Kochetov).
Fix rare crash when table has TTL with subquery + database replicated + parallel replicas + analyzer. It's really rare, but please don't use TTLs with subqueries. #64858 (alesapin).
Fix ALTER MODIFY COMMENT query that was broken for parameterized VIEWs in ClickHouse/ClickHouse#54211. #65031 (Nikolay Degterinsky).
Fix host_id in DatabaseReplicated when cluster_secure_connection parameter is enabled. Previously all the connections within the cluster created by DatabaseReplicated were not secure, even if the parameter was enabled. #65054 (Nikolay Degterinsky).
Fix the Not-ready Set error after the PREWHERE optimization for StorageMerge. #65057 (Nikolai Kochetov).
Avoid writing to finalized buffer in File-like storages. #65063 (Kruglov Pavel).
Fix possible infinite query duration in case of cyclic aliases. Fixes #64849. #65081 (Nikolai Kochetov).
Fix the Unknown expression identifier error for remote queries with INTERPOLATE (alias) (new analyzer). Fixes #64636. #65090 (Nikolai Kochetov).
Fix pushing arithmetic operations out of aggregation. In the new analyzer, optimization was applied only once. #65104 (Dmitry Novik).
Fix aggregate function name rewriting in the new analyzer. #65110 (Dmitry Novik).
Respond with 5xx instead of 200 OK in case of receive timeout while reading (parts of) the request body from the client socket. #65118 (Julian Maicher).
Fix possible crash for hedged requests. #65206 (Azat Khuzhin).
Fix the bug in Hashed and Hashed_Array dictionary short circuit evaluation, which may read uninitialized number, leading to various errors. #65256 (jsc0218).
Ensure that the type of the constant (the IN operator's second argument) is always visible during the IN operator's type conversion. Otherwise, losing type information may cause some conversions to fail, such as the conversion from DateTime to Date. Fixes #64487. #65315 (pn).
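
The expected value in the formatDateTimeInJodaSyntax entry in this section can be checked independently. In Joda syntax, the pattern D stands for the day of the year, and the short Python sketch below recomputes it for 2012-05-29 using only the standard library. The helper name is hypothetical and this is not the ClickHouse function.

```python
from datetime import date


def day_of_year(year, month, day):
    """Return the 1-based day of the year for a calendar date."""
    return date(year, month, day).timetuple().tm_yday


# 2012 is a leap year: 31 + 29 + 31 + 30 days before May, plus 29 days in May.
formatted = str(day_of_year(2012, 5, 29))
```

The result ends in 0, which is exactly the case the fix addresses: the trailing zero is part of the value and must not be dropped.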
