26.3 Antalya port - Alternative syntax for cluster functions #1640
Conversation
…tive_syntax 26.1 Antalya port - Alternative syntax for cluster functions
…e_for_partitioning 26.1 Antalya port - Timezone for partitioning
…nquoted_timezone Fix unquoted timezone in sorting key
…_initiator Antalya 26.1: Fix remote initiator host name
…itiator_improvements Antalya 26.1: Remote initiator improvements
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 88842054aa
```cpp
pure_storage->renameInMemory(new_table_id);
return;
```
Update wrapper storage ID during renameInMemory
StorageObjectStorageCluster::renameInMemory returns after renaming only pure_storage, but pure_storage is always present, so the wrapper's own storage_id is never updated. After RENAME TABLE, calls that read StorageObjectStorageCluster::getStorageID() can still see the old identifier, which can break catalog/dependency logic that relies on the storage object reporting its current name.
```cpp
static pcg64 rng(randomSeed());
size_t shard_num = rng() % host_addresses.size();
```
Use thread-safe RNG when choosing remote initiator host
The function keeps a process-wide static pcg64 and advances it on each call. When multiple queries use object_storage_remote_initiator concurrently, they race on this shared RNG state, which is undefined behavior in C++. This can lead to flaky host selection and hard-to-diagnose concurrency issues; use thread_local RNG or guard access with synchronization.
…rser' into frontport/antalya-26.3/alternative_syntax
Test
```cpp
/// controls new feature and it's 'true' by default, use 'false' as previous_value).
/// It's used to implement `compatibility` setting (see https://github.com/ClickHouse/ClickHouse/issues/35972)
/// Note: please check if the key already exists to prevent duplicate entries.
addSettingsChanges(settings_changes_history, "26.3.1.20001.altinityantalya",
```
I am making an update for our SettingsChangesHistory.
I will later ask you to move all your settings into their corresponding sections, depending on whether they appeared in 25.8, 26.1, or another version. It is not correct to put them all into the latest release block; we won't do this anymore.
It will have commented lines for all settings that existed in 26.1. In the future, you will add new settings, but for forward-ports you will uncomment the corresponding lines.
…itiator_improvements_2 Antalya 26.1: Remote initiator improvements 2
Verification report: Altinity/ClickHouse PR #1640

Conclusion

PR is merged. CI is red on head, but every failure is either a pre-existing flake or a regression-suite scenario broken at baseline. No PR-caused regression was found. The high regression-DB fail count (≈23%) is misleading — it is dominated by suites that already fail ~50–80% on every PR (swarms, parquet/postgresql, export-partition sanity).
CI on head

| Check | Test FAIL | Class |
|---|---|---|
| Stateless tests (amd_debug, sequential) | 00157_cache_dictionary | Pre-existing flake — 106 fails / 25 PRs / since 2026-03-02 |
| Stateless tests (arm_asan, azure, sequential, 2/2) | 03443_shared_storage_snapshots | Pre-existing flake — 30 fails / 18 PRs / since 2026-02-14 |
| Stress test (arm_asan) | Test script failed, Possible deadlock on shutdown (see gdb.log) | Pre-existing stress instability — 10 / 7 PRs and 6 / 5 PRs respectively, since 2026-04-13 |
Regression workflow (8 failed checks)

| Check | Top failing test on PR-1640 builds (30d) | Baseline fail rate (30d, all PRs) | Class |
|---|---|---|---|
| RegressionTestsRelease / Swarms + Aarch64 / Swarms | /swarms/feature/node failure/initiator out of disk space (×24) | 78% (133 OK / 483 Fail) | Pre-existing broken |
| same | /swarms/feature/swarm joins/join clause (×24) | 49% (305 OK / 293 Fail) | Pre-existing flaky |
| RegressionTestsRelease / S3Export (partition) + Aarch64 | /s3/minio/export tests/export partition/sanity/basic table (×24) | 50% (282 OK / 286 Fail) | Pre-existing broken |
| same | …/sanity/replicated partition exports local mode peer replica (×24) | 60% (210 OK / 311 Fail) | Pre-existing broken |
| same | …/sanity/partition export tight pool lock inside task (×24) | 59% (212 OK / 309 Fail) | Pre-existing broken |
| RegressionTestsRelease / S3Export (part) + Aarch64 | /s3 suite-level (×14 fails total) | flaky | Pre-existing flaky |
| RegressionTestsRelease / Parquet + Aarch64 | /parquet/postgresql/compression type/=GZIP etc. (×24 each) | 36% (649 OK / 360 Fail per case) | Pre-existing flaky |
Regression DB on /PRs/1640/ builds (30d): 22,673 Fail / 73,151 OK ≈ 23.6%. This looks alarming but is entirely accounted for by the per-test baseline rates above: the swarms join-clause suite alone enumerates >800k parameter combinations and inherits its ~50% baseline fail rate, so it dominates the absolute count. No regression test fails on PR-1640 above its all-PR baseline.
Audit Report — PR #1640

Scope

This PR is a frontport bundle carrying forward the following feature groups from
~120 files changed, ~11 000 diff lines. Reviewed partitions: cluster-dispatch core, fallback table function mechanism, remote-initiator host-selection, S3 URI dedup fix, REST catalog namespace filter,

Confirmed Defects

HIGH — Call-scoped cluster_name_in_settings state stored as an instance field
| Anchor | StorageObjectStorageCluster.cpp / updateQueryForDistributedEngineIfNeeded → cluster_name_in_settings = true (line 418); read at lines 453, 467 in updateQueryToSendIfNeeded |
| Impact | cluster_name_in_settings is an instance field initialized to false and set to true during the first cluster query, but never reset. All subsequent non-cluster reads on the same persistent storage object (CREATE TABLE … ENGINE=…) observe stale true, causing wrong branch selection in updateQueryToSendIfNeeded: wrong flag passed to extractDynamicStorageType, wrong argument construction for the rewritten query AST. Concurrent reads also data-race on the unsynchronized bool (C++ UB). |
| Trigger | Any CREATE TABLE … ENGINE=IcebergS3(…) (or any other StorageObjectStorageCluster engine) that receives at least one cluster query (object_storage_cluster set), followed by any remote-initiator query (object_storage_remote_initiator=1) on the same table object — or two concurrent queries of different types. |
| Why defect | Per-call state (whether the cluster name was injected into the query SETTINGS clause during this specific rewrite) was stored as a per-object field, conflating call-scoped state with object-scoped state. |
| Fix direction | Make updateQueryForDistributedEngineIfNeeded return bool instead of writing to the instance field. In updateQueryToSendIfNeeded, use a local effective_cluster_name_in_settings = cluster_name_in_settings || cluster_name_was_set. The instance field should only be written at construction time (via setClusterNameInSettings), which is the legitimate use for table functions. |
| Regression test direction | Two concurrent SELECT queries on the same StorageObjectStorageCluster table, one with and one without object_storage_cluster; verify neither produces wrong results or a LOGICAL_ERROR. |
MEDIUM — Null dereference in remote-initiator path with object_storage_cluster_join_mode='local' + allow_experimental_analyzer=0
Status: Open.
| Anchor | IStorageCluster.cpp / IStorageCluster::read, remote-initiator branch (line 330): updateQueryWithJoinToSendIfNeeded(query_to_send, query_info.query_tree, context) |
| Impact | Server exception — query_tree->as<QueryNode &>() on a null pointer. |
| Trigger | object_storage_remote_initiator=1 + object_storage_cluster_join_mode='local' + allow_experimental_analyzer=0 + query with a JOIN. The cluster path guards this combination via IStorageCluster::getQueryProcessingStage (throws NOT_IMPLEMENTED before read is reached). The remote-initiator path (cluster_name_from_settings.empty()) returns FetchColumns early from StorageObjectStorageCluster::getQueryProcessingStage and never reaches that guard. |
| Why defect | The remote-initiator branch calls updateQueryWithJoinToSendIfNeeded unconditionally, while the existing guard only fires in the cluster branch. |
| Fix direction | Either add the !allow_experimental_analyzer check to the remote-initiator call site (throw NOT_IMPLEMENTED for the same combination), or null-check query_tree before entering getQueryTreeInfo. |
| Regression test direction | Query with JOIN on a cluster-fallback table function with the three settings above; expect a clear NOT_IMPLEMENTED exception, not a null-deref. |
LOW — engine_to_function and function_to_cluster_function maps are out of sync
Status: Open.
| Anchor | StorageObjectStorageCluster.cpp / updateQueryForDistributedEngineIfNeeded (engine_to_function static map) and updateQueryToSendIfNeeded (function_to_cluster_function static map) |
| Impact | LOGICAL_ERROR at query time for engine types missing from one of the two maps (e.g. DeltaLakeLocal is in engine_to_function with no corresponding cluster entry in function_to_cluster_function; COSN/GCS/OSS are in engine_to_function but absent from function_to_cluster_function). |
| Trigger | CREATE TABLE … ENGINE=DeltaLakeLocal(…) with object_storage_cluster set, or any other engine whose name appears in one map but not the other. |
| Why defect | The two complementary static maps are maintained independently; adding a new engine type to one without updating the other silently creates a LOGICAL_ERROR path at query time. |
| Fix direction | Replace the two maps with a single bidirectional lookup table (or add a startup/compile-time assertion that both directions are complete for every registered engine). |
| Regression test direction | SELECT with object_storage_cluster against a table of each engine type in both maps; verify no LOGICAL_ERROR. |
Previously Flagged Defects (Codex P1/P2) — Both Confirmed Fixed
| Codex finding | Fix status |
|---|---|
| P1 — StorageObjectStorageCluster::renameInMemory returned after renaming only pure_storage, never calling IStorageCluster::renameInMemory; wrapper storage_id was never updated. | Fixed — current code calls IStorageCluster::renameInMemory(new_table_id) after pure_storage->renameInMemory(new_table_id). |
| P2 — static pcg64 rng(randomSeed()) in host-selection path was shared across threads (data race / UB). | Fixed — changed to a local pcg64 rng(randomSeed()) in IStorageCluster::convertToRemote; no shared state. |
Coverage Summary
| Scope reviewed | IStorageCluster.cpp/h, StorageObjectStorageCluster.cpp/h, TableFunctionObjectStorageClusterFallback.cpp/h, TableFunctionObjectStorageCluster.cpp, StorageFileCluster.cpp, StorageURLCluster.cpp, IStorageCluster::convertToRemote (remote-initiator host selection), S3/URI.cpp (region dedup), RestCatalog.cpp (namespace filter), SettingsChangesHistory.cpp |
| C++ bug classes covered | Memory lifetime ✓, RAII/resource leaks ✓, data races ✓ (found one, open), integer overflow ✓, credentials in logs ✓ (host name only, no secrets), iterator invalidation ✓ |
| Categories failed | Shared-state / data-race (cluster_name_in_settings), null-safety under non-default setting combination (query_tree + join mode), map completeness (engine_to_function vs function_to_cluster_function) |
| Categories passed | Memory lifetime (no use-after-free/move found), RAII (pure_storage always valid before use), static-local-map init (thread-safe C++11), URI region idempotency logic, renameInMemory fix, static-RNG fix |
| Assumptions / limits | DataLake catalog namespace filter and Timezone-for-partitioning partitions reviewed at diff level only; ICatalog/HivePartitioningUtils deep class hierarchy not fully traced. Integration and stateless test changes reviewed for coverage gaps only. |
It's correct.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Frontports for Antalya 26.3
CI/CD Options
Exclude tests:
Regression jobs to run: