26.3 Antalya port - Alternative syntax for cluster functions #1640
Conversation
…tive_syntax 26.1 Antalya port - Alternative syntax for cluster functions
…e_for_partitioning 26.1 Antalya port - Timezone for partitioning
…nquoted_timezone Fix unquoted timezone in sorting key
…_initiator Antalya 26.1: Fix remote initiator host name
…itiator_improvements Antalya 26.1: Remote initiator improvements
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 88842054aa
```cpp
pure_storage->renameInMemory(new_table_id);
return;
```
Update wrapper storage ID during renameInMemory
StorageObjectStorageCluster::renameInMemory returns after renaming only pure_storage, but pure_storage is always present, so the wrapper's own storage_id is never updated. After RENAME TABLE, calls that read StorageObjectStorageCluster::getStorageID() can still see the old identifier, which can break catalog/dependency logic that relies on the storage object reporting its current name.
```cpp
static pcg64 rng(randomSeed());
size_t shard_num = rng() % host_addresses.size();
```
Use thread-safe RNG when choosing remote initiator host
The function keeps a process-wide static pcg64 and advances it on each call. When multiple queries use object_storage_remote_initiator concurrently, they race on this shared RNG state, which is undefined behavior in C++. This can lead to flaky host selection and hard-to-diagnose concurrency issues; use thread_local RNG or guard access with synchronization.
…rser' into frontport/antalya-26.3/alternative_syntax
Test
```cpp
/// controls new feature and it's 'true' by default, use 'false' as previous_value).
/// It's used to implement `compatibility` setting (see https://github.com/ClickHouse/ClickHouse/issues/35972)
/// Note: please check if the key already exists to prevent duplicate entries.
addSettingsChanges(settings_changes_history, "26.3.1.20001.altinityantalya",
```
I am making an update for our SettingsChangesHistory.
I will later ask you to move all your settings into their corresponding sections, depending on whether they appeared in 25.8, 26.1, or another version. It is not correct to put them all into the latest release block; we won't do this anymore.
It will have commented lines for all settings that existed in 26.1. In the future, you will add new settings, but for forward-ports you will uncomment the corresponding lines.
…itiator_improvements_2 Antalya 26.1: Remote initiator improvements 2
Verification report: Altinity/ClickHouse PR #1640

Conclusion

PR is merged. CI is red on head, but every failure is either a pre-existing flake or a regression-suite scenario broken at baseline. No PR-caused regression was found. The high regression-DB fail count (≈23%) is misleading — it is dominated by suites that already fail ~50–80% on every PR (swarms, parquet/postgresql, export-partition sanity).
CI on head

| Check | Test FAIL | Class |
|---|---|---|
| Stateless tests (amd_debug, sequential) | 00157_cache_dictionary | Pre-existing flake — 106 fails / 25 PRs / since 2026-03-02 |
| Stateless tests (arm_asan, azure, sequential, 2/2) | 03443_shared_storage_snapshots | Pre-existing flake — 30 fails / 18 PRs / since 2026-02-14 |
| Stress test (arm_asan) | Test script failed, Possible deadlock on shutdown (see gdb.log) | Pre-existing stress instability — 10 / 7 PRs and 6 / 5 PRs respectively, since 2026-04-13 |
Regression workflow (8 failed checks)

| Check | Top failing test on PR-1640 builds (30d) | Baseline fail rate (30d, all PRs) | Class |
|---|---|---|---|
| RegressionTestsRelease / Swarms + Aarch64 / Swarms | /swarms/feature/node failure/initiator out of disk space (×24) | 78% (133 OK / 483 Fail) | Pre-existing broken |
| same | /swarms/feature/swarm joins/join clause (×24) | 49% (305 OK / 293 Fail) | Pre-existing flaky |
| RegressionTestsRelease / S3Export (partition) + Aarch64 | /s3/minio/export tests/export partition/sanity/basic table (×24) | 50% (282 OK / 286 Fail) | Pre-existing broken |
| same | …/sanity/replicated partition exports local mode peer replica (×24) | 60% (210 OK / 311 Fail) | Pre-existing broken |
| same | …/sanity/partition export tight pool lock inside task (×24) | 59% (212 OK / 309 Fail) | Pre-existing broken |
| RegressionTestsRelease / S3Export (part) + Aarch64 | /s3 suite-level (×14 fails total) | flaky | Pre-existing flaky |
| RegressionTestsRelease / Parquet + Aarch64 | /parquet/postgresql/compression type/=GZIP etc. (×24 each) | 36% (649 OK / 360 Fail per case) | Pre-existing flaky |
Regression DB on /PRs/1640/ builds (30d): 22,673 Fail / 73,151 OK ≈ 23.6%. This looks alarming but is entirely accounted for by the per-test baseline rates above: the swarms join-clause suite alone enumerates >800k parameter combinations and inherits its ~50% baseline fail rate, so it dominates the absolute count. No regression test fails on PR-1640 above its all-PR baseline.
Audit Report — PR #1640

Scope

This PR is a frontport bundle carrying forward the following feature groups from
~120 files changed, ~11 000 diff lines. Reviewed partitions: cluster-dispatch core, fallback table function mechanism, remote-initiator host-selection, S3 URI dedup fix, REST catalog namespace filter,

Confirmed Defects

HIGH — Call-scoped cluster_name_in_settings state stored as an instance field
| Anchor | StorageObjectStorageCluster.cpp / updateQueryForDistributedEngineIfNeeded → cluster_name_in_settings = true (line 418); read at lines 453, 467 in updateQueryToSendIfNeeded |
| Impact | cluster_name_in_settings is an instance field initialized to false and set to true during the first cluster query, but never reset. All subsequent non-cluster reads on the same persistent storage object (CREATE TABLE … ENGINE=…) observe stale true, causing wrong branch selection in updateQueryToSendIfNeeded: wrong flag passed to extractDynamicStorageType, wrong argument construction for the rewritten query AST. Concurrent reads also data-race on the unsynchronized bool (C++ UB). |
| Trigger | Any CREATE TABLE … ENGINE=IcebergS3(…) (or any other StorageObjectStorageCluster engine) that receives at least one cluster query (object_storage_cluster set), followed by any remote-initiator query (object_storage_remote_initiator=1) on the same table object — or two concurrent queries of different types. |
| Why defect | Per-call state (whether the cluster name was injected into the query SETTINGS clause during this specific rewrite) was stored as a per-object field, conflating call-scoped state with object-scoped state. |
| Fix direction | Make updateQueryForDistributedEngineIfNeeded return bool instead of writing to the instance field. In updateQueryToSendIfNeeded, use a local effective_cluster_name_in_settings = cluster_name_in_settings || cluster_name_was_set. The instance field should only be written at construction time (via setClusterNameInSettings), which is the legitimate use for table functions. |
| Regression test direction | Two concurrent SELECT queries on the same StorageObjectStorageCluster table, one with and one without object_storage_cluster; verify neither produces wrong results or a LOGICAL_ERROR. |
MEDIUM — Null dereference in remote-initiator path with object_storage_cluster_join_mode='local' + allow_experimental_analyzer=0
Status: Open.
| Anchor | IStorageCluster.cpp / IStorageCluster::read, remote-initiator branch (line 330): updateQueryWithJoinToSendIfNeeded(query_to_send, query_info.query_tree, context) |
| Impact | Server exception — query_tree->as<QueryNode &>() on a null pointer. |
| Trigger | object_storage_remote_initiator=1 + object_storage_cluster_join_mode='local' + allow_experimental_analyzer=0 + query with a JOIN. The cluster path guards this combination via IStorageCluster::getQueryProcessingStage (throws NOT_IMPLEMENTED before read is reached). The remote-initiator path (cluster_name_from_settings.empty()) returns FetchColumns early from StorageObjectStorageCluster::getQueryProcessingStage and never reaches that guard. |
| Why defect | The remote-initiator branch calls updateQueryWithJoinToSendIfNeeded unconditionally, while the existing guard only fires in the cluster branch. |
| Fix direction | Either add the !allow_experimental_analyzer check to the remote-initiator call site (throw NOT_IMPLEMENTED for the same combination), or null-check query_tree before entering getQueryTreeInfo. |
| Regression test direction | Query with JOIN on a cluster-fallback table function with the three settings above; expect a clear NOT_IMPLEMENTED exception, not a null-deref. |
LOW — engine_to_function and function_to_cluster_function maps are out of sync
Status: Open.
| Anchor | StorageObjectStorageCluster.cpp / updateQueryForDistributedEngineIfNeeded (engine_to_function static map) and updateQueryToSendIfNeeded (function_to_cluster_function static map) |
| Impact | LOGICAL_ERROR at query time for engine types missing from one of the two maps (e.g. DeltaLakeLocal is in engine_to_function with no corresponding cluster entry in function_to_cluster_function; COSN/GCS/OSS are in engine_to_function but absent from function_to_cluster_function). |
| Trigger | CREATE TABLE … ENGINE=DeltaLakeLocal(…) with object_storage_cluster set, or any other engine whose name appears in one map but not the other. |
| Why defect | The two complementary static maps are maintained independently; adding a new engine type to one without updating the other silently creates a LOGICAL_ERROR path at query time. |
| Fix direction | Replace the two maps with a single bidirectional lookup table (or add a startup/compile-time assertion that both directions are complete for every registered engine). |
| Regression test direction | SELECT with object_storage_cluster against a table of each engine type in both maps; verify no LOGICAL_ERROR. |
Previously Flagged Defects (Codex P1/P2) — Both Confirmed Fixed
| Codex finding | Fix status |
|---|---|
| P1 — StorageObjectStorageCluster::renameInMemory returned after renaming only pure_storage, never calling IStorageCluster::renameInMemory; wrapper storage_id was never updated. | Fixed — current code calls IStorageCluster::renameInMemory(new_table_id) after pure_storage->renameInMemory(new_table_id). |
| P2 — static pcg64 rng(randomSeed()) in host-selection path was shared across threads (data race / UB). | Fixed — changed to a local pcg64 rng(randomSeed()) in IStorageCluster::convertToRemote; no shared state. |
Coverage Summary
| Scope reviewed | IStorageCluster.cpp/h, StorageObjectStorageCluster.cpp/h, TableFunctionObjectStorageClusterFallback.cpp/h, TableFunctionObjectStorageCluster.cpp, StorageFileCluster.cpp, StorageURLCluster.cpp, IStorageCluster::convertToRemote (remote-initiator host selection), S3/URI.cpp (region dedup), RestCatalog.cpp (namespace filter), SettingsChangesHistory.cpp |
| C++ bug classes covered | Memory lifetime ✓, RAII/resource leaks ✓, data races ✓ (found one, open), integer overflow ✓, credentials in logs ✓ (host name only, no secrets), iterator invalidation ✓ |
| Categories failed | Shared-state / data-race (cluster_name_in_settings), null-safety under non-default setting combination (query_tree + join mode), map completeness (engine_to_function vs function_to_cluster_function) |
| Categories passed | Memory lifetime (no use-after-free/move found), RAII (pure_storage always valid before use), static-local-map init (thread-safe C++11), URI region idempotency logic, renameInMemory fix, static-RNG fix |
| Assumptions / limits | DataLake catalog namespace filter and Timezone-for-partitioning partitions reviewed at diff level only; ICatalog/HivePartitioningUtils deep class hierarchy not fully traced. Integration and stateless test changes reviewed for coverage gaps only. |
It's correct.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Frontports for Antalya 26.3
CI/CD Options
Exclude tests:
Regression jobs to run: