Antalya 26.3 forward-port: Hybrid tables #1694

Merged
zvonand merged 3 commits into antalya-26.3 from feature/antalya-26.3/pr-1442
Apr 29, 2026
Collaborator

@zvonand zvonand commented Apr 27, 2026

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

#1071
#1156
#1272 (#1442 by @filimonov).

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

Cherry-picked from #1442.

…table

Antalya 26.1 forward-port: Hybrid tables
@zvonand zvonand added the releasy (Created/managed by RelEasy) and ai-resolved (Port conflict auto-resolved by Claude) labels Apr 27, 2026

github-actions Bot commented Apr 27, 2026

Workflow [PR], commit [78de1e6]

@zvonand zvonand added the antalya-26.3 and forwardport (This is a frontport of code that existed in previous Antalya versions) labels Apr 27, 2026
The function `__aliasMarker` was registered with `Category::Other`,
which made `system.functions` report it as `'Other'` instead of
`'Internal'`. This broke `03921_internal_functions_have_internal_category`,
which asserts that every function with a `__` prefix has the Internal
category.

Addresses 2 failing test(s) in Fast test on
#1694.
Collaborator Author

zvonand commented Apr 27, 2026

RelEasy analyze-fails — DONE

run completed at 2026-04-27T15:46:38Z

  • Head SHA: b2fed58ed8 (feature/antalya-26.3/pr-1442)
  • Tests considered: 2 across 1 CI shard(s)
  • Outcomes: 1 done · 0 partial · 0 unrelated · 0 unresolved
  • Commits added by AI: 1 (✅ pushed)
  • Anthropic cost: $3.1529

Per-shard outcomes

✅ DONE — Fast test

2 failed test(s) considered — +1 commit(s) — cost $3.1529
full report

AI narration

Build succeeded. Now let me run the failing test.
The fix works. All __ prefixed functions, including __aliasMarker, now have category Internal. Let me commit and then deal with the second test.

Summary

Root cause: PR #1442 added __aliasMarker, registered with FunctionDocumentation::Category::Other. The test 03921_internal_functions_have_internal_category asserts that every function with the __ prefix has Category::Internal, so it failed (throwIf(notEquals(__table1.categories, 'Internal'))).

Fix: in src/Functions/identity.cpp:50, changed .category = FunctionDocumentation::Category::Other to .category = FunctionDocumentation::Category::Internal. Verified via clickhouse local --query "SELECT name, categories FROM system.functions WHERE startsWith(name, '__')": __aliasMarker now reports Internal alongside the other internal __ functions.

The clickhouse-test entry is the umbrella job-status label that flips whenever any individual stateless test fails in the Fast test shard; it shares the same root cause.

  • 03921_internal_functions_have_internal_category [fixed]
  • clickhouse-test [fixed]
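
The invariant behind the failing test can be sketched outside ClickHouse as follows. This is only an illustration: FunctionInfo and findMiscategorizedInternalFunctions are hypothetical names, the rows stand in for what system.functions would report, and the real test is a SQL query rather than C++.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one row of system.functions (name + category).
struct FunctionInfo
{
    std::string name;
    std::string category;
};

// Returns the names of all `__`-prefixed functions whose category is not "Internal".
// An empty result means the invariant described above holds.
std::vector<std::string> findMiscategorizedInternalFunctions(const std::vector<FunctionInfo> & functions)
{
    std::vector<std::string> offenders;
    for (const auto & fn : functions)
    {
        // rfind(prefix, 0) == 0 is a prefix check that works before C++20.
        const bool has_internal_prefix = fn.name.rfind("__", 0) == 0;
        if (has_internal_prefix && fn.category != "Internal")
            offenders.push_back(fn.name);
    }
    return offenders;
}
```

With __aliasMarker registered as Other the function returns that one name; after the category change it returns an empty list.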

DONE



🤖 Posted automatically by releasy analyze-fails. Re-run the command to refresh.

@zvonand zvonand added the port-antalya (PRs to be ported to all new Antalya releases) label Apr 27, 2026
@zvonand zvonand changed the title from "Antalya 26.3: Antalya 26.1 forward-port: Hybrid tables" to "Antalya 26.3 forward-port: Hybrid tables" Apr 27, 2026
@zvonand zvonand requested review from filimonov and mkmkme April 28, 2026 20:28
@zvonand zvonand merged commit bdef614 into antalya-26.3 Apr 29, 2026
276 of 303 checks passed
@alsugiliazova
Member

Verification report: Altinity/ClickHouse PR #1694

Conclusion

PR is merged. CI red on head, but every failure is either a pre-existing flake or a regression-suite scenario broken at baseline on antalya-26.3. No PR-caused regression found.

Caveat — partial frontport (same as #1640 / #1646). PR #1694 lives on antalya-26.3, which is still missing several companion frontports from antalya-26.1. The chronic regression failures observed here are branch-level missing-dependency symptoms, not breakage introduced by this PR. The iceberg sort key timezone test fails for the same UNRECOGNIZED_ARGUMENTS: '--iceberg_partition_timezone' reason documented in VERIFICATION_PR_1646.md. Re-verify once the companion frontports land.


CI on head 78de1e6d — failures

PR test workflow (2 failed checks, 42 success, 1 cancelled)

| Check | Test FAIL | Class |
| --- | --- | --- |
| Stateless tests (amd_debug, sequential) | 00157_cache_dictionary | Pre-existing flake (106 fails / 25 PRs / 90d) |
| Stateless tests (arm_asan, azure, parallel, 2/4) | 00084_external_aggregation | Pre-existing flake (28 / 12) |

Regression workflow (10 failed checks)

| Check | Top failing tests on PR-1694 builds (30d) | Baseline (antalya-26.3, 30d) | Class |
| --- | --- | --- | --- |
| Swarms (Release + Aarch64) | cluster discovery, swarm join sanity / join with clause, node failure / check restart, node failure / cpu overload, swarm sanity / check scale up and down, node failure / network failure (×6 each) | 30–44% on every PR | Pre-existing broken |
| S3Export (partition) (Release + Aarch64) | export partition / sanity / no partition by (×6) | 50% | Pre-existing broken |
| Iceberg (1) (Release + Aarch64) | rest catalog / sort key timezone / month transform utc (×6), rest catalog / iceberg iterator race condition (×6) | 41% / 28% | Missing-dep symptom + pre-existing flaky |
| Iceberg (2) (Release + Aarch64) | glue catalog / iceberg iterator race condition | 28% | Pre-existing flaky |
| Parquet (Release + Aarch64) | postgresql/mysql round-trip compression-type variants (×6 each) | ~36% | Pre-existing flaky |

Regression DB on /PRs/1694/ builds (30d): 449 Fail / 13,798 OK ≈ 3.2%. Every top failure matches the all-PR baseline fail rate on antalya-26.3.


Related to PR diff?

PR is the forward-port of Hybrid tables (upstream #1442) — touches Storages/Hybrid/* and related plumbing.

| Failing test | Diff overlap | Related? |
| --- | --- | --- |
| 00157_cache_dictionary, 00084_external_aggregation | none (dictionary cache, GROUP BY spilling) | No |
| swarms / * | none (swarm cluster discovery / node-failure scenarios), failures hit chronic baseline rates | No |
| s3_export_partition / no partition by | none | No |
| iceberg / sort key timezone / month transform utc | none; failure is UNRECOGNIZED_ARGUMENTS from the binary (missing-dep, not Hybrid) | No |
| iceberg / iterator race condition (rest + glue) | none | No |
| parquet / postgresql + mysql round-trip | none | No |

No failing test intersects the Hybrid-tables code path or fails at a rate above the all-PR baseline on antalya-26.3.

@alsugiliazova
Member

Audit Report — PR #1694

AI audit note: This review was generated by AI.

Summary

Forward-port of the Hybrid table engine from #1442 (merge bdef614e1f2 onto antalya-26.3): Distributed-backed base segment plus optional segments with per-segment predicates, HybridCastsPass, __aliasMarker handling for distributed query trees, cluster executeQuery local plans united with UnionStep, tests, and hybrid.md.

Design alignment (hybrid-mt-iceberg-design.md, draft 2026-05-06)

  • §10.6 — Documented anti-patterns include correlated subqueries touching Hybrid (analogous to distributed_product_mode semantics).
  • §10.8 — Correlated subqueries are discouraged and described as not Hybrid-specific; fuller documentation is tagged [P2], not core v1 query semantics.

Skipping nested QUERY / UNION subtrees in ReplaceColumnNodesForTableExpressionVisitor matches treating correlated-on-Hybrid as a discouraged edge rather than a first-class supported surface.

Confirmed defects

None identified in reviewed scope for intended usage, including alignment with the gist’s v1 stance on correlated subqueries.

Recommended follow-ups

Comment vs traversal (ReplaceColumnNodesForTableExpressionVisitor) — In StorageDistributed.cpp, needChildVisit (~947–951) does not descend into QUERY / UNION children, while the comment block (~1134–1148) refers to rewriting the “whole query tree.” Tighten the comment to the scopes actually visited, or extend traversal plus tests if correlated-on-Hybrid becomes an explicitly supported product decision (gist [P2] documentation track for subquery semantics).
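
The traversal behaviour described above can be illustrated with a toy query tree. This is a minimal sketch under stated assumptions: Node, NodeType and replaceColumnSource are hypothetical and much simpler than ClickHouse's query-tree classes; only the rule "do not descend into QUERY / UNION children" is taken from the audit text.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical, heavily simplified query-tree node.
enum class NodeType { Query, Union, Column, Function };

struct Node
{
    NodeType type;
    std::string source;                          // table expression a COLUMN node is bound to
    std::vector<std::shared_ptr<Node>> children;
};

// Mirrors the described rule: nested QUERY / UNION subtrees are not visited.
bool needChildVisit(const Node & child)
{
    return child.type != NodeType::Query && child.type != NodeType::Union;
}

// Rebinds COLUMN nodes from `from` to `to` in the visited scopes only.
// Returns how many column nodes were rewritten.
int replaceColumnSource(Node & node, const std::string & from, const std::string & to)
{
    int replaced = 0;
    if (node.type == NodeType::Column && node.source == from)
    {
        node.source = to;
        ++replaced;
    }
    for (auto & child : node.children)
        if (child && needChildVisit(*child))
            replaced += replaceColumnSource(*child, from, to);
    return replaced;
}
```

In this model a column inside a nested subquery keeps its original binding, which is why a comment promising a rewrite of the "whole query tree" overstates what the traversal does.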

Hygiene — Class name typo ReplaseAliasColumnsVisitor in StorageDistributed.cpp (rename to ReplaceAliasColumnsVisitor when practical).

Coverage summary

  • Reviewed: registerStorageHybrid, setHybridLayout / cast cache, buildQueryTreeDistributed, segment additional_query_infos, queryNodeToDistributedSelectQuery, ClusterProxy::executeQuery and UnionStep, HybridCastsPass, __aliasMarker / normalizeAliasMarkersInQueryTree, PlannerActionsVisitor, hybrid.md, stateless tests 03643–03648; gist §4.6, §10.6–10.8.
  • Correlated-on-Hybrid: Per gist, discouraged / anti-pattern for v1; not a committed correctness guarantee.
  • Sample checks: DDL predicate validation; empty-cluster behaviour with extra segments; header required for local hybrid plans; __aliasMarker internal category and normalization; documented INSERT to first segment only; cast-cache refresh after segment DDL per docs; JOIN and nested-Hybrid limits per gist roadmap.
  • Limits: Static review only (no full build or test run); segments / cached_columns_to_cast concurrency treated as DDL-time configuration.

Scope (expanded)

| Area | Primary files |
| --- | --- |
| Engine registration & DDL validation | StorageDistributed.cpp (registerStorageHybrid), registerStorages.cpp |
| Read path, query-tree rewrite, UNION of plans | StorageDistributed::read, buildQueryTreeDistributed, ClusterProxy::executeQuery |
| AST / alias marker for remote serialization | Planner/Utils.cpp (normalizeAliasMarkersInQueryTree, queryNodeToDistributedSelectQuery), Functions/identity.cpp |
| Planner integration | PlannerActionsVisitor.cpp |
| Auto-cast pass | Analyzer/Passes/HybridCastsPass.cpp, QueryTreePassManager.cpp |
| Supporting | SelectStreamFactory.*, executeQuery.*, TranslateQualifiedNamesVisitor.*, ASTIdentifier.*, Settings.*, enableAllExperimentalSettings.cpp |
| Tests & docs | 03643_hybrid*.sql, 03644_*, 03645_*, 03648_*, docs/.../hybrid.md |

~2.3k insertions, 34 files (per merge diff).


Call graph (high level)

CREATE TABLE … ENGINE = Hybrid(remote/cluster TF, pred [, seg TF, pred]…)
  └─ registerStorageHybrid
       └─ TableFunctionFactory::execute → StorageDistributed
       └─ validate_predicate / validate_segment_schema
       └─ setBaseSegmentPredicate / setHybridLayout / setCachedColumnsToCast

SELECT from Hybrid (analyzer)
  └─ StorageDistributed::read
       └─ buildQueryTreeDistributed (replacement table + segment WHERE merge)
            └─ ReplaceColumnNodesForTableExpressionVisitor (when additional_filter)
            └─ ReplaseAliasColumnsVisitor
       └─ queryNodeToDistributedSelectQuery (+ normalizeAliasMarkersInQueryTree)
       └─ per-segment additional SelectQueryInfo (+ buildQueryTreeDistributed)
  └─ ClusterProxy::executeQuery
       └─ per-shard ReadFromRemote (if any shards)
       └─ createLocalPlan per additional_query_info
       └─ UnionStep ("Hybrid") when multiple plans

Analyzer pass (optional)
  └─ HybridCastsPass::run → HybridCastTablesCollector + HybridCastVisitor

Transition matrix (abbreviated)

| Entry | Stage | State / side effect | Invariant |
| --- | --- | --- | --- |
| DDL | registerStorageHybrid | segments, base_segment_predicate, cast cache | Every physical Hybrid column present on each segment; predicates analyzable |
| SELECT (analyzer) | buildQueryTreeDistributed | Cloned query tree + merged WHERE | Hybrid-bound columns in visited scopes share replacement segment semantics when additional_filter is used |
| SELECT | executeQuery | plans → optional UnionStep | Local segment plans must match shared header |
| Serialization | queryNodeToDistributedSelectQuery | AST for remotes | Nested __aliasMarker normalized |
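
The DDL-time invariant in the first row can be sketched as a schema comparison. This is an illustration only: Schema, missingColumns and columnsToCast are hypothetical names, and the idea that the cast cache holds columns whose segment type differs from the Hybrid type is an assumption, not something confirmed by the diff.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical schema model: column name -> type name.
using Schema = std::map<std::string, std::string>;

// Hybrid columns that a segment does not provide; a non-empty result
// would violate "every physical Hybrid column present on each segment".
std::vector<std::string> missingColumns(const Schema & hybrid, const Schema & segment)
{
    std::vector<std::string> missing;
    for (const auto & column : hybrid)
        if (segment.find(column.first) == segment.end())
            missing.push_back(column.first);
    return missing;
}

// Columns present on both sides but with a different type.
// Assumption: these are the candidates an auto-cast pass would need to know about.
std::vector<std::string> columnsToCast(const Schema & hybrid, const Schema & segment)
{
    std::vector<std::string> to_cast;
    for (const auto & column : hybrid)
    {
        const auto it = segment.find(column.first);
        if (it != segment.end() && it->second != column.second)
            to_cast.push_back(column.first);
    }
    return to_cast;
}
```

Because the second list is computed once from the schemas, it goes stale if a segment's column types change later, which matches the documented limitation that the table must be reattached or recreated.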

Fault-injection categories (logical)

| Category | Outcome in review |
| --- | --- |
| Empty distributed cluster + only extra segments | Pass: intentional continuation to local segment plans; base remote leg absent by design when shard count is zero |
| Missing header for local hybrid plans | Fail-closed: LOGICAL_ERROR in executeQuery |
| Stale segment types vs cast cache | Documented limitation: user must reattach/recreate (hybrid.md) |
| Predicate / DDL validation errors | Fail-closed: BAD_ARGUMENTS |
| Correlated / nested QUERY column rewrite | Discouraged per gist §10.6–10.8; optional comment/traversal alignment under Recommended follow-ups |
| Data races on segments during normal query load | Deferred: assumed set only at construction; ALTER interaction not exhaustively traced |
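
The first two rows, together with the UnionStep branch of the call graph, can be sketched as a small plan-combining function. Everything here is a simplified model: Plan, CombinedPlan and combinePlans are hypothetical, std::logic_error stands in for ClickHouse's LOGICAL_ERROR, and the actual ClusterProxy::executeQuery code is not reproduced.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

using Header = std::vector<std::string>;       // hypothetical: ordered column names

struct Plan
{
    std::string name;
    Header header;
};

struct CombinedPlan
{
    std::vector<Plan> inputs;
    bool has_union_step = false;               // true when several plans are united
};

// shared_header may be nullptr to model a missing header.
CombinedPlan combinePlans(
    const std::vector<Plan> & remote_shard_plans,
    const std::vector<Plan> & local_segment_plans,
    const Header * shared_header)
{
    // Fail closed: local segment plans cannot be built without a header.
    if (!local_segment_plans.empty() && shared_header == nullptr)
        throw std::logic_error("header is required for local hybrid plans");

    CombinedPlan result;
    result.inputs = remote_shard_plans;        // may be empty when the cluster has no shards
    for (const auto & plan : local_segment_plans)
    {
        if (plan.header != *shared_header)
            throw std::logic_error("local segment plan does not match the shared header");
        result.inputs.push_back(plan);
    }
    // A union is only needed when more than one plan has to be merged.
    result.has_union_step = result.inputs.size() > 1;
    return result;
}
```

In this model a zero-shard cluster with a single extra segment yields one input and no union step, while one shard plus one segment yields two inputs united by a union.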

C++ defect classes (spot check)

| Class | Notes |
| --- | --- |
| Lifetime / UAF | ASTs and query trees cloned for per-segment SelectQueryInfo; no dangling pointer identified in reviewed paths |
| Iterator invalidation | Not central to reviewed diff |
| Data races | segments / cached_columns_to_cast not mutex-protected; treated as DDL-time frozen in static review |
| Exception safety | DDL throws before exposing table; partial failure paths in registerStorageHybrid rethrow wrapped Exception |
| Integer overflow | Low risk in shard/segment counting for typical sizes |


Labels

  • ai-resolved: Port conflict auto-resolved by Claude
  • antalya-26.3
  • forwardport: This is a frontport of code that existed in previous Antalya versions
  • hybrid
  • port-antalya: PRs to be ported to all new Antalya releases
  • releasy: Created/managed by RelEasy
  • verified: Approved for release


4 participants