Antalya 26.3: Cluster Joins part 2 - global mode #1782
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 01d8b03ac1
if (info.has_join || info.has_cross_join || info.has_local_columns_in_where)
{
    auto modified_query_tree = query_info.query_tree->clone();

    rewriteJoinToGlobalJoin(modified_query_tree, context);
    modified_query_tree = buildQueryTreeForShard(
Rewrite non-GLOBAL IN predicates in global join mode
When object_storage_cluster_join_mode='global', this branch also runs for has_local_columns_in_where, but the rewrite path only calls rewriteJoinToGlobalJoin (JOIN-only) before buildQueryTreeForShard. For queries like ... WHERE value IN (SELECT id FROM local_table) (no JOIN/CROSS JOIN), the IN remains non-GLOBAL, so the subquery still executes on shards and fails if local_table exists only on the initiator. This is a correctness regression in the newly added global mode path because it now claims to handle local-column WHERE cases but does not transform plain IN subqueries.
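To make the gap concrete, here is a minimal, self-contained sketch on a toy tree. Everything in it (`Node`, `Kind`, `markJoinsGlobal`, `markJoinsAndInGlobal`, `hasLocalIn`, `buildWhereInQuery`) is hypothetical and is not ClickHouse's QueryTree API; it only models the idea that a JOIN-only rewrite leaves a plain IN subquery non-GLOBAL, while a rewrite that also visits IN predicates does not.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy node kinds; these are NOT ClickHouse's QueryTree classes.
enum class Kind { Table, Join, InPredicate, Other };

struct Node
{
    Kind kind = Kind::Other;
    std::string name;
    bool is_global = false; // meaningful only for Join and InPredicate
    std::vector<std::unique_ptr<Node>> children;
};

std::unique_ptr<Node> makeNode(Kind kind, std::string name)
{
    auto node = std::make_unique<Node>();
    node->kind = kind;
    node->name = std::move(name);
    return node;
}

// JOIN-only rewrite: models the gap described above, IN predicates stay local.
void markJoinsGlobal(Node & node)
{
    if (node.kind == Kind::Join)
        node.is_global = true;
    for (auto & child : node.children)
        markJoinsGlobal(*child);
}

// Rewrite that also marks IN (subquery) predicates as GLOBAL.
void markJoinsAndInGlobal(Node & node)
{
    if (node.kind == Kind::Join || node.kind == Kind::InPredicate)
        node.is_global = true;
    for (auto & child : node.children)
        markJoinsAndInGlobal(*child);
}

// Returns true if any IN predicate in the tree is still non-GLOBAL.
bool hasLocalIn(const Node & node)
{
    if (node.kind == Kind::InPredicate && !node.is_global)
        return true;
    for (const auto & child : node.children)
        if (hasLocalIn(*child))
            return true;
    return false;
}

// Toy tree for: SELECT ... WHERE value IN (SELECT id FROM local_table)
std::unique_ptr<Node> buildWhereInQuery()
{
    auto query = makeNode(Kind::Other, "query");
    auto in_predicate = makeNode(Kind::InPredicate, "value IN subquery");
    in_predicate->children.push_back(makeNode(Kind::Table, "local_table"));
    query->children.push_back(std::move(in_predicate));
    return query;
}
```

On the tree from `buildWhereInQuery()`, `markJoinsGlobal` leaves `hasLocalIn` true, while `markJoinsAndInGlobal` clears it. The real fix would need to operate on the actual query tree nodes; this sketch only illustrates the traversal coverage.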
temporary_table_expression_node->setAlias(join_table_expression->getAlias());

replacement_map.emplace(join_table_expression.get(), std::move(temporary_table_expression_node));
Preserve descendant source mappings for CROSS JOIN replacements
In the new CrossJoinNode branch, only the top-level table expression is inserted into replacement_map. Unlike the JOIN path just above, descendants are not mapped, so compound right-side expressions (for example an ARRAY_JOIN table expression) can leave ColumnNode source weak pointers referencing nodes from the old subtree after cloneAndReplace, which can later surface as logical exceptions during analysis/execution. The replacement needs the same descendant remapping done for JOIN replacements.
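As a rough illustration of the difference between mapping only the top-level expression and mapping its descendants as well, consider the toy model below. `TableExpr`, `ReplacementMap`, `mapWithDescendants`, and `resolveSource` are hypothetical names, not ClickHouse classes, and the sketch assumes every descendant should resolve to the same temporary table node, which may not match what the JOIN path actually does.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a table expression with nested sub-expressions.
struct TableExpr
{
    std::vector<std::shared_ptr<TableExpr>> children;
};

// Maps a node of the old subtree to the node that replaces it.
using ReplacementMap = std::unordered_map<const TableExpr *, std::shared_ptr<TableExpr>>;

// Registers old_root and every descendant as replaced by `replacement`
// (assumption: all of them should point at the same temporary table).
void mapWithDescendants(
    const TableExpr & old_root,
    const std::shared_ptr<TableExpr> & replacement,
    ReplacementMap & map)
{
    map.emplace(&old_root, replacement);
    for (const auto & child : old_root.children)
        mapWithDescendants(*child, replacement, map);
}

// Resolves a column's source through the map.
// An unmapped source is returned unchanged, i.e. it still points into the old subtree.
std::shared_ptr<TableExpr> resolveSource(
    const std::shared_ptr<TableExpr> & source,
    const ReplacementMap & map)
{
    auto it = map.find(source.get());
    return it == map.end() ? source : it->second;
}
```

With a map that contains only the outer expression, a column whose source is a nested expression resolves to the old node; after `mapWithDescendants`, the same column resolves to the replacement.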
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Cluster Joins part 2 - global mode
Documentation entry for user-facing changes
Frontport of #1527
Setting object_storage_cluster_join_mode with value global.
In JOIN queries where the left table is executed on a cluster (s3Cluster, Iceberg with the object_storage_cluster setting, etc.), data from the right table is extracted and sent to the swarm nodes as temporary tables. The JOIN is then executed on the swarm nodes.
This PR also includes several fixes for issues found by AI:
- buildQueryTreeForShard with ARRAY JOIN and GLOBAL JOIN (ClickHouse/ClickHouse#96310)
These fixes are in the last three commits. They are new in the 26.3 port and do not exist in #1527 for 26.1.
CI/CD Options
Exclude tests:
Regression jobs to run: