@adsharma adsharma commented Feb 7, 2026

Fixes: #170

Problem:

When UNWIND produces duplicate values in a single batch, MERGE creates multiple nodes instead of reusing the same node. For example, duplicate input such as UNWIND [1, 1] incorrectly creates two separate nodes, because the hash join probe doesn't see nodes created earlier in the same batch.
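
To make the failure mode concrete, here is a minimal sketch of the behavior described above. It is a toy model, not the engine's code: `mergeBatchNaive` and `mergeRowAtATime` are hypothetical names, and a plain `std::unordered_set<int>` stands in for the node table's primary-key index.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical model of a batch-at-a-time MERGE: the whole batch is probed
// against the existing keys first, so inserts from this batch are not yet
// visible to later rows of the same batch.
std::size_t mergeBatchNaive(std::unordered_set<int>& existing,
                            const std::vector<int>& batch) {
    std::vector<int> misses;
    for (int key : batch) {
        if (existing.count(key) == 0) {
            misses.push_back(key); // duplicates in the batch all miss
        }
    }
    // Create phase: one node per miss, even if two misses share a key.
    for (int key : misses) {
        existing.insert(key);
    }
    return misses.size(); // number of nodes "created"
}

// Same model, but each row probes after the previous row's insert is visible.
std::size_t mergeRowAtATime(std::unordered_set<int>& existing,
                            const std::vector<int>& batch) {
    std::size_t created = 0;
    for (int key : batch) {
        if (existing.insert(key).second) {
            ++created;
        }
    }
    return created;
}
```

With an empty key set and the batch `{1, 1}`, the naive version reports two created nodes while the row-at-a-time version reports one, which mirrors the bug this PR addresses.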

Solution:

  1. UNWIND_DEDUP Operator

    • Created LogicalUnwindDeduplicate and PhysicalUnwindDedup operators
    • Deduplicates UNWIND output by hash before MERGE, ensuring each unique value appears once
    • Fixes the core deduplication issue for MERGE patterns
  2. Optimizer Integration

    • Created UnwindDedupOptimizer that inserts UNWIND_DEDUP before MERGE's hash join probe side
    • Triggers when MERGE probe child is a HASH_JOIN whose probe child is UNWIND
  3. Constant Folding for Deterministic Inputs

    • Added constant folding in Planner::appendUnwind() for deterministic UNWIND expressions
    • Uses ConstantExpressionVisitor::isConstant() to detect foldable expressions
    • Non-deterministic functions (gen_random_uuid, rand, nextval) are NOT folded
  4. Test Coverage (issue_170)

    • → 1 row (1)
    • → 1 node
    • → 2 (non-deterministic, not folded)
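
The deduplication step in item 1 can be sketched as a hash-based pass that keeps the first occurrence of each value. This is only an illustration under simplifying assumptions: `dedupPreservingOrder` is a hypothetical helper operating on a `std::vector`, whereas the actual PhysicalUnwindDedup operator works on the engine's own data structures, which are not shown in this PR description.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a dedup operator: emits each distinct value once,
// in order of first appearance. T must be hashable and equality-comparable.
template <typename T>
std::vector<T> dedupPreservingOrder(const std::vector<T>& rows) {
    std::unordered_set<T> seen;
    std::vector<T> out;
    out.reserve(rows.size());
    for (const T& row : rows) {
        // insert() reports whether the value was new to the set.
        if (seen.insert(row).second) {
            out.push_back(row);
        }
    }
    return out;
}
```

Feeding the deduplicated output to MERGE means each unique value reaches the hash join probe once, so the probe never needs to see nodes created earlier in the same batch.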

…perators

The UNWIND_DEDUP optimizer was only detecting UNWIND when it was the direct
child of HASH_JOIN in the query plan. However, when using gen_random_uuid()
or other default expressions, the planner may insert intermediate operators
like FLATTEN between HASH_JOIN and UNWIND, causing the pattern match to fail.

This fix adds recursive tree traversal to find UNWIND nodes even when they
are nested deeper in the probe child subtree. When found, the optimizer now
correctly wraps the entire probe child with UNWIND_DEDUP to deduplicate values
before MERGE processing.
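
The recursive search described above can be sketched on a toy plan tree. Everything here is a simplified assumption for illustration: `PlanNode`, `OpType`, `makeNode`, `containsUnwind`, and `wrapProbeChild` are hypothetical names, and treating `children[0]` as the probe side is a convention of this sketch, not a statement about the engine's operator layout.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

enum class OpType { HashJoin, Flatten, Unwind, UnwindDedup, Other };

// Toy logical-plan node (hypothetical; not the engine's operator class).
struct PlanNode {
    OpType type;
    std::vector<std::unique_ptr<PlanNode>> children;
};

std::unique_ptr<PlanNode> makeNode(OpType type) {
    auto node = std::make_unique<PlanNode>();
    node->type = type;
    return node;
}

// Depth-first search: true if an Unwind appears anywhere in the subtree.
bool containsUnwind(const PlanNode& node) {
    if (node.type == OpType::Unwind) {
        return true;
    }
    for (const auto& child : node.children) {
        if (containsUnwind(*child)) {
            return true;
        }
    }
    return false;
}

// Wraps the whole probe child (assumed to be children[0]) with UnwindDedup
// when its subtree contains an Unwind. Returns true if the plan was changed.
bool wrapProbeChild(PlanNode& hashJoin) {
    if (hashJoin.type != OpType::HashJoin || hashJoin.children.empty()) {
        return false;
    }
    auto& probe = hashJoin.children[0];
    if (probe->type == OpType::UnwindDedup || !containsUnwind(*probe)) {
        return false; // already wrapped, or nothing to deduplicate
    }
    auto dedup = makeNode(OpType::UnwindDedup);
    dedup->children.push_back(std::move(probe));
    probe = std::move(dedup);
    return true;
}
```

For a plan shaped HashJoin -> Flatten -> Unwind, a direct-child check would miss the Unwind, while this sketch finds it and produces HashJoin -> UnwindDedup -> Flatten -> Unwind. The guard on `UnwindDedup` keeps a second pass from wrapping the same subtree twice.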

Fixes issue where MERGE with duplicate UNWIND values (e.g., UNWIND [1, 1])
would create multiple nodes with different gen_random_uuid() values instead
of recognizing them as duplicates and creating only one node.

This needs deeper thought. We need a clearer language spec on when
MERGE is expected to return multiple rows and when the duplicates
should be optimized away.

Previous test failures:

e2e_test_transaction~create_node~merge_tinysnb_checkpoint.MergeNodeWithOnMatchAndOnCreate (Failed)
e2e_test_transaction~create_node~merge_tinysnb_checkpoint.MergeRelWithOnMatch (Failed)
e2e_test_dml_node~merge~merge_tinysnb.MergeNodeWithOnMatchAndOnCreate (Failed)
e2e_test_dml_node~merge~merge_tinysnb.MergeRelWithOnMatch (Failed)
e2e_test_dml_node~merge~merge_tinysnb.MergeNodeWithOnMatch (Failed)
Development

Successfully merging this pull request may close these issues.

Bug: Merge node with gen_random_uuid() primary key returns wrong primary key on match
