fix(plan): dedup update from source matches#24932
Conversation
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
XuPeng-SH
left a comment
There was a problem hiding this comment.
I found one blocking correctness issue in the new UPDATE ... FROM dedup plan.
pkg/sql/plan/bind_update.go:1029-1041 wraps each updated column in its own any_value(...), and pkg/sql/colexec/aggexec/any2.go:44-53 implements any_value as “first non-NULL seen for this column”. With duplicate source matches, that means different updated columns can be taken from different source rows whenever NULLs are distributed differently across those rows.
Example: if the same target row matches (new_a = NULL, new_b = 'x') and (new_a = 7, new_b = NULL), the dedup can materialize (new_a = 7, new_b = 'x'), even though no matching source row produced that combination. PostgreSQL-style UPDATE ... FROM is allowed to pick an arbitrary matching row, but it still has to pick one row consistently, not synthesize a cross-row tuple. Please dedup on a whole-source-row choice rather than per-column any_value aggregation.
aunjgr
left a comment
There was a problem hiding this comment.
XuPeng-SH's concern is already addressed in the follow-up commits. The PR now uses row_number() + filter for whole-row dedup, not per-column any_value(). Tests assert any_value is absent. The BVT case whole_row_t/whole_row_s covers the exact cross-row synthesis scenario described in the review.
What type of PR is this?
Which issue(s) this PR fixes:
issue #23137
What this PR does / why we need it:
This PR fixes the PostgreSQL-style
UPDATE ... SET ... FROM ... WHERE ...path when duplicate source rows match the same target row on tables without FK constraints.The new UPDATE path previously fed duplicate joined rows directly into the update pipeline, which could materialize duplicate primary-key rows for one target row. This change adds a planner-side dedup step for
UPDATE ... FROM: group by the target row's original columns and useany_value()for updated columns, matching the existing fallback path behavior.Test coverage includes:
AGG + any_valuededup.Tested with:
git diff --checkgo test ./pkg/sql/plan -run TestUpdatePgStyleFromDedupsDuplicateSourceMatchesOnNewPath -count=1go test ./pkg/sql/plan -count=1mo-tester update_pg_style_from.sql(108/108passed)makemake static-check(0 issues)