[FLINK-39558][table] LogicalUnnestRule: use Calcite Uncollect rowType instead of LogicalType round-trip#28053
Open
jnh5y wants to merge 4 commits intoapache:masterfrom
Open
[FLINK-39558][table] LogicalUnnestRule: use Calcite Uncollect rowType instead of LogicalType round-trip#28053jnh5y wants to merge 4 commits intoapache:masterfrom
jnh5y wants to merge 4 commits intoapache:masterfrom
Conversation
Adds two tests against the existing nested_not_null.business_data column,
covering LEFT-JOIN UNNEST shapes that hit RelOptUtil.verifyTypeEquivalence
("Cannot add expression of different type to set"):
- testNullMismatchLeftJoinNoAliasList: bare Uncollect under LEFT (no
column-list alias inserts no Project).
- testNullMismatchLeftJoinOnPredicate: ON-clause predicate adds a
LogicalFilter between the LEFT correlate and the Uncollect.
Generated-by: claude-opus-4-7
Collaborator
snuyanzin
reviewed
Apr 28, 2026
Comment on lines
+286
to
+297
| def testNullMismatchLeftJoinNoAliasList(): Unit = { | ||
| // Bare Uncollect under the LEFT correlate (no column-list alias inserts no Project). | ||
| util.verifyRelPlan( | ||
| "SELECT * FROM nested_not_null LEFT JOIN UNNEST(nested_not_null.business_data) AS exploded_bd ON TRUE") | ||
| } | ||
|
|
||
| @Test | ||
| def testNullMismatchLeftJoinOnPredicate(): Unit = { | ||
| // ON-clause predicate adds a LogicalFilter between the LEFT correlate and the Uncollect. | ||
| util.verifyRelPlan( | ||
| "SELECT * FROM nested_not_null LEFT JOIN UNNEST(nested_not_null.business_data) AS exploded_bd ON exploded_bd <> 'debug'") | ||
| } |
Contributor
There was a problem hiding this comment.
why do we need comments here?
is sql not self explainable?
If you do something which is different from others (other tests in this file) you need justification
Contributor
Author
There was a problem hiding this comment.
Removed comments in this file and also in the rule.
Also, added some ITCases.
Use uncollect.getRowType() directly instead of round-tripping through Flink's LogicalType. The rewritten Correlate's derived rowType then matches the original byte-for-byte, fixing the verifyTypeEquivalence crashes for LEFT JOIN UNNEST shapes that the FLINK-33217 patch did not cover (bare Uncollect, Filter(Uncollect)). The getLogicalProjectWithAdjustedNullability helper introduced by FLINK-33217 is removed along with its dependent imports. Plan fixtures are updated to reflect Calcite's source-derived field naming (e.g. tags0 instead of f0/EXPR$0; KEY/VALUE for MAPs). UnnestITCase coverage is added for the two LEFT-JOIN UNNEST shapes (batch + stream), including empty-array and predicate-filters-everything cases. Generated-by: claude-opus-4-7
05b336d to
772f2e7
Compare
Generated-by: claude-opus-4-7
Generated-by: claude-opus-4-7
snuyanzin
approved these changes
Apr 30, 2026
snuyanzin
reviewed
Apr 30, 2026
Comment on lines
689
to
693
| <Resource name="optimized exec plan"> | ||
| <![CDATA[ | ||
| Calc(select=[id, f0 AS k, f1 AS v, ORDINALITY AS pos]) | ||
| +- Correlate(invocation=[$UNNEST_ROWS_WITH_ORDINALITY$1($cor0.map_data)], correlate=[table($UNNEST_ROWS_WITH_ORDINALITY$1($cor0.map_data))], select=[id,map_data,f0,f1,ORDINALITY], rowType=[RecordType(INTEGER id, (VARCHAR(2147483647), VARCHAR(2147483647)) MAP map_data, VARCHAR(2147483647) f0, VARCHAR(2147483647) f1, INTEGER ORDINALITY)], joinType=[INNER]) | ||
| Calc(select=[id, KEY AS k, VALUE AS v, ORDINALITY AS pos]) | ||
| +- Correlate(invocation=[$UNNEST_ROWS_WITH_ORDINALITY$1($cor0.map_data)], correlate=[table($UNNEST_ROWS_WITH_ORDINALITY$1($cor0.map_data))], select=[id,map_data,KEY,VALUE,ORDINALITY], rowType=[RecordType(INTEGER id, (VARCHAR(2147483647), VARCHAR(2147483647)) MAP map_data, VARCHAR(2147483647) KEY, VARCHAR(2147483647) VALUE, INTEGER ORDINALITY)], joinType=[INNER]) | ||
| +- TableSourceScan(table=[[default_catalog, default_database, MyTable]], fields=[id, map_data]) |
Contributor
There was a problem hiding this comment.
this is a change in exec plan
do we have a store restore test?
if no, then better to have with plan generated before the change and check that after that the query still able to process
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What is the purpose of the change
Fixes an internal-error class in
LogicalUnnestRulewhere theTableFunctionScanrowType diverges from what Calcite derives for the originalCorrelate(Uncollect)tree, causingRelOptUtil.verifyTypeEquivalenceto fail forLEFT JOIN UNNESTshapes that don't fit the FLINK-33217 patch path.Field naming change
Calcite's
Uncollectderives field names from the source array, so plan output for UNNEST columns changes:ARRAY<T>/MULTISET<T>: the unnested column is named after the source array column (e.g.,tags0instead of syntheticf0/EXPR$0).MAP<K,V>: key/value columns are namedKEYandVALUEinstead off0/f1.WITH ORDINALITY: ordinality column is namedORDINALITY(unchanged).Multiple unnests in the same query are auto-disambiguated by Calcite's outer Correlate (e.g. two MAP unnests produce
KEY, VALUE, KEY0, VALUE0).The runtime
INTERNAL_UNNEST_ROWSfunction is positional, so persistedCompiledPlaninstances continue to restore correctly (verified byCorrelateRestoreTest).Does this pull request potentially affect one of the following parts?
@Public(Evolving): noDocumentation