
[SPARK-52709][SQL] Fix parsing of STRUCT<> #55285

Closed
yadavay-amzn wants to merge 1 commit into apache:master from yadavay-amzn:fix/SPARK-52709-struct-empty-parsing

@yadavay-amzn

What changes were proposed in this pull request?

Fix the STRUCT<> parsing bug by adding a parser-level action to decrement complex_type_level_counter when STRUCT NEQ is matched in the dataType rule.

Root cause: When the lexer sees STRUCT<>, it tokenizes it as STRUCT + NEQ rather than STRUCT + LT + GT, because the lexer takes the longest match and NEQ (<>) is longer than LT (<). Lexing STRUCT increments the counter, but nothing decrements it because no GT token follows, which leaves the counter corrupted. A subsequent >> then fails to be recognized as shift-right.

Previous attempt: PR #51480 was merged then reverted because it modified the NEQ lexer rule, which broke ARRAY(col1 <> col2) where <> is the not-equal operator.

This fix: Follows @cloud-fan's suggestion to handle it at the parser level. When the parser matches STRUCT NEQ in the dataType rule, we know NEQ is being used as empty angle brackets (not as a comparison), so we decrement the counter via an inline action.
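The counter mechanism and the parser-level fix can be illustrated with a small toy model. This is a hypothetical sketch, not Spark's ANTLR grammar: ToyLexer, tokenize, the parser_fix flag and the reduced token set are invented for illustration. It assumes what is described above: the ARRAY/MAP/STRUCT keywords increment complex_type_level_counter, GT decrements it, and >> / >>> are lexed as shift operators only while the counter is zero. Unlike an ANTLR token stream, which may buffer lookahead, this toy lexes one token per call, so the parser-side decrement always runs before the >> is scanned.

```python
"""Toy model of the STRUCT<> counter bug and a parser-level fix.

Hypothetical sketch: ToyLexer, tokenize and the token names only mimic
the behaviour described in the PR; this is not Spark's ANTLR grammar.
"""

COMPLEX_TYPE_KEYWORDS = {"ARRAY", "MAP", "STRUCT"}
# Fixed operators; "<>" is tried before "<", giving longest-match behaviour.
OPERATORS = [("<>", "NEQ"), ("<", "LT"), (">", "GT"),
             (",", "COMMA"), ("(", "LPAREN"), (")", "RPAREN")]


class ToyLexer:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        # > 0 means we are inside an ARRAY/MAP/STRUCT type definition.
        self.complex_type_level_counter = 0

    def next_token(self):
        """Return the next (type, text) pair, or None at end of input."""
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        if self.pos >= len(self.text):
            return None
        rest = self.text[self.pos:]
        # Shift operators are only allowed when the counter is zero,
        # mirroring a semantic predicate on the shift-right rules.
        if self.complex_type_level_counter == 0:
            for op, name in ((">>>", "SHIFT_RIGHT_UNSIGNED"), (">>", "SHIFT_RIGHT")):
                if rest.startswith(op):
                    self.pos += len(op)
                    return (name, op)
        for op, name in OPERATORS:
            if rest.startswith(op):
                self.pos += len(op)
                # A GT closes one level of complex type, if we are in one.
                if name == "GT" and self.complex_type_level_counter > 0:
                    self.complex_type_level_counter -= 1
                return (name, op)
        word = ""
        while self.pos < len(self.text) and (
                self.text[self.pos].isalnum() or self.text[self.pos] == "_"):
            word += self.text[self.pos]
            self.pos += 1
        if not word:
            raise ValueError(f"unexpected character {rest[0]!r}")
        if word.upper() in COMPLEX_TYPE_KEYWORDS:
            # The keyword itself bumps the counter, before any '<' is seen.
            self.complex_type_level_counter += 1
            return (word.upper(), word)
        return ("WORD", word)


def tokenize(text, parser_fix):
    """Pull tokens one at a time, like a parser consuming a token stream.

    With parser_fix=True, seeing STRUCT immediately followed by NEQ
    decrements the lexer's counter, imitating the inline parser action.
    """
    lexer = ToyLexer(text)
    tokens = []
    while (tok := lexer.next_token()) is not None:
        if (parser_fix and tok[0] == "NEQ" and tokens
                and tokens[-1][0] == "STRUCT"
                and lexer.complex_type_level_counter > 0):
            lexer.complex_type_level_counter -= 1
        tokens.append(tok)
    return [name for name, _ in tokens]
```

With parser_fix=False, SELECT CAST(null AS STRUCT<>), 2 >> 1 ends in GT, GT instead of SHIFT_RIGHT; with parser_fix=True the >> comes out as SHIFT_RIGHT, while MAP<STRING, ARRAY<INT>> still closes with two GT tokens and a plain 1 <> 2 still lexes as NEQ.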

Why are the changes needed?

SELECT CAST(null AS STRUCT<>), 2 >> 1 fails with a syntax error because the corrupted counter prevents >> from being recognized as shift-right.

Does this PR introduce any user-facing change?

No. This is a correctness fix for SQL parsing.

How was this patch tested?

Added a test in SparkSqlParserSuite covering 6 cases:

  1. STRUCT<> followed by >> (the original bug)
  2. Multiple STRUCT<> followed by >>
  3. STRUCT<> followed by >>> (unsigned shift right)
  4. ARRAY(1 <> 2) still works (the regression case from the reverted PR)
  5. Nested complex types MAP<STRING, ARRAY<INT>> still work
  6. Mix of empty struct + nested types + shift right

Test results:

  • SparkSqlParserSuite: 46/46 passed
  • DataTypeParserSuite: 59/59 passed
  • PlanParserSuite: 80/80 passed

Was this patch authored or co-authored using generative AI tooling?

Yes

@sarutak
Member

sarutak commented Apr 10, 2026

Hi @yadavay-amzn, thank you for your contribution!
The GA workflow seems to have failed. Could you enable it?
https://github.com/apache/spark/pull/55285/checks?check_run_id=70712131544

@yadavay-amzn yadavay-amzn force-pushed the fix/SPARK-52709-struct-empty-parsing branch from 495989d to ef02927 April 10, 2026 02:42
@sarutak
Member

sarutak commented Apr 10, 2026

The GA failure seems unrelated to this change. Could you rebase onto master?

When the lexer sees STRUCT<>, it tokenizes it as STRUCT + NEQ because
the lexer takes the longest match and NEQ (<>) is longer than LT (<). This increments
complex_type_level_counter for STRUCT but never decrements it (no GT
token), corrupting the counter for subsequent tokens. As a result,
`SELECT CAST(null AS STRUCT<>), 2 >> 1` fails because >> is not
recognized as shift-right.

The previous fix (apache#51480) modified the NEQ lexer rule but was reverted
because it broke ARRAY(col1 <> col2) where <> is the not-equal operator.

This fix follows cloud-fan's suggestion to handle it at the parser level.
When the parser matches STRUCT followed by NEQ in the dataType rule, it
decrements the counter via an inline action. This is safe because the
parser has confirmed that NEQ is being used as empty angle brackets in a
type context, not as a comparison operator.

Closes SPARK-52709
@yadavay-amzn yadavay-amzn force-pushed the fix/SPARK-52709-struct-empty-parsing branch from ef02927 to d8dea9e April 10, 2026 05:25

Contributor

@cloud-fan cloud-fan left a comment

The fix is correct and well-scoped. The parser-level approach is the right call — it avoids the lexer-level ambiguity that caused the previous revert (PR #51480), and the dataType rule guarantees STRUCT NEQ always means empty angle brackets (never a comparison operator).

Tests cover the key scenarios thoroughly, including the regression guard for ARRAY(1 <> 2).

@cloud-fan cloud-fan closed this in 12d8067 Apr 10, 2026
@cloud-fan
Contributor

@yadavay-amzn please reply in the JIRA ticket, so that I can assign it to you, thanks!


3 participants