Parser: fix exponential parse time on compound keyword chains #2350
Merged
iffyio approved these changes on Jun 6, 2026
iffyio (Contributor) left a comment:
LGTM! Thanks for these fixes @LucaCappelletti94!
moshap-firebolt added a commit to firebolt-analytics/datafusion-sqlparser-rs that referenced this pull request on Jun 9, 2026:

Mirrors the position-keyed failure cache pattern from apache#2344, apache#2350, apache#2352.

`parse_table_factor`'s `(` arm speculatively parses a derived table; on failure it rewinds and tries `parse_table_and_joins` (nested join). Both arms recurse back into `parse_table_factor`, consuming the next `(`, so on pathological inputs like `SELECT 1 FROM (((((...` each level re-runs the speculative arm and the work doubles at each level. With 30 nested parens this takes >7 s; with 50, the libFuzzer per-test timeout fires (>1300 s seen in CI).

Cache the parser position at which `parse_derived_table_factor` was already attempted and failed. The next time `parse_table_factor` reaches that position (via the nested-join arm's recursive descent), skip the speculative call and go straight to the fallback. The cache only stores positions where a non-`RecursionLimitExceeded` failure occurred, so the recursion-limit guard still propagates.

Regression test: `parse_table_factor_paren_chain_no_exponential_blowup` runs the parse on a worker thread and asserts it returns within 5 s; pre-fix it hangs the libFuzzer worker for >20 minutes on a 666-byte ClickHouse seed surfaced by the `sql_parser_dialects` fuzz harness.

Bench: `parse_table_factor_paren_chain/chain_{10,20,30}`.

Drive-by: add the missing comma in `criterion_group!` between `parse_compound_keyword_chain` and `parse_prefix_keyword_call_chain` (its absence was a parse error that prevented the new bench from registering).
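To make the doubling and the position-keyed cache concrete, here is a minimal sketch on a hypothetical toy grammar, not sqlparser-rs code: `derived` stands in for the speculative derived-table arm and `nested` for the nested-join fallback, and both recurse into `factor` after consuming `(`. The names `Parser`, `factor`, `derived`, `nested` and `run` are illustrative assumptions. Caching is sound here only because the toy's speculative arm depends on nothing but the position; as in the commit message, `RecursionLimitExceeded` is propagated and never cached.

```rust
use std::collections::HashSet;

/// Toy error type; recursion-limit errors are kept distinct from ordinary failures.
#[derive(Debug, Clone, PartialEq)]
enum ParseError {
    Expected(&'static str, usize),
    RecursionLimitExceeded,
}

/// Hypothetical grammar (illustration only):
///   factor  := derived | nested | 'x'
///   derived := '(' factor ')' 'a'   speculative arm, needs an alias `a`
///   nested  := '(' factor ')'       fallback arm
struct Parser<'s> {
    tokens: &'s [u8],
    pos: usize,
    depth: usize,
    max_depth: usize,
    use_cache: bool,
    failed_derived: HashSet<usize>, // positions where `derived` already failed
    calls: u64,                     // number of `factor` invocations
}

impl<'s> Parser<'s> {
    fn new(src: &'s str, use_cache: bool) -> Self {
        Parser { tokens: src.as_bytes(), pos: 0, depth: 0, max_depth: 200,
                 use_cache, failed_derived: HashSet::new(), calls: 0 }
    }

    fn expect(&mut self, t: u8, what: &'static str) -> Result<(), ParseError> {
        if self.tokens.get(self.pos) == Some(&t) {
            self.pos += 1;
            Ok(())
        } else {
            Err(ParseError::Expected(what, self.pos))
        }
    }

    fn factor(&mut self) -> Result<(), ParseError> {
        self.calls += 1;
        if self.depth >= self.max_depth {
            return Err(ParseError::RecursionLimitExceeded);
        }
        self.depth += 1;
        let result = self.factor_inner();
        self.depth -= 1;
        result
    }

    fn factor_inner(&mut self) -> Result<(), ParseError> {
        if self.tokens.get(self.pos) != Some(&b'(') {
            return self.expect(b'x', "x");
        }
        let start = self.pos;
        // Skip the speculative arm if it is already known to fail at `start`.
        if !(self.use_cache && self.failed_derived.contains(&start)) {
            match self.derived() {
                Ok(()) => return Ok(()),
                // Propagate the recursion limit; never cache it.
                Err(e @ ParseError::RecursionLimitExceeded) => return Err(e),
                Err(ParseError::Expected(..)) => {
                    self.failed_derived.insert(start);
                    self.pos = start; // rewind before trying the fallback
                }
            }
        }
        self.nested()
    }

    fn derived(&mut self) -> Result<(), ParseError> {
        self.expect(b'(', "(")?;
        self.factor()?;
        self.expect(b')', ")")?;
        self.expect(b'a', "alias")
    }

    fn nested(&mut self) -> Result<(), ParseError> {
        self.expect(b'(', "(")?;
        self.factor()?;
        self.expect(b')', ")")
    }
}

/// Parses `n` nested parens around `x` and reports the result and call count.
fn run(n: usize, use_cache: bool) -> (Result<(), ParseError>, u64) {
    let src = format!("{}x{}", "(".repeat(n), ")".repeat(n));
    let mut p = Parser::new(&src, use_cache);
    let result = p.factor();
    (result, p.calls)
}

fn main() {
    for n in [10, 20] {
        let (_, slow) = run(n, false);
        let (_, fast) = run(n, true);
        println!("n={n}: {slow} calls without cache, {fast} with cache");
    }
}
```

In this toy, the uncached parser makes 2^(n+1) - 1 calls to `factor` (2047 for n = 10), while the cached one makes 1 + n + n(n+1)/2 (66 for n = 10): the second visit to each position skips straight to the fallback. The count is quadratic rather than linear here because the toy's fallback still re-descends the chain; it is not a claim about the real parser's exact complexity.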
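The regression test runs the parse on a worker thread and asserts it finishes within a deadline, so a reintroduced blow-up fails the test instead of hanging it. A minimal std-only sketch of such a watchdog follows; the helper name `run_with_deadline` is hypothetical and this is not the test from the commit. In a real test the closure would call the parser on the pathological input, for example something like `Parser::parse_sql(&ClickHouseDialect {}, &sql)`.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `f` on a worker thread and waits at most `limit` for its result.
/// Returns `None` on timeout, or if the worker panicked (its sender is dropped
/// without sending). A timed-out worker is abandoned, not killed: std has no
/// safe way to cancel a running thread.
fn run_with_deadline<T, F>(limit: Duration, f: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // After a timeout the receiver may be gone, so ignore a failed send.
        let _ = tx.send(f());
    });
    rx.recv_timeout(limit).ok()
}

fn main() {
    // Stand-in workload; a real regression test would invoke the parser here.
    let fast = run_with_deadline(Duration::from_secs(5), || (1..=1000u64).sum::<u64>());
    assert_eq!(fast, Some(500500));

    // A workload that overruns its budget is reported as `None`.
    let slow = run_with_deadline(Duration::from_millis(50), || {
        thread::sleep(Duration::from_secs(2));
        0u64
    });
    assert_eq!(slow, None);
    println!("fast = {fast:?}, slow = {slow:?}");
}
```

Using a channel with `recv_timeout` keeps the test thread in control of the deadline; the cost is that a hung worker keeps running in the background until the test process exits.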
Same family as #2344 and #2349. A reserved keyword in field position (e.g. `NOT` in `.not-b.not-b...`) drives `parse_prefix` -> `parse_not` -> `parse_subexpr`, which re-walks the chain at every segment and doubles the work. The result is always rejected (`UnaryOp`), so the speculation is wasted, and `parse_identifier` in the existing None branch produces the same `Identifier` directly. The solution I identified is to skip the speculative `parse_prefix` in `parse_compound_expr` when the next token is a `Word` not followed by `(`. Surfaced by subql fuzzing on a 527 B input that took 2.65 s and now takes 270 µs.
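The fix is a one-token lookahead guard in front of the speculative prefix parse. A minimal sketch of that decision rule follows, assuming a simplified, hypothetical `Token` enum (sqlparser-rs's real `Token::Word` wraps a `Word` struct with value, quote style and keyword, not a `String`) and a hypothetical helper name `should_try_prefix`; it models only the rule stated above, not the surrounding `parse_compound_expr` logic.

```rust
/// Hypothetical, simplified token type for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Word(String),
    Period,
    Minus,
    LParen,
    RParen,
    Number(String),
}

/// Should the field segment starting at `pos` go through the speculative
/// prefix parse? Rule from the PR description: skip it when the token is a
/// word that is not followed by `(`; such a word is taken as an identifier.
fn should_try_prefix(tokens: &[Token], pos: usize) -> bool {
    match (tokens.get(pos), tokens.get(pos + 1)) {
        // `a.f(...)`: a word followed by `(` may start a call, so keep speculating.
        (Some(Token::Word(_)), Some(Token::LParen)) => true,
        // Any other word, reserved keywords such as NOT included, becomes an
        // identifier directly, so in this sketch it never reaches a NOT parser.
        (Some(Token::Word(_)), _) => false,
        // Non-word tokens keep the speculative path unchanged.
        _ => true,
    }
}

fn main() {
    let w = |s: &str| Token::Word(s.to_string());
    // `.not-b`: the field after the period is the keyword `not`.
    let not_chain = [Token::Period, w("not"), Token::Minus, w("b")];
    // `.f()`: a word followed by `(`.
    let call = [Token::Period, w("f"), Token::LParen, Token::RParen];
    // `.1`: not a word at all.
    let number = [Token::Period, Token::Number("1".to_string())];
    println!("{}", should_try_prefix(&not_chain, 1)); // false
    println!("{}", should_try_prefix(&call, 1)); // true
    println!("{}", should_try_prefix(&number, 1)); // true
}
```

Because the rejected `UnaryOp` result would be discarded anyway, answering `false` for a bare word removes the recursive re-walk without changing what the parser accepts, which is why the guard is cheap relative to the cache-based fixes in the same family.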