Skip to content

Parser: fix exponential parse time on compound chains#2344

Merged
iffyio merged 6 commits into
apache:mainfrom
LucaCappelletti94:pathological
May 21, 2026
Merged

Parser: fix exponential parse time on compound chains#2344
iffyio merged 6 commits into
apache:mainfrom
LucaCappelletti94:pathological

Conversation

@LucaCappelletti94

Copy link
Copy Markdown
Contributor

parse_compound_expr recursed into parse_subexpr after every ., which re-walks the rest of the chain inside a rollback boundary. On inputs like IF a.b.c...x.# the work doubled per chain element, giving 2^N parse times. Switching the inner call to parse_prefix removes the redundant traversal: the outer loop already walks the chain, so the resulting AST is identical on valid SQL.

Measured on PostgreSqlDialect with three inputs from a libFuzzer corpus, release build:

Input Before After
iF i.D.i.:Fi.... (65 B) 419 ms 87 us
iF i.D.i.i. ... .*~ (58 B) 893 ms 69 us
if-stf-localtclocal33alt.... (306 B) >35 s 98 us

Regression test in tests/sqlparser_postgres.rs runs a 30-element chain with a 5 s timeout. Pre-fix it hits the timeout, post-fix it finishes in well under a millisecond.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@LucaCappelletti94

Copy link
Copy Markdown
Contributor Author

Copilot does not seem to beat the AI bubble allegations lol

Comment thread src/parser/mod.rs Outdated
Comment thread tests/sqlparser_postgres.rs Outdated

@iffyio iffyio left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @LucaCappelletti94!

@iffyio iffyio added this pull request to the merge queue May 21, 2026
Merged via the queue into apache:main with commit 2705a7d May 21, 2026
10 checks passed
moshap-firebolt added a commit to firebolt-analytics/datafusion-sqlparser-rs that referenced this pull request Jun 9, 2026
Mirrors the position-keyed failure cache pattern from apache#2344, apache#2350, apache#2352.

`parse_table_factor`'s `(` arm speculatively parses a derived table; on
failure it rewinds and tries `parse_table_and_joins` (nested-join). Both
arms recurse back into `parse_table_factor` consuming the next `(`, so
on pathological inputs like `SELECT 1 FROM (((((...` each level
re-runs the speculative arm — work doubles at each level. With 30 nested
parens this takes >7s; with 50, the libFuzzer per-test timeout fires
(>1300s seen in CI).

Cache the parser position at which `parse_derived_table_factor` was
already attempted and failed. The next time `parse_table_factor` reaches
that position (via the nested-join arm's recursive descent), skip the
speculative call and go straight to the fallback. The cache only stores
positions where a non-`RecursionLimitExceeded` failure occurred, so the
recursion-limit guard still propagates.

Regression test: `parse_table_factor_paren_chain_no_exponential_blowup`
runs the parse on a worker thread and asserts it returns within 5 s;
pre-fix it hangs the libFuzzer worker for >20 minutes on a 666-byte
ClickHouse seed surfaced by the `sql_parser_dialects` fuzz harness.

Bench: `parse_table_factor_paren_chain/chain_{10,20,30}`.

Drive-by: add the missing comma in `criterion_group!` between
`parse_compound_keyword_chain` and `parse_prefix_keyword_call_chain`
(was a parse error preventing the new bench from registering).
moshap-firebolt added a commit to firebolt-analytics/datafusion-sqlparser-rs that referenced this pull request Jun 9, 2026
Mirrors the position-keyed failure cache pattern from apache#2344, apache#2350, apache#2352.

`parse_table_factor`'s `(` arm speculatively parses a derived table; on
failure it rewinds and tries `parse_table_and_joins` (nested-join). Both
arms recurse back into `parse_table_factor` consuming the next `(`, so
on pathological inputs like `SELECT 1 FROM (((((...` each level
re-runs the speculative arm — work doubles at each level. With 30 nested
parens this takes >7s; with 50, the libFuzzer per-test timeout fires
(>1300s seen in CI).

Cache the parser position at which `parse_derived_table_factor` was
already attempted and failed. The next time `parse_table_factor` reaches
that position (via the nested-join arm's recursive descent), skip the
speculative call and go straight to the fallback. The cache only stores
positions where a non-`RecursionLimitExceeded` failure occurred, so the
recursion-limit guard still propagates.

Regression test: `parse_table_factor_paren_chain_no_exponential_blowup`
runs the parse on a worker thread and asserts it returns within 5 s;
pre-fix it hangs the libFuzzer worker for >20 minutes on a 666-byte
ClickHouse seed surfaced by the `sql_parser_dialects` fuzz harness.

Bench: `parse_table_factor_paren_chain/chain_{10,20,30}`.

Drive-by: add the missing comma in `criterion_group!` between
`parse_compound_keyword_chain` and `parse_prefix_keyword_call_chain`
(was a parse error preventing the new bench from registering).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants