Blog: Turning LIMIT into an I/O Optimization: Inside DataFusion’s Multi-Layer Pruning Stack #155
xudong963 wants to merge 2 commits into apache:main
Conversation
The staging deploy failed, looks like I don't have the privilege:
(force-pushed 577fcb2 to 080696d)
Thanks @xudong963 -- I will review this one, though maybe not for a few days
Thanks! No hurry.
alamb left a comment:
This is amazing @xudong963 -- very nice 🙏
I left some style suggestions, but nothing I think is required.
Again, really nice
> Before diving into limit pruning, let's understand the full pruning pipeline. DataFusion scans Parquet data through a series of increasingly fine-grained filters, each one eliminating data so the next stage processes less:
> *(figure: the pruning pipeline)*
> ### Phase 1: High-Level Discovery
> - **Partition Pruning**: The `ListingTable` component evaluates filters that depend only on partition columns — things like `year`, `month`, or `region` encoded in directory paths (e.g., `s3://data/year=2024/month=01/`). Irrelevant directories are eliminated before we even open a file.
Minor suggestion is to link the ListingTable to its API docs: https://docs.rs/datafusion/latest/datafusion/datasource/listing/struct.ListingTable.html
Similarly for FilePruner: https://docs.rs/datafusion/latest/datafusion/physical_optimizer/pruning/struct.FilePruner.html
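It might also help readers to include a small self-contained sketch of the hive-style partition pruning described above. Something like this (the helper names are hypothetical, not DataFusion APIs):

```rust
// Illustrative sketch of hive-style partition pruning: parse `key=value`
// segments out of object-store paths and keep only paths whose partition
// value satisfies the filter. Names here are hypothetical, not DataFusion's.
fn partition_value(path: &str, key: &str) -> Option<String> {
    path.split('/')
        .filter_map(|seg| seg.split_once('='))
        .find(|(k, _)| *k == key)
        .map(|(_, v)| v.to_string())
}

fn prune_paths<'a>(paths: &[&'a str], key: &str, wanted: &str) -> Vec<&'a str> {
    paths
        .iter()
        .copied()
        .filter(|p| partition_value(p, key).as_deref() == Some(wanted))
        .collect()
}

fn main() {
    let paths = [
        "s3://data/year=2024/month=01/part-0.parquet",
        "s3://data/year=2023/month=12/part-0.parquet",
    ];
    // `year = '2024'` eliminates the 2023 directory before any file is opened.
    let kept = prune_paths(&paths, "year", "2024");
    assert_eq!(kept, ["s3://data/year=2024/month=01/part-0.parquet"]);
    println!("kept {} of {} files", kept.len(), paths.len());
}
```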
> ### Phase 2: Row Group Statistics
> For each surviving file, DataFusion reads row group metadata and classifies each row group into one of three states:
It might help to also mention that the example data (like "snow vole") in the images is from the Snowflake paper, as otherwise it seems somewhat out of context.
> - **Not Matching**: Statistics prove that no row in the group can satisfy the predicate. The group is skipped entirely.
> - **Partially Matching**: Statistics cannot rule out matching rows, but also cannot guarantee them. These groups might be scanned and verified row by row later.
> - **Fully Matching**: Statistics prove that *every single row* in the group satisfies the predicate. This state is key to making limit pruning possible.
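A tiny self-contained sketch of the three-state classification might make this concrete for readers. Something like this, assuming a range predicate `lo <= col <= hi` over `i64` min/max statistics (illustrative only, not the actual DataFusion code):

```rust
// Classify a row group against `pred_lo <= value <= pred_hi` using only its
// min/max statistics. Illustrative sketch, not DataFusion internals.
#[derive(Debug, PartialEq)]
enum RowGroupMatch {
    NotMatching,       // statistics prove no row can match: skip entirely
    PartiallyMatching, // cannot rule rows in or out: scan and verify later
    FullyMatching,     // statistics prove every row matches: no re-check needed
}

fn classify(stats_min: i64, stats_max: i64, pred_lo: i64, pred_hi: i64) -> RowGroupMatch {
    if stats_max < pred_lo || stats_min > pred_hi {
        RowGroupMatch::NotMatching
    } else if stats_min >= pred_lo && stats_max <= pred_hi {
        RowGroupMatch::FullyMatching
    } else {
        RowGroupMatch::PartiallyMatching
    }
}

fn main() {
    // Predicate: value BETWEEN 10 AND 20
    assert_eq!(classify(30, 40, 10, 20), RowGroupMatch::NotMatching);
    assert_eq!(classify(5, 15, 10, 20), RowGroupMatch::PartiallyMatching);
    assert_eq!(classify(12, 18, 10, 20), RowGroupMatch::FullyMatching);
    println!("all three states exercised");
}
```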
> Additionally, **bloom filters** can eliminate row groups for equality and `IN`-list predicates at this stage.
You could potentially combine this into the introduction sentence instead to make the doc more concise. Instead of

> For each surviving file, DataFusion reads row group metadata and classifies each row group into one of three states:

change it to

> For each surviving file, DataFusion reads row group metadata and potentially BloomFilters and classifies each row group into one of three states:
> The final phase goes even deeper:
> - **Page Index Pruning**: Parquet pages have their own min/max statistics. DataFusion uses these to skip individual data pages within a surviving row group.
It might help to link to the relevant parquet doc: https://parquet.apache.org/docs/file-format/pageindex/
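Perhaps also worth a small sketch of what page-level skipping buys here, e.g. something like this (illustrative names and shapes, not the parquet-rs page index API):

```rust
// Illustrative sketch of page index pruning: keep only the pages of a column
// chunk whose min/max range can intersect `pred_lo <= value <= pred_hi`.
// Not the parquet-rs API; just a model of the idea.
struct PageStats {
    min: i64,
    max: i64,
}

/// Return the indexes of pages that may contain matching rows;
/// every other page is never decoded at all.
fn prune_pages(pages: &[PageStats], pred_lo: i64, pred_hi: i64) -> Vec<usize> {
    pages
        .iter()
        .enumerate()
        .filter(|(_, p)| p.max >= pred_lo && p.min <= pred_hi)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let pages = [
        PageStats { min: 0, max: 9 },
        PageStats { min: 10, max: 19 },
        PageStats { min: 20, max: 29 },
    ];
    // Predicate `value BETWEEN 12 AND 15` only needs the middle page.
    assert_eq!(prune_pages(&pages, 12, 15), vec![1]);
    println!("decoded 1 of {} pages", pages.len());
}
```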
> 1. Negate the original predicate
> 2. Simplify the negated expression
> 3. Evaluate the negation against the row group's statistics
Maybe worth pointing out that DataFusion already had steps 2 and 3, so the change is relatively straightforward.
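A compact sketch of the negation trick could also help, under the simplifying assumption of a single one-sided predicate `col <= k` over `i64` statistics (names are illustrative, not DataFusion's):

```rust
// Illustrative sketch of "fully matching" detection via negation, assuming a
// single predicate `col <= k`. NOT(col <= k) is `col > k`; if the statistics
// prove the negation matches no row (i.e. stats_max <= k), then the original
// predicate must hold for *every* row in the group.
fn fully_matches(stats_max: i64, k: i64) -> bool {
    stats_max <= k
}

fn main() {
    // Row group with values in [3, 9], predicate `col <= 10`:
    // the negation `col > 10` is provably empty, so every row matches.
    assert!(fully_matches(9, 10));
    // Row group with max 12: the negation may match some rows,
    // so the group is only partially matching.
    assert!(!fully_matches(12, 10));
    println!("negation-based checks passed");
}
```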
> ## Case Study: Alpine Wildlife Query
> Let's walk through a concrete example. Given a wildlife tracking dataset with four row groups:
I again think it would be good to credit the snowflake paper for this example
> There are two natural extensions of this work:
> **Page-Level Limit Pruning**: Today, "fully matched" detection operates at the row group level. If we extend this to use page index statistics, we could stop decoding pages *within* a row group once the limit is met. This would pay dividends for wide row groups where only a few pages hold matching data.
If there are tickets, I would recommend linking them here (as a way to fish for contributions).
If there are no tickets, maybe we should file them 🎣
I have a PR for it in our fork
> **Row Filter Hints**: Even when a row group is fully matched, the current row filter still evaluates predicates row by row. If we pass the fully matched groups info into the row filter builder, we can skip predicate evaluation entirely for guaranteed groups, saving CPU cycles.
Likewise here is a good place to add a link to the ticket
> DataFusion's pruning pipeline trims redundant I/O from the partition level all the way down to individual rows. Limit pruning adds a new step that creates an early exit when fully matched row groups already satisfy the `LIMIT`. The result is fewer row groups scanned, less data decoded, and faster queries.
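To make the early-exit idea concrete for readers, a small illustrative model could work here, e.g. (not DataFusion internals; names and shapes are hypothetical):

```rust
// Illustrative model of limit pruning's early exit: once the rows in fully
// matching row groups alone reach the LIMIT, later row groups are never read.
#[derive(Clone, Copy, PartialEq)]
enum Match {
    Not,
    Partial,
    Full,
}

/// Return how many leading row groups must be considered to satisfy `limit`,
/// counting only guaranteed (fully matching) rows toward the limit.
fn groups_to_scan(groups: &[(Match, usize)], limit: usize) -> usize {
    let mut guaranteed = 0;
    for (i, &(m, rows)) in groups.iter().enumerate() {
        if m == Match::Full {
            guaranteed += rows;
        }
        if guaranteed >= limit {
            return i + 1; // early exit: remaining groups never touched
        }
    }
    groups.len()
}

fn main() {
    let groups = [
        (Match::Partial, 100),
        (Match::Full, 500),
        (Match::Full, 500),
        (Match::Partial, 100),
    ];
    // LIMIT 400 is covered by the first fully matching group,
    // so the last two row groups are never read.
    assert_eq!(groups_to_scan(&groups, 400), 2);
    println!("scanned 2 of {} row groups", groups.len());
}
```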
> The key insights are:
I also recommend ending with a call to action / come help us type blurb -- something like https://datafusion.apache.org/blog/2026/02/02/datafusion_case/#about-datafusion and https://datafusion.apache.org/blog/2026/02/02/datafusion_case/#how-to-get-involved
@alamb IIRC the feature is included in DF53, so we can merge the PR before writing the DF53 blog and mention the feature via the blog link.
That is a good call - I am hoping to start on a DF 53 blog later this week. I am happy to merge this PR on whatever timescale you want (not sure if you want to solicit any more reviews).
Yeah, I'd like to see more reviews, so I plan to keep it open until the DF53 blog is near |
