Blog: Turning LIMIT into an I/O Optimization: Inside DataFusion’s Multi-Layer Pruning Stack #155

Open
xudong963 wants to merge 2 commits into apache:main from xudong963:site/limit-pruning

Conversation

@xudong963 (Member):

No description provided.

@xudong963 xudong963 changed the title from "Blog: limit pruning" to "Blog: Turning LIMIT into an I/O Optimization: Inside DataFusion's Limit Pruning" on Mar 10, 2026
@xudong963 (Member Author):

the staging deploy failed, looks like I don't have the privilege:

fatal: unable to access 'https://github.com/apache/datafusion-site/': The requested URL returned error: 403
Error: Process completed with exit code 128.

@xudong963 xudong963 requested a review from alamb March 10, 2026 07:31
@xudong963 xudong963 force-pushed the site/limit-pruning branch from 577fcb2 to 080696d on March 10, 2026 08:33
@xudong963 xudong963 changed the title from "Blog: Turning LIMIT into an I/O Optimization: Inside DataFusion's Limit Pruning" to "Blog: Turning LIMIT into an I/O Optimization: Inside DataFusion’s Multi-Layer Pruning Stack" on Mar 10, 2026
@alamb (Contributor) commented Mar 10, 2026:

Thanks @xudong963 -- I will review this one, though maybe not for a few days

@xudong963 (Member Author):

> Thanks @xudong963 -- I will review this one, though maybe not for a few days

Thanks! No hurry.

@alamb (Contributor) left a comment:

This is amazing @xudong963 -- very nice 🙏

I left some style suggestions, but nothing I think is required.

Again, really nice


Before diving into limit pruning, let's understand the full pruning pipeline. DataFusion scans Parquet data through a series of increasingly fine-grained filters, each one eliminating data so the next stage processes less:

(figure: the multi-layer pruning pipeline)

Contributor: (attached image)


### Phase 1: High-Level Discovery

- **Partition Pruning**: The `ListingTable` component evaluates filters that depend only on partition columns — things like `year`, `month`, or `region` encoded in directory paths (e.g., `s3://data/year=2024/month=01/`). Irrelevant directories are eliminated before we even open a file.
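To make the directory-level elimination concrete, here is a minimal, self-contained sketch of pruning Hive-style `key=value` paths against a `year = 2024` filter. The function names are invented for illustration and are not DataFusion's actual API:

```rust
// Illustrative sketch: keep only directories whose partition value
// satisfies the predicate `year = 2024`. Irrelevant directories are
// dropped before any file inside them is opened.

/// Extract the value of a `key=value` segment from a Hive-style path.
fn partition_value<'a>(path: &'a str, key: &str) -> Option<&'a str> {
    path.split('/')
        .find_map(|seg| seg.strip_prefix(key)?.strip_prefix('='))
}

/// Hypothetical pruning pass: retain paths where `year = 2024`.
fn prune_partitions<'a>(paths: &[&'a str]) -> Vec<&'a str> {
    paths
        .iter()
        .copied()
        .filter(|p| partition_value(p, "year") == Some("2024"))
        .collect()
}

fn main() {
    let dirs = [
        "s3://data/year=2023/month=12/",
        "s3://data/year=2024/month=01/",
        "s3://data/year=2024/month=02/",
    ];
    // Only the two year=2024 directories survive the filter.
    println!("{:?}", prune_partitions(&dirs));
}
```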
Contributor:

Minor suggestion is to link the ListingTable to its API docs: https://docs.rs/datafusion/latest/datafusion/datasource/listing/struct.ListingTable.html


### Phase 2: Row Group Statistics

For each surviving file, DataFusion reads row group metadata and classifies each row group into one of three states:
Contributor:

It might help to also mention that the example data (like "snow vole") in the images is from the snowflake paper as otherwise it seems somewhat out of context

- **Not Matching**: Statistics prove that no row in the group can satisfy the predicate, so the group is skipped entirely.
- **Partially Matching**: Statistics cannot rule out matching rows, but also cannot guarantee them. These groups might be scanned and verified row by row later.
- **Fully Matching**: Statistics prove that *every single row* in the group satisfies the predicate. This state is key to making limit pruning possible.

Additionally, **bloom filters** could eliminate row groups for equality and `IN`-list predicates at this stage.
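The three-state classification can be sketched for a single comparison predicate like `value > threshold`, using only a row group's min/max statistics. The enum and function below are illustrative, not DataFusion's real types:

```rust
// Illustrative sketch of row group classification from min/max statistics,
// for the predicate `value > threshold`.

#[derive(Debug, PartialEq)]
enum RowGroupMatch {
    NotMatching,       // no row can satisfy the predicate: skip the group
    PartiallyMatching, // some rows may match: verify row by row later
    FullyMatching,     // every row provably matches: key to limit pruning
}

fn classify(min: i64, max: i64, threshold: i64) -> RowGroupMatch {
    if max <= threshold {
        // Even the largest value in the group fails `value > threshold`.
        RowGroupMatch::NotMatching
    } else if min > threshold {
        // Even the smallest value passes, so every single row matches.
        RowGroupMatch::FullyMatching
    } else {
        // The range straddles the threshold: statistics are inconclusive.
        RowGroupMatch::PartiallyMatching
    }
}

fn main() {
    // Predicate: value > 10
    println!("{:?}", classify(20, 30, 10)); // fully matching
    println!("{:?}", classify(0, 5, 10));   // not matching
    println!("{:?}", classify(5, 15, 10));  // partially matching
}
```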
Contributor:

You could potentially combine this into the introduction sentence instead to make the doc more concise. Something like, instead of

> For each surviving file, DataFusion reads row group metadata and classifies each row group into one of three states:

change to

> For each surviving file, DataFusion reads row group metadata and potentially BloomFilters and classifies each row group into one of three states:


The final phase goes even deeper:

- **Page Index Pruning**: Parquet pages have their own min/max statistics. DataFusion uses these to skip individual data pages within a surviving row group.
Contributor:

It might help to link to the relevant parquet doc: https://parquet.apache.org/docs/file-format/pageindex/


1. Negate the original predicate
2. Simplify the negated expression
3. Evaluate the negation against the row group's statistics
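For a simple comparison predicate, the three steps above can be sketched as follows. This is a hypothetical illustration for one predicate shape; the real implementation works on general expressions and Parquet statistics:

```rust
// Sketch of the negation trick: if the *negated* predicate cannot match
// any row according to min/max statistics, then the original predicate
// holds for every row, i.e. the row group is fully matched.

#[derive(Clone, Copy)]
enum Pred {
    Gt(i64), // value > n
    Le(i64), // value <= n
}

// Steps 1 + 2: negate and simplify. NOT(value > n) simplifies to value <= n.
fn negate(p: Pred) -> Pred {
    match p {
        Pred::Gt(n) => Pred::Le(n),
        Pred::Le(n) => Pred::Gt(n),
    }
}

// Step 3: could the predicate match *any* row whose values lie in [min, max]?
fn may_match(p: Pred, min: i64, max: i64) -> bool {
    match p {
        Pred::Gt(n) => max > n,
        Pred::Le(n) => min <= n,
    }
}

/// Fully matched = the negation provably matches no row in the group.
fn fully_matched(p: Pred, min: i64, max: i64) -> bool {
    !may_match(negate(p), min, max)
}

fn main() {
    // Row group with min=20, max=30 under `value > 10`:
    // the negation `value <= 10` cannot match, so every row matches.
    assert!(fully_matched(Pred::Gt(10), 20, 30));
    // min=5, max=15: the negation may match, so not fully matched.
    assert!(!fully_matched(Pred::Gt(10), 5, 15));
    println!("ok");
}
```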
Contributor:

maybe worth pointing out that DataFusion already had steps 2 and 3 so the change is relatively straightforward


## Case Study: Alpine Wildlife Query

Let's walk through a concrete example. Given a wildlife tracking dataset with four row groups:
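Before walking through the dataset, here is a minimal simulation of how limit pruning chooses which row groups to scan. The classifications and row counts below are invented for illustration; the point is that once fully matched groups alone cover the `LIMIT`, the remaining groups are skipped:

```rust
// Hypothetical simulation of limit pruning over classified row groups.
// Only fully matched groups contribute *guaranteed* rows toward the limit.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Match { Not, Partial, Full }

/// Return the indexes of row groups that must still be scanned for `LIMIT limit`.
fn groups_to_scan(groups: &[(Match, usize)], limit: usize) -> Vec<usize> {
    let mut guaranteed = 0; // rows proven to satisfy the predicate so far
    let mut scan = Vec::new();
    for (i, &(m, rows)) in groups.iter().enumerate() {
        if guaranteed >= limit {
            break; // early exit: the limit is already satisfied
        }
        match m {
            Match::Not => {}                // pruned by statistics
            Match::Partial => scan.push(i), // must be verified row by row
            Match::Full => {
                scan.push(i);
                guaranteed += rows;         // every row counts toward the limit
            }
        }
    }
    scan
}

fn main() {
    // Four row groups of 100 rows each; groups 1 and 2 are fully matched.
    let groups = [
        (Match::Partial, 100),
        (Match::Full, 100),
        (Match::Full, 100),
        (Match::Partial, 100),
    ];
    // LIMIT 150: groups 0, 1, 2 are scanned; group 3 is skipped entirely.
    println!("{:?}", groups_to_scan(&groups, 150));
}
```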
Contributor:

I again think it would be good to credit the snowflake paper for this example


There are two natural extensions of this work:

**Page-Level Limit Pruning**: Today, "fully matched" detection operates at the row group level. If we extend this to use page index statistics, we could stop decoding pages *within* a row group once the limit is met. This would pay dividends for wide row groups where only a few pages hold matching data.
Contributor:

If there are tickets, I would recommend linking here (as a way to fish for contributions).
If there are no tickets, maybe we should file them 🎣

Member Author:

I have a PR for it in our fork



**Row Filter Hints**: Even when a row group is fully matched, the current row filter still evaluates predicates row by row. If we pass the fully matched groups info into the row filter builder, we can skip predicate evaluation entirely for guaranteed groups — saving CPU cycles on predicate evaluation.
Contributor:

Likewise here is a good place to add a link to the ticket

Member Author:

ditto


DataFusion's pruning pipeline trims redundant I/O from the partition level all the way down to individual rows. Limit pruning adds a new step that creates an early exit when fully matched row groups already satisfy the `LIMIT`. The result is fewer row groups scanned, less data decoded, and faster queries.

The key insights are:

@xudong963 (Member Author):

@alamb Thanks for the review, addressed all comments in c54a3fe

@xudong963 (Member Author):

@alamb iirc, the feature is included in the DF53, so we can merge the PR before writing the DF53 blog, and mention the feature by the blog link.

@alamb (Contributor) commented Mar 16, 2026:

> @alamb iirc, the feature is included in the DF53, so we can merge the PR before writing the DF53 blog, and mention the feature by the blog link.

That is a good call - I am hoping to start on a DF 53 blog later this week

I am happy to merge this PR on whatever timescale you want (not sure if you want to solicit any more reviews)

@xudong963 (Member Author):

> (not sure if you want to solicit any more reviews)

Yeah, I'd like to see more reviews, so I plan to keep it open until the DF53 blog is near
