support DynamicFilterPhysicalExpr#4961
Conversation
| if is_dynamic_physical_expr(&e) { | ||
| e.snapshot().ok().flatten() | ||
| } else { | ||
| Some(Arc::clone(e)) | ||
| } |
There was a problem hiding this comment.
Not sure if it's possible or worth it but you could do this more lazily, or wrap the DataFusion dynamic filter in a Vortex dynamic filter and call DynamicFilterPhysicalExpr::current() or PhysicalExpr::snapshot() on each RecordBatch (that would be more overhead but also will kick in sooner on scans that touch very large files).
There was a problem hiding this comment.
I roughly tried to do that in some old PR, but I don't remember why I stopped pushing it. This seems like a good starting point, and we can experiment with it in the future to see which one is better.
| .ok_or_else(|| { | ||
| vortex_err!("Plan should have 2 DataSourceExec, the second is the probe side") | ||
| })?; | ||
| assert!(data_source_line.contains("DynamicFilterPhysicalExpr [ a@0 >= 1 AND a@0 <= 3 ]")); |
There was a problem hiding this comment.
I'm not sure what other assertions we could make. I don't think Vortex gives any metrics on rows pruned, etc. And the Vortex expression isn't saved anywhere / is not in the plan.
There was a problem hiding this comment.
I put in a print statement and can see that an appropriate Vortex predicate is being created, but can't verify that it's doing what's expected
There was a problem hiding this comment.
I'm not sure I have a better idea here right now, I need to figure out how to improve the overall testability
Codecov Report❌ Patch coverage is
☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
We cannot currently run pre commit benchmark on fork so I created this (#4986) |
|
@adriangb haven't forgotten this PR, just really busy week |
|
I can belive some of the perf impact, but I suspect AWS tonight is just extremely noisy, gotta love the 40% on clickbecnh q0. |
Is there a benchmark or something I'm not seeing? If it's Polar Signal I don't seem to have access. |
|
They ran on this PR, seems like there's an issue to trigger benchmarks on this branch (@joseph-isaacs got something I can look at? Would love to be able to do that) |
|
@AdamGS what's your read on the state of this PR / what can I do to help? It's not clear to me if there are issues with benchmarks, if @joseph-isaacs is planning on taking it over in another PR, if I should rewrite the commits with the licensing info, etc. |
|
You beat me to comment by a minute! |
|
Something is going on with the label check, I'll figure that one out. |
|
Last benchmarks run - https://github.com/vortex-data/vortex/actions/runs/18684856431 |
joseph-isaacs
left a comment
There was a problem hiding this comment.
There are some big regressions on tpch which we need to address before merging
|
Completely missed that, I'll take a deeper look. |
|
This PR has been marked as stale because it has been open for 30 days with no activity. Please comment or remove the stale label if you wish to keep it active, otherwise it will be closed in 7 days |
|
I think you may have found some real performance issues. We are discussing similar findings on the DataFusion side: TLDR is we think some of the expressions may be expensive to evaluate. Unclear if that means they're not worth having at all, or if the plan shape / parallelism needs to be tweaked, but it does seem real. |
cc @AdamGS