Fix Lance point lookup detection for pushdown filters #17

Open
liaoruoxue wants to merge 1 commit into daft-engine:main from liaoruoxue:fix-lance-point-lookup-pushdowns

@liaoruoxue

Summary

This is a follow-up to the existing Lance BTREE point lookup optimization after the Lance integration moved into daft-lance. The index-driven scan path exists, but it can miss filters carried in PyPushdowns.filters, so DataFrame queries can still fall back to fragment scans instead of letting Lance use the scalar index.

This PR:

  • uses pushdowns.filters as the effective filter when no filters were absorbed through push_filters
  • applies the same effective filter to count and regular scan task creation
  • enables BTREE point lookup detection from PyPushdowns.filters
  • keeps fragment scans for row identity reads (with_row_id, with_row_address, with_rowaddr) and include_fragment_id so blob / row identity semantics are preserved
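
The selection logic described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ScanOptions`, `effective_filter`, and `can_use_index_scan` are hypothetical names, filters are modeled as plain strings, and the actual BTREE point lookup detection (checking for an equality predicate on an indexed column) is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScanOptions:
    """Hypothetical stand-in for the scan flags named in this PR."""

    with_row_id: bool = False
    with_row_address: bool = False
    include_fragment_id: bool = False


def effective_filter(absorbed: Optional[str], pushdown: Optional[str]) -> Optional[str]:
    """Prefer a filter absorbed via push_filters; otherwise use pushdowns.filters."""
    return absorbed if absorbed is not None else pushdown


def can_use_index_scan(filter_expr: Optional[str], opts: ScanOptions) -> bool:
    """Allow the index-driven path only with a filter and no row identity reads."""
    if filter_expr is None:
        return False
    # Row identity reads and fragment ids keep the fragment scan path.
    needs_fragments = opts.with_row_id or opts.with_row_address or opts.include_fragment_id
    return not needs_fragments
```

Under these assumptions, the same effective filter would be passed to both count and regular scan task creation, so the two paths cannot disagree about whether a filter is present.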

Testing

  • uv run ruff format --check daft_lance tests
  • uv run ruff check daft_lance tests
  • uv run pytest tests/io/lancedb/test_lancedb_point_lookup.py tests/io/lancedb/test_lance_blob_v2_read.py::test_take_blobs_single_row tests/io/lancedb/test_lance_blob_v2_read.py::test_take_blobs_non_contiguous_rows -q
  • uv run pytest tests/io/lancedb -q

Note: uv run mypy daft_lance/ tests/ currently fails on upstream origin/main with existing errors; this branch does not increase the mypy error count.

Manual validation on a large indexed Lance dataset showed an equality lookup using one scan task instead of enumerating all fragments.
