
fix: normalize list offsets in merge_with_schema for sliced arrays #6594

Draft
LuciferYang wants to merge 1 commit into lance-format:main from LuciferYang:fix-6580-list-merge-offsets

Conversation

@LuciferYang
Contributor

@LuciferYang LuciferYang commented Apr 22, 2026

Summary

Fixes an "offset past values" panic in merge_with_schema when the left input is a sliced ListArray (or LargeListArray) whose first offset is non-zero.

Root cause

In rust/lance-arrow/src/lib.rs, the list-merge branches pair trimmed_values() (which returns values starting at index 0) with offsets().clone() (which retains the original absolute offsets from before slicing). When the sliced array's first offset is > 0, the re-assembled ListArray has offsets that point past the end of the trimmed values buffer, and Arrow's invariant check panics.
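
To make the mismatch concrete, here is a std-only model of the invariant, not the actual Arrow or lance-arrow code. The helper names `offsets_fit_values` and `model_trimmed_values` are hypothetical, and plain `i32` slices stand in for the offsets and values buffers.

```rust
/// Std-only model of the list invariant (an assumption, not Arrow's code):
/// offsets must be non-decreasing and the last offset must not exceed the
/// length of the values buffer.
fn offsets_fit_values(offsets: &[i32], values_len: usize) -> bool {
    let monotonic = offsets.windows(2).all(|w| w[0] <= w[1]);
    let in_bounds = offsets
        .last()
        .map_or(true, |&last| last >= 0 && (last as usize) <= values_len);
    monotonic && in_bounds
}

/// Model of trimming: keep only the values that the sliced offsets refer to,
/// so the result starts at index 0. Assumes `offsets` is non-empty and
/// non-negative.
fn model_trimmed_values<'a>(values: &'a [i32], offsets: &[i32]) -> &'a [i32] {
    let start = offsets[0] as usize;
    let end = offsets[offsets.len() - 1] as usize;
    &values[start..end]
}
```

With sliced offsets `[2, 4, 6]`, the trimmed values hold 4 elements, so the last offset (6) points past the end and the check fails. The rebased offsets `[0, 2, 4]` pass.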

Sliced list inputs are common in practice — filtered scans and take both produce them, e.g. the trailing batch of a merge_insert against a filtered dataset.

Fix

Added normalize_offsets(), which subtracts the first offset from each offset so the buffer is zero-based and consistent with the trimmed values. Zero-first-offset inputs short-circuit and clone without allocation. Applied in both the List and LargeList merge branches.
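
A minimal sketch of the rebasing idea on a plain `i32` slice. This is not the code in this PR: `normalize_offsets_model` is a hypothetical name, and `Cow` stands in for the "clone without allocation" short-circuit described above.

```rust
use std::borrow::Cow;

/// Rebase offsets so that the first one is zero.
/// Empty or already zero-based input is borrowed as-is (no allocation);
/// otherwise a new buffer is built by subtracting the first offset.
fn normalize_offsets_model(offsets: &[i32]) -> Cow<'_, [i32]> {
    match offsets.first().copied() {
        None | Some(0) => Cow::Borrowed(offsets),
        Some(first) => Cow::Owned(offsets.iter().map(|&o| o - first).collect()),
    }
}
```

For example, `[2, 4, 6]` becomes `[0, 2, 4]`, while `[0, 2, 4]` is returned borrowed. The same arithmetic would apply to `i64` offsets for the LargeList case.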

Test plan

  • Regression test test_merge_with_schema_sliced_list_with_nonzero_offset in lance-arrow — builds a list<struct{id}>, slices rows 1..3 so offsets start at 2, merges against a schema that adds a new struct field, and verifies the merged batch opens without panic and carries the correct values.
  • cargo test -p lance-arrow
  • cargo clippy --all --tests -- -D warnings
  • cargo fmt --all -- --check
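
The shape of the regression check can be modeled without Arrow. This is a sketch under assumptions, not the test in this PR: the helpers `rows` and `slice_trim_rebase` are hypothetical, plain `i32` ids stand in for the struct values, and two elements per row is assumed so that slicing rows 1..3 yields offsets starting at 2.

```rust
/// Materialize each list row from an offsets buffer and a flat values buffer.
fn rows(offsets: &[i32], values: &[i32]) -> Vec<Vec<i32>> {
    offsets
        .windows(2)
        .map(|w| values[w[0] as usize..w[1] as usize].to_vec())
        .collect()
}

/// Model of slicing rows `start..end`, trimming the values to the referenced
/// range, and rebasing the offsets to start at zero.
/// Returns `(rebased_offsets, trimmed_values)`.
fn slice_trim_rebase(
    offsets: &[i32],
    values: &[i32],
    start: usize,
    end: usize,
) -> (Vec<i32>, Vec<i32>) {
    // Rows start..end are described by offsets[start..=end].
    let sliced = &offsets[start..=end];
    let first = sliced[0];
    let last = sliced[sliced.len() - 1];
    let trimmed = values[first as usize..last as usize].to_vec();
    let rebased = sliced.iter().map(|&o| o - first).collect();
    (rebased, trimmed)
}
```

With offsets `[0, 2, 4, 6, 8]` and ids `0..8`, slicing rows 1..3 gives rebased offsets `[0, 2, 4]` over trimmed ids `[2, 3, 4, 5]`, and the rows read back match rows 1 and 2 of the original list.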

…ays (lance-format#6580)

When a ListArray is sliced (e.g. from a filtered scan's trailing batch),
its offsets don't start at 0. `trimmed_values()` returns values starting
from index 0, but `offsets().clone()` retained the original absolute
offsets, causing an "offset past values" panic.

Normalize offsets by subtracting offsets[0] so they are consistent with
the zero-based trimmed values buffer. Applies to both List and LargeList
branches.

@claude claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@github-actions github-actions Bot added the bug Something isn't working label Apr 22, 2026
@LuciferYang LuciferYang marked this pull request as draft April 22, 2026 09:26
@LuciferYang
Contributor Author

Let me fix the CI issues first.

@LuciferYang LuciferYang force-pushed the fix-6580-list-merge-offsets branch from a2bbcc2 to 27f63bd Compare April 22, 2026 12:47
@codecov

codecov Bot commented Apr 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


Labels

bug Something isn't working python


1 participant