Perf: buffer to avoid expensive fancy indexing for dense data by selmanozleyen · Pull Request #239 · scverse/annbatch

selmanozleyen · 2026-06-16T12:30:49Z

For dense data (as in genomic data) with a large feature size, I noticed that when bs=1, cs=1, and preload_nchunks=120, lots of time is spent on in_memory_data[split], because it creates a copy of the row instead of just selecting that row as a view. We can index in place if we keep a buffer and use np.take(out=buffer). This would also solve #235.
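As a rough illustration of the idea above (not the code in this PR), the sketch below gathers rows into a preallocated buffer with np.take(out=...). The helper name take_rows_into and the toy shapes are hypothetical. It passes mode="clip" because NumPy documents that out is always buffered internally when mode="raise"; note that "clip" silently clips out-of-range indices instead of raising.

```python
import numpy as np


def take_rows_into(data, rows, buffer):
    """Gather data[rows] into a preallocated buffer and return a view on it.

    Hypothetical helper for illustration only; the buffer must have the same
    dtype as `data` and at least len(rows) rows.
    """
    rows = np.asarray(rows)
    out = buffer[: len(rows)]  # basic slice: a view, no allocation
    # mode="clip" lets NumPy write straight into `out`; out-of-range
    # indices are clipped rather than raising an error.
    np.take(data, rows, axis=0, out=out, mode="clip")
    return out


data = np.arange(12, dtype=np.float32).reshape(4, 3)
buffer = np.empty((2, 3), dtype=np.float32)  # allocated once, reused per batch

batch = take_rows_into(data, [3, 1], buffer)

# For comparison: fancy indexing allocates a fresh array on every call.
fancy = data[[3, 1]]
```

Here batch shares memory with buffer, whereas fancy is a new array each time, which is the allocation the buffer is meant to avoid.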

I will share numbers from real-life data once I have run this branch.


ilan-gold · 2026-06-16T12:36:51Z

+                        use_pinned=self._preload_to_gpu,
+                    )
+                in_memory_data = self._dense_split_buffer[:needed_len]
+                self._np_module.take(


I am not 100% sure this is a safe operation on the GPU because, AFAIK, operations happen asynchronously. You may hit this line while your model is still fitting on a batch derived from in_memory_data, and you would then be overwriting in_memory_data. See #105. It may make sense to have a pool.

Yep, you are right. But how does a pool solve this? How can we know when the model is done with that data? Isn't copying here our only option, i.e. in_memory_data[slice(start, end)].copy()?

Yeah, I guess that would be the only way. Although, that is a good point: normal indexing on the GPU may copy even without .copy(). I am not sure. I hadn't considered that; it might be worth checking.

The pool was just spitballing.
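The aliasing concern in this thread can be sketched on the CPU with plain NumPy. This is an illustrative sketch only: the variable names are hypothetical, it does not model asynchronous GPU execution, and it says nothing about whether GPU array libraries copy on indexing (the open question above). It only shows that a batch taken as a basic slice of a reused buffer changes when the buffer is refilled, while a .copy() does not.

```python
import numpy as np

# A reusable buffer, standing in for the preallocated dense buffer.
buffer = np.empty((4, 3), dtype=np.float32)

# First fill: pretend this is the data for the current batch.
buffer[:] = 1.0
view_batch = buffer[0:2]           # basic slice: shares memory with buffer
copied_batch = buffer[0:2].copy()  # independent memory

# Second fill: the loader reuses the buffer for the next chunk while a
# consumer might still hold a reference to the previous batch.
buffer[:] = 2.0

view_changed = bool((view_batch == 2.0).all())      # the view sees new data
copy_unchanged = bool((copied_batch == 1.0).all())  # the copy is unaffected
```

The copy costs an allocation per batch, which is the trade-off being weighed against reusing a single buffer.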

codecov · 2026-06-16T12:54:08Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.55%. Comparing base (eefa63c) to head (93a628b).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #239      +/-   ##
==========================================
+ Coverage   93.48%   93.55%   +0.06%     
==========================================
  Files          15       15              
  Lines        1397     1412      +15     
==========================================
+ Hits         1306     1321      +15     
  Misses         91       91

Files with missing lines	Coverage Δ
src/annbatch/loader.py	`91.11% <100.00%> (+0.31%)`	⬆️


selmanozleyen and others added 4 commits June 16, 2026 14:15

init

6f5d468

special case for sparse

c396538

Merge branch 'main' into perf/buffer-to-avoid-expensive-fancy-indexing

7d147d5

[pre-commit.ci] auto fixes from pre-commit.com hooks

d4bf554

for more information, see https://pre-commit.ci

selmanozleyen added the run-gpu-ci label ("Signal that gpu ci should be run") Jun 16, 2026

ilan-gold reviewed Jun 16, 2026


selmanozleyen added 3 commits June 16, 2026 14:38

inv fix

c254666

use np_mod empty

e925f90

copy the row

cdc656a

new buffer for mixed typed cases

93a628b

selmanozleyen wants to merge 8 commits into scverse:main from selmanozleyen:perf/buffer-to-avoid-expensive-fancy-indexing
