Perf: buffer to avoid expensive fancy indexing for dense data#239
Perf: buffer to avoid expensive fancy indexing for dense data#239selmanozleyen wants to merge 8 commits into
Conversation
| use_pinned=self._preload_to_gpu, | ||
| ) | ||
| in_memory_data = self._dense_split_buffer[:needed_len] | ||
| self._np_module.take( |
There was a problem hiding this comment.
So I am not 100% sure this a safe operation on the GPU because AFAIK, operations happen asynchronously. Thus you may hit this line while your model is fitting on a batch derived from in_memory_data but you are then overriding in_memory_data. #105 It may make sense to have a pool
There was a problem hiding this comment.
yep, you are right. But how does a pool solve this? How can we know if the model is done with that data? Isn't copying here our only option? in_memory_data[slice(start, end)].copy()
There was a problem hiding this comment.
Yeah, I guess that would be the only way. Althought that is a good point, the normal indexing on the GPU may copy without .copy(). I am not sure. I hadn't considered that - it might be worth checking.
There was a problem hiding this comment.
The pool was just spitballing.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #239 +/- ##
==========================================
+ Coverage 93.48% 93.55% +0.06%
==========================================
Files 15 15
Lines 1397 1412 +15
==========================================
+ Hits 1306 1321 +15
Misses 91 91
🚀 New features to boost your workflow:
|
For the case of dense data (like in genomic data) and when the feature size is big. I noticed that when bs=1, cs=1, and preload_nchunks=120, lot's of time is being spent on in_memory_data[split] because creates a copy of the row instead of just selecting that row as a view. We can have inplace indexing if we have a buffer using np.take(out=buffer). Would also solve #235
I will give real life data once I run this branch