feat: GPU keep-on-device + kvikio (GDS) reader + pipeline GPU wiring #112
Open
FIrgolitsch wants to merge 5 commits into
Squashed pr-m-gpu-kvikio:
- linumpy.gpu.zarr_io: high-level read_zarr_to_gpu dispatcher (kvikio/GDS native -> zarr.config.enable_gpu fallback)
- linumpy.gpu.kvikio_zarr: kvikio-backed zarr->GPU reader using contiguous CuFile.pread scratch buffer
- GPU keep-on-device for resize/affine/map_coordinates; resample writer uses gpu_zarr_context
- linum_aip / linum_aip_png GPU pipelines stay on device end-to-end
- Device-aware find_tissue_interface, find_z_overlap; vectorized focal curvature roll
- Wire --use_gpu for focal_curvature + stack into nextflow
- Increase GPU resample maxForks; tune GPU memory management in _run_pipelined
- deps: bump ome-zarr to >=0.16.0 (NGFF 0.5)
- Server: nvidia-fs 2.28.4 patch script for kernel 7.0
- Benchmark script linum_benchmark_kvikio_zarr.py
- Bug fix: detect_focal_curvature broadcasts per-tile correction across tile positions
…n kernel 7.0
- create_nv.symvers.sh: case-insensitive grep for relative CRCs ('r' vs 'R'),
needed because kernel 7.0 / open-gpu-kmd emits __crc_nvidia_p2p_* as
section-local rodata. Without this, modversion fallback was skipped and
bogus relocation offsets were written, producing an unloadable nvidia-fs.
- create_nv.symvers.sh: 'zstd -df --rm' (force) so reruns don't stall on the
'overwrite (y/n)?' prompt when nvidia.ko already exists.
- nvfs-mmap.c patch: idempotent (skip if already applied).
…r-python PR #2863)
PR — GPU keep-on-device + kvikio (GDS) reader + pipeline GPU wiring
Extends the GPU stack with end-to-end on-device data flow for the OCT reconstruction pipeline and adds a GPUDirect Storage (GDS) reader as a fast path for reading uncompressed zarr arrays straight into device memory.
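A fast path of this kind normally sits behind a dispatcher that tries the fast reader first and drops to a slower one when the fast reader rejects the input or fails at runtime. The sketch below illustrates that pattern only; `read_with_fallback`, `fast_reader`, `slow_reader`, and the `.raw` suffix check are hypothetical stand-ins and are not the actual `read_zarr_to_gpu` code.

```python
from typing import Any, Callable, Sequence, Tuple

Reader = Callable[[str], Any]


def read_with_fallback(
    path: str, readers: Sequence[Tuple[str, Reader]]
) -> Tuple[str, Any]:
    """Try each (name, reader) in order and return the first success."""
    errors = []
    for name, reader in readers:
        try:
            return name, reader(path)
        except Exception as exc:  # any failure moves on to the next backend
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("all readers failed: " + "; ".join(errors))


def fast_reader(path: str) -> bytes:
    # Hypothetical stand-in for a direct-to-device read; it refuses
    # anything it cannot handle instead of returning wrong data.
    if not path.endswith(".raw"):
        raise NotImplementedError("only uncompressed data is supported")
    return b"fast:" + path.encode()


def slow_reader(path: str) -> bytes:
    # Hypothetical stand-in for the general-purpose reader.
    return b"slow:" + path.encode()


READERS = [("fast", fast_reader), ("slow", slow_reader)]
```

Calling `read_with_fallback("tile.raw", READERS)` returns the result tagged `"fast"`, while a path the fast reader rejects comes back tagged `"slow"`. Returning the backend name alongside the data makes it easy to log which path was actually taken.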
GPU keep-on-device
- `linumpy.gpu.zarr_io` with `gpu_zarr_context()` (uses `zarr.config.enable_gpu()`) and `read_zarr_to_gpu(...)` with auto backend selection (kvikio when available, zarr-gpu otherwise).
- `linumpy.gpu.interpolation`: device-preserving `resize`, `affine_transform`, `map_coordinates`.
- `linumpy.gpu.interface` with a GPU implementation of `find_tissue_interface` (no-mask path) using `cupyx` filters.
- `linumpy.geometry.interface.find_tissue_interface(..., use_gpu=...)` and `linumpy.mosaic.stacking.find_z_overlap(..., use_gpu=...)` now route to GPU when requested.
- `linum_aip.py` and `linum_resample_mosaic_grid.py` use `gpu_zarr_context` to keep tiles on-device through the slab loop and writer.
- `linum_detect_focal_curvature.py`: vectorized roll via `take_along_axis` (`xp` dispatch) and `--use_gpu`/`--no-use_gpu`.
- `linum_stack_slices_motor.py`: `--use_gpu`/`--no-use_gpu` plumbed to `find_z_overlap`.
kvikio (GDS) reader (prototype)
- `linumpy/gpu/kvikio_zarr.py`: GDS reader for raw uncompressed zarr v2 + v3.
- Unsupported layouts (… `C` order, mismatched endian) raise `NotImplementedError`.
- … `CuFile.pread`.
- `scripts/linum_benchmark_kvikio_zarr.py`: benchmark with kvikio and `zarr.config.enable_gpu()` paths for comparison.
- `read_zarr_to_gpu` falls back to zarr-gpu when kvikio is in compat mode, when arrays aren't GDS-compatible, or on any runtime failure.
Server / build
- `shell_scripts/server_setup/nvfs_kernel7_patch.sh`: nvidia-fs 2.28.4 patch for kernel 7.0; symvers helper now also handles `.ko.zst`.
- `pyproject.toml`: bump `ome-zarr` to `>=0.16.0` (NGFF 0.5).
Nextflow pipeline GPU wiring
- `fix_focal_curvature` and `stack` processes pass `--use_gpu`/`--no-use_gpu` from `params.use_gpu`.
- `nextflow.config`: `withName: "fix_focal_curvature"` gets `maxForks = params.use_gpu ? 4 : null`.
- `withName: "resample_mosaic_grid"`: `maxForks = params.use_gpu ? 6 : null` (measured ~1 GB GPU mem per fork; IO-gated).
- `_run_pipelined`: prefetch + GPU compute pipeline; periodic free of cupy memory pool.
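The prefetch-plus-compute idea behind `_run_pipelined` can be illustrated with a small generic sketch: one background thread loads the next item while the current one is processed, and a cleanup hook runs every few items. This is an assumption about the general shape of the pattern, not the PR's implementation; `run_prefetched`, `free_every`, and `free_pool` are hypothetical names. In a CuPy setup, `free_pool` could wrap `cupy.get_default_memory_pool().free_all_blocks()`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, Optional, TypeVar

K = TypeVar("K")
T = TypeVar("T")
R = TypeVar("R")


def run_prefetched(
    keys: Iterable[K],
    load: Callable[[K], T],
    compute: Callable[[T], R],
    free_every: int = 0,
    free_pool: Optional[Callable[[], None]] = None,
) -> Iterator[R]:
    """Yield compute(load(key)) for each key, loading one item ahead."""
    it = iter(keys)
    with ThreadPoolExecutor(max_workers=1) as pool:
        try:
            pending = pool.submit(load, next(it))
        except StopIteration:
            return  # nothing to process
        done = 0
        while pending is not None:
            item = pending.result()  # wait for the prefetched item
            try:
                # Start loading the next item before computing this one.
                pending = pool.submit(load, next(it))
            except StopIteration:
                pending = None
            yield compute(item)
            done += 1
            if free_pool is not None and free_every and done % free_every == 0:
                free_pool()  # periodic cleanup, e.g. release cached blocks
```

For example, `list(run_prefetched(range(5), load=lambda k: k * 2, compute=lambda x: x + 1))` preserves input order. Prefetching a single item ahead keeps memory bounded at two items in flight, which matters when each item is a large tile.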