feat: GPU keep-on-device + kvikio (GDS) reader + pipeline GPU wiring by FIrgolitsch · Pull Request #112 · linum-uqam/linumpy

FIrgolitsch · 2026-04-30T02:49:10Z

Stacked PR 13/22 — review order: #115 → #97 → #98 → #99 → #100 → #101 → #108 → #106 → #107 → #87 → #116 → #110 → #111 → #40 → #112 → #113 → #117 → #118 → #120 → #121 → #122 → #123 → #124 → #125

Base: sphinx-config. Retargets to main as upstream PRs merge.

PR — GPU keep-on-device + kvikio (GDS) reader + pipeline GPU wiring

Extends the GPU stack with end-to-end on-device data flow for the OCT reconstruction pipeline and adds a GPUDirect Storage (GDS) reader as a fast path for reading uncompressed zarr arrays straight into device memory.

GPU keep-on-device

New linumpy.gpu.zarr_io with gpu_zarr_context() (uses zarr.config.enable_gpu()) and read_zarr_to_gpu(...) with auto backend selection (kvikio when available, zarr-gpu otherwise).
linumpy.gpu.interpolation: device-preserving resize, affine_transform, map_coordinates.
New linumpy.gpu.interface with a GPU implementation of find_tissue_interface (no-mask path) using cupyx filters.
linumpy.geometry.interface.find_tissue_interface(..., use_gpu=...) and linumpy.mosaic.stacking.find_z_overlap(..., use_gpu=...) now route to GPU when requested.
linum_aip.py and linum_resample_mosaic_grid.py use gpu_zarr_context to keep tiles on-device through the slab loop and writer.
linum_detect_focal_curvature.py: vectorized roll via take_along_axis (xp dispatch) and --use_gpu/--no-use_gpu.
linum_stack_slices_motor.py: --use_gpu/--no-use_gpu plumbed to find_z_overlap.
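
The "vectorized roll via take_along_axis (xp dispatch)" item can be illustrated with a minimal sketch. It is not the code from linum_detect_focal_curvature.py: the helper names get_xp and roll_columns are hypothetical, and the 2D (depth, column) layout is an assumption made to keep the example small. The idea is to build one index array and gather along the depth axis, so each column gets its own shift without a Python loop, and to pick numpy or cupy from the input array so device arrays stay on the device.

```python
import numpy as np


def get_xp(arr):
    """Return cupy for device arrays when cupy is importable, else numpy."""
    try:
        import cupy as cp  # optional dependency
    except Exception:
        return np
    return cp.get_array_module(arr)


def roll_columns(vol, shifts):
    """Circularly shift each column of a (Z, X) array by its own offset.

    Equivalent to np.roll(vol[:, x], shifts[x]) for every x, but done as a
    single gather so it works the same way on numpy and cupy arrays.
    """
    xp = get_xp(vol)
    n_z = vol.shape[0]
    z = xp.arange(n_z)[:, None]                      # (Z, 1) destination rows
    src = (z - xp.asarray(shifts)[None, :]) % n_z    # (Z, X) source rows
    return xp.take_along_axis(vol, src, axis=0)


# Usage on a small host array (hypothetical data).
vol = np.arange(12).reshape(4, 3)
shifts = np.array([0, 1, -2])
rolled = roll_columns(vol, shifts)
```

Passing a cupy array for vol would route the same code through cupy, which is what "xp dispatch" refers to here.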

kvikio (GDS) reader (prototype)

linumpy/gpu/kvikio_zarr.py: GDS reader for raw uncompressed zarr v2 + v3.
- Refuses incompatible arrays (compressed, filtered, non-C order, mismatched endian) with NotImplementedError.
- Uses contiguous scratch buffer for CuFile.pread.
scripts/linum_benchmark_kvikio_zarr.py: benchmark with kvikio and zarr.config.enable_gpu() paths for comparison.
read_zarr_to_gpu falls back to zarr-gpu when kvikio is in compat mode, when arrays aren't GDS-compatible, or on any runtime failure.
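
The refuse-then-fall-back behaviour described above can be sketched as follows. This is an illustration only, not the linumpy implementation: check_gds_compatible and read_with_fallback are hypothetical names, the metadata dictionary mimics a zarr v2 .zarray file, and the two reader callables stand in for the kvikio and zarr-gpu paths.

```python
import numpy as np


def check_gds_compatible(zarray_meta):
    """Raise NotImplementedError unless chunks are raw, C-order, native-endian.

    zarray_meta is assumed to be the parsed contents of a zarr v2 .zarray file.
    """
    if zarray_meta.get("compressor") is not None:
        raise NotImplementedError("compressed chunks are not supported")
    if zarray_meta.get("filters"):
        raise NotImplementedError("filtered chunks are not supported")
    if zarray_meta.get("order", "C") != "C":
        raise NotImplementedError("only C-order arrays are supported")
    if not np.dtype(zarray_meta["dtype"]).isnative:
        raise NotImplementedError("byte order does not match the host")


def read_with_fallback(fast_read, fallback_read, zarray_meta):
    """Try the fast reader; on refusal or any runtime failure use the fallback.

    Returns (data, backend_name) so callers can log which path was taken.
    """
    try:
        check_gds_compatible(zarray_meta)
        return fast_read(), "fast"
    except Exception:
        # Covers NotImplementedError from the check and errors from fast_read.
        return fallback_read(), "fallback"


# Usage with stand-in readers (hypothetical metadata).
meta = {"compressor": None, "filters": None, "order": "C", "dtype": "|u1"}
data, backend = read_with_fallback(
    lambda: np.zeros(4, dtype="u1"),
    lambda: np.ones(4, dtype="u1"),
    meta,
)
```

Catching every exception around the fast path mirrors the "any runtime failure" wording, at the cost of hiding genuine bugs in the fast reader unless the chosen backend is logged.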

Server / build

shell_scripts/server_setup/nvfs_kernel7_patch.sh: nvidia-fs 2.28.4 patch for kernel 7.0; symvers helper now also handles .ko.zst.
pyproject.toml: bump ome-zarr to >=0.16.0 (NGFF 0.5).

Nextflow pipeline GPU wiring

fix_focal_curvature and stack processes pass --use_gpu/--no-use_gpu from params.use_gpu.
nextflow.config: withName: "fix_focal_curvature" gets maxForks = params.use_gpu ? 4 : null.
withName: "resample_mosaic_grid": maxForks = params.use_gpu ? 6 : null (measured ~1 GB GPU mem per fork; IO-gated).
_run_pipelined: prefetch + GPU compute pipeline; periodic free of cupy memory pool.
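
A prefetch + compute pipeline with a periodic memory-pool free, as mentioned for _run_pipelined, could look roughly like the sketch below. It is an assumption about the general pattern, not the actual function: run_pipelined, its parameters, and the free_every interval are hypothetical. A single background thread loads item i+1 while item i is being processed, and an optional callback is invoked every few items.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipelined(items, load, compute, free_every=8, free_pool=None):
    """Overlap loading of the next item with computation on the current one.

    load and compute are caller-supplied callables; free_pool, if given, is
    called after every free_every processed items.
    """
    items = list(items)
    results = []
    if not items:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load, items[0])
        for i in range(len(items)):
            data = pending.result()          # wait for the prefetched item
            if i + 1 < len(items):
                pending = pool.submit(load, items[i + 1])  # prefetch next
            results.append(compute(data))
            if free_pool is not None and (i + 1) % free_every == 0:
                free_pool()
    return results


# Usage with trivial stand-ins for the IO and compute stages.
out = run_pipelined(range(5), load=lambda i: i * 2, compute=lambda x: x + 1)
```

With cupy installed, free_pool could be set to cupy.get_default_memory_pool().free_all_blocks so cached device blocks are returned between batches.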

Squashed pr-m-gpu-kvikio:
- linumpy.gpu.zarr_io: high-level read_zarr_to_gpu dispatcher (kvikio/GDS native -> zarr.config.enable_gpu fallback)
- linumpy.gpu.kvikio_zarr: kvikio-backed zarr->GPU reader using contiguous CuFile.pread scratch buffer
- GPU keep-on-device for resize/affine/map_coordinates; resample writer uses gpu_zarr_context
- linum_aip / linum_aip_png GPU pipelines stay on device end-to-end
- Device-aware find_tissue_interface, find_z_overlap; vectorized focal curvature roll
- Wire --use_gpu for focal_curvature + stack into nextflow
- Increase GPU resample maxForks; tune GPU memory management in _run_pipelined
- deps: bump ome-zarr to >=0.16.0 (NGFF 0.5)
- Server: nvidia-fs 2.28.4 patch script for kernel 7.0
- Benchmark script linum_benchmark_kvikio_zarr.py
- Bug fix: detect_focal_curvature broadcasts per-tile correction across tile positions

…n kernel 7.0
- create_nv.symvers.sh: case-insensitive grep for relative CRCs ('r' vs 'R'), needed because kernel 7.0 / open-gpu-kmd emits __crc_nvidia_p2p_* as section-local rodata. Without this, modversion fallback was skipped and bogus relocation offsets were written, producing an unloadable nvidia-fs.
- create_nv.symvers.sh: 'zstd -df --rm' (force) so reruns don't stall on the 'overwrite (y/n)?' prompt when nvidia.ko already exists.
- nvfs-mmap.c patch: idempotent (skip if already applied).

FIrgolitsch force-pushed the sphinx-config branch from 6cb511c to 288f45f Compare April 30, 2026 03:21

FIrgolitsch force-pushed the pr-m-gpu-kvikio branch from 7544bf5 to 95f24bc Compare April 30, 2026 03:21

FIrgolitsch force-pushed the sphinx-config branch from 288f45f to af57ab6 Compare April 30, 2026 03:26

FIrgolitsch force-pushed the pr-m-gpu-kvikio branch from 95f24bc to 14bfd65 Compare April 30, 2026 03:26

FIrgolitsch force-pushed the sphinx-config branch from af57ab6 to 47e639e Compare April 30, 2026 03:51

FIrgolitsch force-pushed the pr-m-gpu-kvikio branch 2 times, most recently from ab73c15 to a6f274b Compare May 1, 2026 17:20

This was referenced May 1, 2026

fix(galvo): per-tile detect + threaded column strips + --skip_tiles for manual overrides #117

Open

chore: bump python floor to 3.14, modernize CI/build, ruff FURB sweep #118

Open

FIrgolitsch added 5 commits May 20, 2026 12:19

fix(gpu/zarr_io): support kvikio 26.04+ is_compat_mode_preferred API

cec6ad2

feat(gpu): add NvcompZstdCodec for GPU zstd decode (vendored from zarr-python PR #2863)

14ee012

fix(gpu): also flip zarr config codecs.zstd to route through NvcompZstdCodec

f6c23e5

FIrgolitsch force-pushed the sphinx-config branch from 47e639e to 1862806 Compare May 20, 2026 16:23

FIrgolitsch force-pushed the pr-m-gpu-kvikio branch from a6f274b to f6c23e5 Compare May 20, 2026 16:23

FIrgolitsch wants to merge 5 commits into sphinx-config from pr-m-gpu-kvikio
