feat: add ONNX Runtime inference backend with PyTorch fallback (#12) #32
jshaofa-ui wants to merge 22 commits into
Conversation
…iddleware-audit Merging Olufemi's API middleware and auth modules
…tics-statistics Merging Francis's analytics statistics and reporting modules
Defines responsibilities, deliverables, and collaboration guidelines for the Carbon Analytics & Validation role. Co-Authored-By: Francis Umo <francis.umo@climatevision.org>
Defines responsibilities, deliverables, and collaboration guidelines for the API Development & Integration role. Co-Authored-By: Olufemi Taiwo <olufemi.taiwo@climatevision.org>
…role-document Merged by Mary Oshadare
…e-document Merged by Mary Oshadare
…mate-Vision#7)
* feat(data): add GEE tile downloader with analysis-aware band selection
  - Downloads real Sentinel-2 composites via Google Earth Engine
  - Reads required bands from config.yaml per analysis_type
  - Includes SCL band for downstream cloud masking
  - Synthetic fallback with explicit is_synthetic flag when GEE unavailable
  - Fix .gitignore so src/climatevision/data/ is no longer ignored
* feat(data): add analysis-specific Sentinel-2 band mapping utilities
  - get_bands_for_analysis() reads correct bands from config.yaml
  - get_band_indices() maps band names to canonical 13-band stack positions
  - is_analysis_enabled() and list_enabled_analysis_types() for config validation
  - Includes SCL band helpers for downstream cloud masking
* feat(data): integrate SCL cloud masking and export new pipeline modules
  - apply_scl_cloud_mask() masks cloudy pixels using Sentinel-2 SCL band
  - Default clear labels: vegetation, bare soils, water, snow
  - Update __init__.py to expose gee_downloader and band_mapping utilities
* refactor(data): address PR review feedback
  - Remove duplicated config logic in gee_downloader.py; import from band_mapping
  - Cache config.yaml load in band_mapping.py via lru_cache
  - Read synthetic tile size from config.yaml instead of hardcoding 256
  - Remove unused json import in gee_downloader.py
  - Add shape validation in apply_scl_cloud_mask
---------
Co-authored-by: Adeolu Mary Oshadare <adeolu@placeholder.com>
…ing (Climate-Vision#8)
* feat(inference): make pipeline analysis-aware with dynamic model loading
  - _load_model() now accepts analysis_type and reads in_channels/num_classes from config.yaml
  - Per-analysis-type model cache prevents cross-contamination between deforestation/ice/flood models
  - _find_best_checkpoint() prefers config.yaml weight path per analysis type
  - run_inference() accepts analysis_type, pads/crops to correct n_channels, and returns dynamic class counts
  - run_inference_from_file() and run_inference_from_gee() propagate analysis_type parameter
* feat(api): wire analysis_type into prediction endpoints
  - Pass body.analysis_type to run_inference_from_gee() in /api/predict
  - Pass analysis_type to run_inference_from_file() in /api/predict/upload
  - Enables the API to load the correct model and return correct class counts per analysis type
---------
Co-authored-by: Olufemi Taiwo <Olufemitaiwo23@gmail.com>
… flag, add config health validation
- Add cv_dev development key bypass for local testing
- Require X-API-Key on all mutation endpoints (POST predict, orgs, alerts, subscriptions)
- Surface is_synthetic at root of inference response for frontend demo banners
- Expand /api/health to validate config alignment (bands vs in_channels, classes vs num_classes)

- Add FastAPI test client fixture
- Create CI workflow for Python (flake8, pytest) and frontend (npm build)
- Bootstrap tests/ directory structure

- Parametrize UNet init for all 3 analysis types (4ch/2cl, 4ch/3cl, 3ch/3cl)
- Validate forward pass output shapes
- Add Siamese change detection forward shape test

- Link to 6 active good-first-issue and help-wanted issues
- Add claim workflow for new contributors
- Include time estimates and skill-building map

- ../components/map/ -> ../components/Map/
- Fixes vite build failure on Linux (case-sensitive filesystem)

- Fixes pip install failure for gdal and rasterio on Ubuntu runners
- Adds libgdal-dev, gdal-bin, libgl1-mesa-glx

- gdal Python package requires exact system GDAL version matching
- rasterio covers all GDAL functionality we actually use
- Simplify CI system deps to libgl1 only (for opencv runtime)

- Fixes ModuleNotFoundError: No module named 'climatevision'
- pip install -e . registers src/ as an importable package

- ForestDataset with DataLoader support
- Training/validation augmentation pipelines
- Synthetic tile generation for demo/fallback mode

- Add DONE/PENDING task list for April 2026 sprint
- Include actual .github/workflows/ci.yml code in role doc
- Update local CI check commands to match current workflow
- ONNXSession: Cached session manager with auto CPU/CUDA provider selection
- run_onnx_inference: Batch inference with latency tracking
- benchmark_onnx_model: Full benchmarking (p50/p95/p99 + FPS)
- export_unet_to_onnx / export_siamese_to_onnx: Dynamic axis, configurable opset
- run_inference_with_fallback: ONNX to PyTorch automatic fallback
- validate_onnx_model: Cross-validation with PyTorch output
- 32 unit tests across 11 test classes
- Graceful skip when torch/onnx not available

Closes Climate-Vision#12
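The commit above mentions p50/p95/p99 latency and FPS reporting. As a rough illustration of what such a benchmark computes, here is a minimal sketch using only the standard library. The names `LatencyStats`, `percentile`, and `benchmark` are hypothetical and are not taken from the PR; the nearest-rank percentile method is an assumption, and the PR may well use a different one.

```python
import math
import time
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class LatencyStats:
    """Hypothetical result container; not the PR's benchmark dataclass."""
    p50_ms: float
    p95_ms: float
    p99_ms: float
    fps: float


def percentile(sorted_values: Sequence[float], q: int) -> float:
    """Nearest-rank percentile of an ascending, non-empty sequence."""
    rank = max(1, math.ceil(q * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def benchmark(infer: Callable[[], object], n_runs: int = 50,
              n_warmup: int = 5, batch_size: int = 1) -> LatencyStats:
    """Time `infer` n_runs times after n_warmup untimed calls."""
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    for _ in range(n_warmup):
        infer()  # warm-up calls are excluded from the statistics
    latencies_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        infer()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    latencies_ms.sort()
    mean_ms = sum(latencies_ms) / len(latencies_ms)
    # Throughput in items per second, derived from the mean latency.
    fps = batch_size * 1000.0 / mean_ms if mean_ms > 0 else float("inf")
    return LatencyStats(
        p50_ms=percentile(latencies_ms, 50),
        p95_ms=percentile(latencies_ms, 95),
        p99_ms=percentile(latencies_ms, 99),
        fps=fps,
    )
```

In practice `infer` would wrap a single session call on a fixed input, for example `lambda: session.run(batch)`, so that only inference time is measured.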
Goldokpa left a comment
Thanks for the substantial PR! Solid scope and good test coverage. Before this can move forward there are a few real bugs and a few things worth questioning:
Bugs that will break at runtime:
- run_onnx_inference warm-up call is malformed. In onnx_runtime.py: session.run(session.input_name, {session.input_name: image[:1]}). ONNXSession.run only takes input_data (one positional arg), and the underlying _session.run expects (output_names, input_feed), where output_names is None or a list, not a string. This will throw on every call. It should just be session.run(image[:1]).
- all_outputs.extend(outputs) then all_outputs[0]. session.run() returns a list of arrays (one per output). For a single-output model with one batch, extend flattens to [array], but for multi-batch this stitches outputs from different batches as siblings rather than concatenating along the batch axis. The downstream argmax/softmax then runs only on the first batch's logits and silently drops the rest. Use np.concatenate([out[0] for out in batched_outputs], axis=0) or similar.
- Fallback path calls pytorch_inference(image, analysis_type=...) but image is a numpy array. Worth confirming run_inference in pipeline.py accepts that; if it expects a tensor or a file path, the fallback will always fail.
- _EXECUTION_PROVIDERS is malformed. The CUDA entry is ("CUDAExecutionProvider", {"device_id": "0"}), but device_id should be an int, not a string, per the ORT API. Also ort.InferenceSession(providers=...) expects either a list of strings or a list of (name, options_dict) tuples; the code mixes both formats and the selected_providers filter only checks the name, so it passes the malformed tuple through unchanged.
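To make the last point concrete, here is a minimal sketch of one way to keep the provider list uniform: every entry is a (name, options) tuple, device_id is an int as the review requests, and filtering returns whole tuples rather than names. `PREFERRED_PROVIDERS` and `select_providers` are hypothetical names for illustration and are not the PR's code.

```python
from typing import Any

ProviderSpec = tuple[str, dict[str, Any]]

# Preference order, highest priority first. Every entry has the same shape.
PREFERRED_PROVIDERS: list[ProviderSpec] = [
    ("CUDAExecutionProvider", {"device_id": 0}),  # int, not the string "0"
    ("CPUExecutionProvider", {}),
]


def select_providers(
    available: list[str],
    preferred: list[ProviderSpec] = PREFERRED_PROVIDERS,
) -> list[ProviderSpec]:
    """Keep the preferred providers that this runtime build reports as available."""
    selected = [(name, dict(options)) for name, options in preferred
                if name in available]
    if not selected:
        raise RuntimeError(f"none of the preferred providers are available: {available}")
    return selected
```

Assuming onnxruntime is installed, the result could be passed along as `ort.InferenceSession(path, providers=select_providers(ort.get_available_providers()))`.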
Things that look suspicious / worth challenging:
- Test file references an absolute home path: pip install -e /home/fa/projects/climatevision-work in docs/onnx-runtime-guide.md, plus the docstring on onnx_runtime.py and the _DEFAULT_ONNX_DIR = parents[3] (same parents-index issue I'd want verified: the file is at src/climatevision/inference/onnx_runtime.py, so parents[3] should be the repo root; that one's actually correct here, but worth confirming for onnx_export.py too). The hardcoded developer path in the docs should be removed.
- 873-line test file with 40+ test cases for a new module is unusual. Many tests follow the pattern of try: import onnxruntime; except: pytest.skip(), meaning that in CI without onnxruntime installed (which is a new dependency this PR adds), almost every test silently skips. The test for test_session_raises_without_onnxruntime literally has pass # Skip this test as it requires complex mocking inside it, so it asserts nothing. Worth pruning to focused tests that actually run, or pinning onnxruntime as a test dep.
- Dependencies aren't added to pyproject.toml / requirements.txt. The PR introduces onnx>=1.14.0 and onnxruntime>=1.15.0 but only mentions them in the markdown doc. Imports will fail in any environment that hasn't been manually prepared.
- run_onnx_inference has no path that returns a dict[str, Any] despite the return type annotation ONNXInferenceResult | dict[str, Any]. Either the dict branch is missing or the annotation is wrong.
- export_model_from_checkpoint calls torch.load(...) without weights_only=True. Recent PyTorch versions warn on this, and it's a security concern for untrusted checkpoints.
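On the test that currently contains only pass: simulating a missing dependency does not need complex mocking. A minimal sketch follows, assuming the session constructor imports onnxruntime lazily; `create_session` is a hypothetical stand-in and not the PR's ONNXSession. It relies on the documented import-system behavior that a None entry in sys.modules makes the import raise ModuleNotFoundError.

```python
import sys
from unittest import mock


def create_session(model_path: str):
    """Hypothetical stand-in for a constructor that needs onnxruntime."""
    try:
        import onnxruntime as ort  # lazy import: module loads without the dependency
    except ImportError as exc:
        raise RuntimeError("onnxruntime is required for ONNX inference") from exc
    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def session_raises_without_onnxruntime() -> bool:
    """Return True if create_session fails cleanly when onnxruntime is absent."""
    # A None entry in sys.modules forces `import onnxruntime` to raise
    # ModuleNotFoundError, whether or not the package is installed.
    with mock.patch.dict(sys.modules, {"onnxruntime": None}):
        try:
            create_session("model.onnx")
        except RuntimeError:
            return True
    return False
```

Inside a pytest test the same patch could wrap a `pytest.raises(RuntimeError)` block, which gives a test that runs and asserts in every CI environment.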
Happy to re-review once these are addressed. The overall architecture (session caching, fallback path, benchmark dataclass) is reasonable — the issues are mostly in the wiring.
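For reference, the fallback wiring discussed above can be reduced to a small, backend-agnostic shape. This is a sketch under assumptions: `run_with_fallback` and its callable arguments are hypothetical, and the PR's run_inference_with_fallback may differ in signature and error handling.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)

InferFn = Callable[[Any], Any]


def run_with_fallback(
    image: Any,
    onnx_infer: Optional[InferFn],
    pytorch_infer: InferFn,
) -> tuple[Any, str]:
    """Try the ONNX path first; return (result, backend_name)."""
    if onnx_infer is not None:
        try:
            return onnx_infer(image), "onnx"
        except Exception as exc:  # broad on purpose: any ONNX failure triggers fallback
            logger.warning("ONNX inference failed, falling back to PyTorch: %s", exc)
    # Both callables receive the same input object, so they must agree on its
    # type (for example, both accept a numpy array).
    return pytorch_infer(image), "pytorch"
```

Returning the backend name alongside the result makes it visible to callers and tests when the fallback was taken, instead of hiding a failing ONNX path behind a working PyTorch one.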
👋 Friendly ping, @jshaofa-ui: checking in on the ONNX backend PR. The main blocker remains the warm-up call in
📢 Heads-up: repo history was rewritten today (2026-05-18)
We force-pushed a cleaned history across all branches to remove an internal directory from past commits. Your code and this PR are unaffected; only the commit SHAs underneath have shifted. GitHub will re-render the diff against the new base automatically. If you have a local clone, please bring it back in sync before pushing anything else:
# Option A (simplest): fresh start
git clone https://github.com/Climate-Vision/ClimateVision.git
# Option B: rebase the existing PR branch in your fork
git fetch origin
git checkout <your-branch>
git rebase origin/main # likely no conflicts
git push --force-with-lease
Do not
Apologies for the interruption, and I really appreciate your patience here. If anything looks off after rebasing, leave a comment and I'll help unblock right away. Thanks for contributing 🙏
[Good First Issue] Add ONNX Runtime inference path with PyTorch fallback
Resolves #12
Summary
Implements a complete ONNX Runtime inference backend for ClimateVision with automatic PyTorch fallback, enabling faster inference on CPU and edge devices while maintaining full compatibility with existing PyTorch models.
Changes
Core Features
Test Coverage
Technical Details