feat(api): extend /api/health with model load diagnostics #28
Swatikantamishra8 wants to merge 23 commits into
…iddleware-audit Merging Olufemi's API middleware and auth modules
…tics-statistics Merging Francis's analytics statistics and reporting modules
Defines responsibilities, deliverables, and collaboration guidelines for the Carbon Analytics & Validation role. Co-Authored-By: Francis Umo <francis.umo@climatevision.org>
Defines responsibilities, deliverables, and collaboration guidelines for the API Development & Integration role. Co-Authored-By: Olufemi Taiwo <olufemi.taiwo@climatevision.org>
…role-document Merged by Mary Oshadare
…e-document Merged by Mary Oshadare
…mate-Vision#7)
* feat(data): add GEE tile downloader with analysis-aware band selection
  - Downloads real Sentinel-2 composites via Google Earth Engine
  - Reads required bands from config.yaml per analysis_type
  - Includes SCL band for downstream cloud masking
  - Synthetic fallback with explicit is_synthetic flag when GEE unavailable
  - Fix .gitignore so src/climatevision/data/ is no longer ignored
* feat(data): add analysis-specific Sentinel-2 band mapping utilities
  - get_bands_for_analysis() reads correct bands from config.yaml
  - get_band_indices() maps band names to canonical 13-band stack positions
  - is_analysis_enabled() and list_enabled_analysis_types() for config validation
  - Includes SCL band helpers for downstream cloud masking
* feat(data): integrate SCL cloud masking and export new pipeline modules
  - apply_scl_cloud_mask() masks cloudy pixels using Sentinel-2 SCL band
  - Default clear labels: vegetation, bare soils, water, snow
  - Update __init__.py to expose gee_downloader and band_mapping utilities
* refactor(data): address PR review feedback
  - Remove duplicated config logic in gee_downloader.py; import from band_mapping
  - Cache config.yaml load in band_mapping.py via lru_cache
  - Read synthetic tile size from config.yaml instead of hardcoding 256
  - Remove unused json import in gee_downloader.py
  - Add shape validation in apply_scl_cloud_mask
---------
Co-authored-by: Adeolu Mary Oshadare <adeolu@placeholder.com>
…ing (Climate-Vision#8)
* feat(inference): make pipeline analysis-aware with dynamic model loading
  - _load_model() now accepts analysis_type and reads in_channels/num_classes from config.yaml
  - Per-analysis-type model cache prevents cross-contamination between deforestation/ice/flood models
  - _find_best_checkpoint() prefers config.yaml weight path per analysis type
  - run_inference() accepts analysis_type, pads/crops to correct n_channels, and returns dynamic class counts
  - run_inference_from_file() and run_inference_from_gee() propagate analysis_type parameter
* feat(api): wire analysis_type into prediction endpoints
  - Pass body.analysis_type to run_inference_from_gee() in /api/predict
  - Pass analysis_type to run_inference_from_file() in /api/predict/upload
  - Enables the API to load the correct model and return correct class counts per analysis type
---------
Co-authored-by: Olufemi Taiwo <Olufemitaiwo23@gmail.com>
… flag, add config health validation - Add cv_dev development key bypass for local testing - Require X-API-Key on all mutation endpoints (POST predict, orgs, alerts, subscriptions) - Surface is_synthetic at root of inference response for frontend demo banners - Expand /api/health to validate config alignment (bands vs in_channels, classes vs num_classes)
- Add FastAPI test client fixture - Create CI workflow for Python (flake8, pytest) and frontend (npm build) - Bootstrap tests/ directory structure
- Parametrize UNet init for all 3 analysis types (4ch/2cl, 4ch/3cl, 3ch/3cl) - Validate forward pass output shapes - Add Siamese change detection forward shape test
- Link to 6 active good-first-issue and help-wanted issues - Add claim workflow for new contributors - Include time estimates and skill-building map
- ../components/map/ -> ../components/Map/ - Fixes vite build failure on Linux (case-sensitive filesystem)
- Fixes pip install failure for gdal and rasterio on Ubuntu runners - Adds libgdal-dev, gdal-bin, libgl1-mesa-glx
- gdal Python package requires exact system GDAL version matching - rasterio covers all GDAL functionality we actually use - Simplify CI system deps to libgl1 only (for opencv runtime)
- Fixes ModuleNotFoundError: No module named 'climatevision' - pip install -e . registers src/ as an importable package
- ForestDataset with DataLoader support - Training/validation augmentation pipelines - Synthetic tile generation for demo/fallback mode
- Add DONE/PENDING task list for April 2026 sprint - Include actual .github/workflows/ci.yml code in role doc - Update local CI check commands to match current workflow
Closes Climate-Vision#20 - Try to load each enabled model via _load_model() - Report loaded status, checkpoint path, and any errors - Return model_diagnostics dict in health response - Mark health as degraded if any model fails to load
femi23
left a comment
Thanks for the contribution @Swatikantamishra8 — the intent (surfacing per-model load status from /api/health) is exactly right and matches what we need for the Kubernetes readiness probe story. Unfortunately the patch as it stands won't parse, so a re-push is needed before we can land it.
Blockers
1. The diff has trailing whitespace on the module docstring / from __future__ import annotations line. Minor, but combined with #2 it suggests the patch went through an editor that mangled whitespace.
2. The added block is not valid Python — indentation is broken. In health() the diff inserts:
model_diagnostics: dict[str, Any] = {}
from climatevision.inference.pipeline import _load_model, _find_best_checkpoint
for atype in enabled_types:
name = atype["name"]
mstatus: dict[str, Any] = {"loaded": False, "path": None, "error": None}
try:
_load_model(name)
mp = _find_best_checkpoint(name)
mstatus["loaded"] = True
mstatus["path"] = str(mp) if mp else None
except Exception as exc:
mstatus["error"] = str(exc)
model_diagnostics[name] = mstatus
Three problems:
- The leading indent is 10 spaces; the surrounding function body uses 8.
- The try: body is indented 30 spaces, but except is at 30 too and offset by more spaces than try, so Python will raise IndentationError.
- The final model_diagnostics[name] = mstatus sits inside the except block instead of after it, so the success path never populates the dict.
Please run python -m py_compile src/climatevision/api/main.py locally before pushing — that'll catch this in one shot. CI should fail this too, which makes me think the diff wasn't actually pushed/tested locally before being sent.
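For reference, here is a minimal sketch of how the block was presumably meant to be laid out: consistent 4-space nesting, with the final dict assignment dedented out of the except block. The collect_model_diagnostics wrapper and the two fake_* stand-ins are hypothetical, added only so the snippet runs on its own; in the PR the loop sits inline in health() and calls the real pipeline helpers.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any, Callable, Optional


def collect_model_diagnostics(
    enabled_types: list[dict[str, Any]],
    load_model: Callable[[str], Any],
    find_checkpoint: Callable[[str], Optional[Path]],
) -> dict[str, Any]:
    """Hypothetical wrapper: the same loop as the diff, consistently indented."""
    model_diagnostics: dict[str, Any] = {}
    for atype in enabled_types:
        name = atype["name"]
        mstatus: dict[str, Any] = {"loaded": False, "path": None, "error": None}
        try:
            load_model(name)
            mp = find_checkpoint(name)
            mstatus["loaded"] = True
            mstatus["path"] = str(mp) if mp else None
        except Exception as exc:
            mstatus["error"] = str(exc)
        # Dedented out of the except block, so it runs on success and on failure.
        model_diagnostics[name] = mstatus
    return model_diagnostics


# Stand-ins for the pipeline helpers (assumptions, not the real implementations).
def fake_load_model(name: str) -> object:
    if name == "ice_melt":
        raise FileNotFoundError("No checkpoint found")
    return object()


def fake_find_checkpoint(name: str) -> Optional[Path]:
    return Path("models/best_model.pth")


diagnostics = collect_model_diagnostics(
    [{"name": "deforestation"}, {"name": "ice_melt"}],
    fake_load_model,
    fake_find_checkpoint,
)
```

With this layout both the successful and the failing model end up in the dict, which is the behaviour the third bullet says the current diff loses.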
3. _load_model and _find_best_checkpoint are module-private (leading underscore). Reaching into private helpers from another module breaks our pipeline.py contract. Two cleaner options:
- Expose a public get_model_load_status(name) -> dict from inference/pipeline.py (preferred: keeps the diagnostics shape behind one API).
- Or move the diagnostics logic into pipeline.py and call a single public function from health().
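A rough sketch of the first option, assuming the accessor lives next to the private helpers in pipeline.py. The bodies of _load_model and _find_best_checkpoint below are simplified stand-ins (the real ones read config.yaml and load torch checkpoints), and _CHECKPOINTS is a hypothetical lookup table used only to make the example runnable.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any, Optional

# Hypothetical checkpoint table; the real helper resolves paths from config.yaml.
_CHECKPOINTS = {"deforestation": Path("models/best_model.pth")}


def _find_best_checkpoint(analysis_type: str) -> Optional[Path]:
    # Stand-in for the private helper: returns None when nothing is found.
    return _CHECKPOINTS.get(analysis_type)


def _load_model(analysis_type: str) -> object:
    # Stand-in for the private loader: fails when there is no checkpoint.
    if _find_best_checkpoint(analysis_type) is None:
        raise FileNotFoundError("No checkpoint found")
    return object()


def get_model_load_status(name: str) -> dict[str, Any]:
    """Public accessor: callers never import the underscore helpers."""
    status: dict[str, Any] = {"loaded": False, "path": None, "error": None}
    try:
        _load_model(name)
        checkpoint = _find_best_checkpoint(name)
        status["loaded"] = True
        status["path"] = str(checkpoint) if checkpoint else None
    except Exception as exc:
        status["error"] = str(exc)
    return status
```

health() would then import only get_model_load_status, so the shape of the diagnostics entry can change in one place without touching the API module.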
Should-fix
4. Calling _load_model for every analysis type on every /health hit is expensive if the model isn't already cached — it does disk I/O and potentially a torch state-dict load. The health endpoint is hit by load balancers every few seconds in prod, so this could DoS our own checkpoints folder. Please:
- read from the existing in-memory cache only (don't force-load)
- or guard the deep diagnostics behind a ?deep=true query param and have the default response stay cheap (just cached: true/false).
5. Please add a test. Something like tests/test_health.py::test_health_includes_model_diagnostics asserting the response contains model_diagnostics: {<name>: {loaded, path, error}} for each enabled analysis type, with _load_model patched to (a) succeed and (b) raise.
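The two cases could look roughly like the sketch below. To keep it runnable on its own, pipeline is a SimpleNamespace stand-in and health() is a simplified, hypothetical version of the handler; the real test would go through the FastAPI test client fixture and patch the loader in climatevision.inference.pipeline instead.

```python
from types import SimpleNamespace
from unittest import mock

# Hypothetical stand-in for climatevision.inference.pipeline.
pipeline = SimpleNamespace(
    _load_model=lambda name: object(),
    _find_best_checkpoint=lambda name: "models/best_model.pth",
)

ENABLED = ["deforestation", "ice_melt", "flooding"]


def health() -> dict:
    """Simplified stand-in for the /api/health handler."""
    diagnostics = {}
    for name in ENABLED:
        entry = {"loaded": False, "path": None, "error": None}
        try:
            pipeline._load_model(name)
            entry["loaded"] = True
            entry["path"] = pipeline._find_best_checkpoint(name)
        except Exception as exc:
            entry["error"] = str(exc)
        diagnostics[name] = entry
    degraded = any(not e["loaded"] for e in diagnostics.values())
    return {
        "status": "degraded" if degraded else "ok",
        "model_diagnostics": diagnostics,
    }


def test_health_includes_model_diagnostics():
    # Case (a): the loader succeeds for every enabled analysis type.
    with mock.patch.object(pipeline, "_load_model", return_value=object()):
        body = health()
    assert set(body["model_diagnostics"]) == set(ENABLED)
    for entry in body["model_diagnostics"].values():
        assert set(entry) == {"loaded", "path", "error"}
        assert entry["loaded"] is True and entry["error"] is None


def test_health_reports_load_failure():
    # Case (b): the loader raises, so every entry carries the error text.
    with mock.patch.object(pipeline, "_load_model", side_effect=RuntimeError("boom")):
        body = health()
    assert body["status"] == "degraded"
    for entry in body["model_diagnostics"].values():
        assert entry["loaded"] is False and entry["error"] == "boom"
```

Patching the loader keeps the test independent of any checkpoint files being present on the CI runner.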
Once the syntax is fixed and we have a public accessor + caching guard, I think the rest will be quick. Looking forward to v2.
📢 Heads-up: repo history was rewritten today (2026-05-18)
We force-pushed a cleaned history across all branches to remove an internal directory from past commits. Your code and this PR are unaffected; only the commit SHAs underneath have shifted. GitHub will re-render the diff against the new base automatically.
If you have a local clone, please bring it back in sync before pushing anything else:
# Option A (simplest): fresh start
git clone https://github.com/Climate-Vision/ClimateVision.git
# Option B: rebase the existing PR branch in your fork
git fetch origin
git checkout <your-branch>
git rebase origin/main # likely no conflicts
git push --force-with-lease
Do not
Apologies for the interruption; really appreciate your patience here. If anything looks off after rebasing, leave a comment and I'll help unblock right away. Thanks for contributing 🙏
Summary
Closes #20
Extends the /api/health endpoint to report per-model load status, addressing the feature request in issue #20.
Changes
- Adds a model_diagnostics dict to the /api/health response
- Tries to load each enabled model via _load_model()
- Reports loaded (bool), path (checkpoint path), and error (if any) per model
- Marks health as degraded if any model fails to load
Example Response
{
  "status": "ok",
  "version": "0.2.0",
  "analysis_types": ["deforestation", "ice_melt", "flooding"],
  "config_valid": true,
  "config_issues": [],
  "model_diagnostics": {
    "deforestation": {"loaded": true, "path": "models/best_model.pth", "error": null},
    "ice_melt": {"loaded": false, "path": null, "error": "No checkpoint found"}
  }
}
Notes
- model_diagnostics values are dict[str, Any] with keys: loaded, path, error
- error: null