
feat(api): extend /api/health with model load diagnostics #28

Open
Swatikantamishra8 wants to merge 23 commits into Climate-Vision:main from Swatikantamishra8:feat/health-model-diagnostics


@Swatikantamishra8

Summary

Closes #20

Extends the /api/health endpoint to report per-model load status, addressing the feature request in issue #20.

Changes

  • Added model_diagnostics dict to /api/health response
  • For each enabled analysis type, attempts to load the model via _load_model()
  • Reports loaded (bool), path (checkpoint path), and error (if any) per model
  • Health status is marked as degraded if any model fails to load

Example Response

{
  "status": "degraded",
  "version": "0.2.0",
  "analysis_types": ["deforestation", "ice_melt", "flooding"],
  "config_valid": true,
  "config_issues": [],
  "model_diagnostics": {
    "deforestation": {"loaded": true, "path": "models/best_model.pth", "error": null},
    "ice_melt": {"loaded": false, "path": null, "error": "No checkpoint found"}
  }
}

Notes

  • model_diagnostics values are dict[str, Any] with keys: loaded, path, error
  • Models that load successfully will have error: null
  • This is non-breaking: existing fields are unchanged
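A minimal sketch of the per-model entry shape described in the Notes, and of how the top-level status could be derived from it. `ModelStatus` and `overall_status` are hypothetical names used only for illustration; they are not existing code in the repo.

```python
from typing import Optional, TypedDict


class ModelStatus(TypedDict):
    """Shape of one model_diagnostics entry (hypothetical type name)."""

    loaded: bool
    path: Optional[str]   # checkpoint path, or None if none was found
    error: Optional[str]  # exception text, or None on success


def overall_status(model_diagnostics: dict[str, ModelStatus]) -> str:
    """Return "degraded" if any model failed to load, otherwise "ok"."""
    if any(not entry["loaded"] for entry in model_diagnostics.values()):
        return "degraded"
    return "ok"


diagnostics: dict[str, ModelStatus] = {
    "deforestation": {"loaded": True, "path": "models/best_model.pth", "error": None},
    "ice_melt": {"loaded": False, "path": None, "error": "No checkpoint found"},
}
print(overall_status(diagnostics))  # degraded
```

With ice_melt failing, the rule "degraded if any model fails to load" yields `degraded` for the example payload.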

Goldokpa and others added 23 commits March 28, 2026 21:22
…iddleware-audit

Merging Olufemi's API middleware and auth modules
…tics-statistics

Merging Francis's analytics statistics and reporting modules
Defines responsibilities, deliverables, and collaboration guidelines for the Carbon Analytics & Validation role.

Co-Authored-By: Francis Umo <francis.umo@climatevision.org>
Defines responsibilities, deliverables, and collaboration guidelines for the API Development & Integration role.

Co-Authored-By: Olufemi Taiwo <olufemi.taiwo@climatevision.org>
…mate-Vision#7)

* feat(data): add GEE tile downloader with analysis-aware band selection

- Downloads real Sentinel-2 composites via Google Earth Engine
- Reads required bands from config.yaml per analysis_type
- Includes SCL band for downstream cloud masking
- Synthetic fallback with explicit is_synthetic flag when GEE unavailable
- Fix .gitignore so src/climatevision/data/ is no longer ignored

* feat(data): add analysis-specific Sentinel-2 band mapping utilities

- get_bands_for_analysis() reads correct bands from config.yaml
- get_band_indices() maps band names to canonical 13-band stack positions
- is_analysis_enabled() and list_enabled_analysis_types() for config validation
- Includes SCL band helpers for downstream cloud masking

* feat(data): integrate SCL cloud masking and export new pipeline modules

- apply_scl_cloud_mask() masks cloudy pixels using Sentinel-2 SCL band
- Default clear labels: vegetation, bare soils, water, snow
- Update __init__.py to expose gee_downloader and band_mapping utilities

* refactor(data): address PR review feedback

- Remove duplicated config logic in gee_downloader.py; import from band_mapping
- Cache config.yaml load in band_mapping.py via lru_cache
- Read synthetic tile size from config.yaml instead of hardcoding 256
- Remove unused json import in gee_downloader.py
- Add shape validation in apply_scl_cloud_mask

---------

Co-authored-by: Adeolu Mary Oshadare <adeolu@placeholder.com>
…ing (Climate-Vision#8)

* feat(inference): make pipeline analysis-aware with dynamic model loading

- _load_model() now accepts analysis_type and reads in_channels/num_classes from config.yaml
- Per-analysis-type model cache prevents cross-contamination between deforestation/ice/flood models
- _find_best_checkpoint() prefers config.yaml weight path per analysis type
- run_inference() accepts analysis_type, pads/crops to correct n_channels, and returns dynamic class counts
- run_inference_from_file() and run_inference_from_gee() propagate analysis_type parameter

* feat(api): wire analysis_type into prediction endpoints

- Pass body.analysis_type to run_inference_from_gee() in /api/predict
- Pass analysis_type to run_inference_from_file() in /api/predict/upload
- Enables the API to load the correct model and return correct class counts per analysis type

---------

Co-authored-by: Olufemi Taiwo <Olufemitaiwo23@gmail.com>
… flag, add config health validation

- Add cv_dev development key bypass for local testing
- Require X-API-Key on all mutation endpoints (POST predict, orgs, alerts, subscriptions)
- Surface is_synthetic at root of inference response for frontend demo banners
- Expand /api/health to validate config alignment (bands vs in_channels, classes vs num_classes)
- Add FastAPI test client fixture
- Create CI workflow for Python (flake8, pytest) and frontend (npm build)
- Bootstrap tests/ directory structure
- Parametrize UNet init for all 3 analysis types (4ch/2cl, 4ch/3cl, 3ch/3cl)
- Validate forward pass output shapes
- Add Siamese change detection forward shape test
- Link to 6 active good-first-issue and help-wanted issues
- Add claim workflow for new contributors
- Include time estimates and skill-building map
- ../components/map/ -> ../components/Map/
- Fixes vite build failure on Linux (case-sensitive filesystem)
- Fixes pip install failure for gdal and rasterio on Ubuntu runners
- Adds libgdal-dev, gdal-bin, libgl1-mesa-glx
- gdal Python package requires exact system GDAL version matching
- rasterio covers all GDAL functionality we actually use
- Simplify CI system deps to libgl1 only (for opencv runtime)
- Fixes ModuleNotFoundError: No module named 'climatevision'
- pip install -e . registers src/ as an importable package
- ForestDataset with DataLoader support
- Training/validation augmentation pipelines
- Synthetic tile generation for demo/fallback mode
- Add DONE/PENDING task list for April 2026 sprint
- Include actual .github/workflows/ci.yml code in role doc
- Update local CI check commands to match current workflow
Closes Climate-Vision#20

- Try to load each enabled model via _load_model()
- Report loaded status, checkpoint path, and any errors
- Return model_diagnostics dict in health response
- Mark health as degraded if any model fails to load
Collaborator

@femi23 left a comment


Thanks for the contribution @Swatikantamishra8 — the intent (surfacing per-model load status from /api/health) is exactly right and matches what we need for the Kubernetes readiness probe story. Unfortunately the patch as it stands won't parse, so a re-push is needed before we can land it.

Blockers

1. The diff adds trailing whitespace to the module docstring and to the from __future__ import annotations line. Minor on its own, but combined with #2 it suggests the patch went through an editor that mangled whitespace.

2. The added block is not valid Python — indentation is broken. In health() the diff inserts:

          model_diagnostics: dict[str, Any] = {}
          from climatevision.inference.pipeline import _load_model, _find_best_checkpoint
          for atype in enabled_types:
                    name = atype["name"]
                    mstatus: dict[str, Any] = {"loaded": False, "path": None, "error": None}
                    try:
                                  _load_model(name)
                                  mp = _find_best_checkpoint(name)
                                  mstatus["loaded"] = True
                                  mstatus["path"] = str(mp) if mp else None
                              except Exception as exc:
                                            mstatus["error"] = str(exc)
                                        model_diagnostics[name] = mstatus

Three problems:

  • The leading indent is 10 spaces; the surrounding function body uses 8.
  • The except clause is not aligned with its try: (it sits deeper than the try: line but shallower than the try body), so it matches no enclosing indentation level and Python raises IndentationError.
  • The final model_diagnostics[name] = mstatus sits inside the except block instead of after it, so the success path never populates the dict.
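For reference, a corrected version of the quoted block could look like the sketch below: one consistent 4-space step per level, except aligned with its try:, and the dict assignment moved after the try/except so both paths record a result. The two helpers are stubbed here only so the snippet runs on its own; the real ones live in climatevision.inference.pipeline, and which analysis types "have" a checkpoint in the stubs is invented for illustration.

```python
from pathlib import Path
from typing import Any, Optional

# Hypothetical stand-ins for the pipeline helpers, so this sketch runs standalone.
_CHECKPOINTS = {"deforestation": Path("models/best_model.pth")}


def _find_best_checkpoint(name: str) -> Optional[Path]:
    return _CHECKPOINTS.get(name)


def _load_model(name: str) -> object:
    if name not in _CHECKPOINTS:
        raise FileNotFoundError("No checkpoint found")
    return object()


def build_model_diagnostics(enabled_types: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    model_diagnostics: dict[str, dict[str, Any]] = {}
    for atype in enabled_types:
        name = atype["name"]
        mstatus: dict[str, Any] = {"loaded": False, "path": None, "error": None}
        try:
            _load_model(name)
            mp = _find_best_checkpoint(name)
            mstatus["loaded"] = True
            mstatus["path"] = str(mp) if mp else None
        except Exception as exc:
            mstatus["error"] = str(exc)
        # Dedented out of the except block: runs on success and on failure.
        model_diagnostics[name] = mstatus
    return model_diagnostics


print(build_model_diagnostics([{"name": "deforestation"}, {"name": "ice_melt"}]))
```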

Please run python -m py_compile src/climatevision/api/main.py locally before pushing — that'll catch this in one shot. CI should fail this too, which makes me think the diff wasn't actually pushed/tested locally before being sent.
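To run the same check over every module rather than a single file, a small pre-push helper built on the standard-library py_compile module is enough; doraise=True makes failures raise PyCompileError instead of printing to stderr. `find_syntax_errors` is a hypothetical helper name, and like `python -m py_compile` it writes .pyc files into __pycache__ as a side effect.

```python
import py_compile
from pathlib import Path


def find_syntax_errors(root: str) -> list[str]:
    """Byte-compile every .py file under root; return one message per failure."""
    failures: list[str] = []
    # sorted() materialises the file list before any .pyc files are written.
    for path in sorted(Path(root).rglob("*.py")):
        try:
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            failures.append(exc.msg)
    return failures
```

For example, `find_syntax_errors("src")` on this branch would be expected to return an IndentationError message pointing at src/climatevision/api/main.py, and an empty list once the block is fixed.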

3. _load_model and _find_best_checkpoint are module-private (leading underscore). Reaching into private helpers from another module breaks our pipeline.py contract. Two cleaner options:

  • Expose a public get_model_load_status(name) -> dict from inference/pipeline.py (preferred — keeps the diagnostics shape behind one API).
  • Or move the diagnostics logic into pipeline.py and call a single public function from health().

Should-fix

4. Calling _load_model for every analysis type on every /health hit is expensive if the model isn't already cached — it does disk I/O and potentially a torch state-dict load. The health endpoint is hit by load balancers every few seconds in prod, so this could DoS our own checkpoints folder. Please:

  • read from the existing in-memory cache only (don't force-load)
  • or guard the deep diagnostics behind a ?deep=true query param and have the default response stay cheap (just cached: true/false).
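A minimal sketch of the cheap-by-default approach, assuming a module-level cache and a loader that populates it (both hypothetical stand-ins). In FastAPI, declaring `deep: bool = False` on the route function makes it a query parameter, so the /api/health handler could forward it directly to a helper like `model_diagnostics` below (also a hypothetical name).

```python
from typing import Any, Callable

# Hypothetical stand-ins for the pipeline's model cache and loader.
_MODEL_CACHE: dict[str, object] = {}


def _load_model(name: str) -> object:
    if name == "ice_melt":
        raise FileNotFoundError("No checkpoint found")
    model = object()
    _MODEL_CACHE[name] = model  # a successful load populates the cache
    return model


def model_diagnostics(
    names: list[str],
    deep: bool = False,
    loader: Callable[[str], object] = _load_model,
) -> dict[str, dict[str, Any]]:
    """Default: report cache membership only (no disk I/O).

    With deep=True (e.g. GET /api/health?deep=true), actually attempt a load.
    """
    result: dict[str, dict[str, Any]] = {}
    for name in names:
        if not deep:
            result[name] = {"cached": name in _MODEL_CACHE}
            continue
        entry: dict[str, Any] = {"loaded": False, "error": None}
        try:
            loader(name)
            entry["loaded"] = True
        except Exception as exc:
            entry["error"] = str(exc)
        result[name] = entry
    return result
```

Load-balancer probes keep hitting the default path, which only does dictionary lookups; the expensive path runs only when someone explicitly asks for it.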

5. Please add a test. Something like tests/test_health.py::test_health_includes_model_diagnostics asserting the response contains model_diagnostics: {<name>: {loaded, path, error}} for each enabled analysis type, with _load_model patched to (a) succeed and (b) raise.
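A self-contained sketch of the two requested cases using unittest.mock.patch.object. A throwaway module object stands in for climatevision.inference.pipeline, and `build_model_diagnostics` is a simplified hypothetical stand-in for the health logic; in the real suite the same assertions would run against `client.get("/api/health").json()["model_diagnostics"]` via the FastAPI test client fixture, with the patch pointed at the real pipeline module.

```python
import types
from typing import Any
from unittest import mock

# Hypothetical stand-in for climatevision.inference.pipeline.
pipeline = types.ModuleType("pipeline")
pipeline._load_model = lambda name: object()


def build_model_diagnostics(names: list[str]) -> dict[str, dict[str, Any]]:
    # Looks the loader up on the module at call time, so patching takes effect.
    out: dict[str, dict[str, Any]] = {}
    for name in names:
        status: dict[str, Any] = {"loaded": False, "path": None, "error": None}
        try:
            pipeline._load_model(name)
            status["loaded"] = True
        except Exception as exc:
            status["error"] = str(exc)
        out[name] = status
    return out


def test_health_includes_model_diagnostics_success() -> None:
    with mock.patch.object(pipeline, "_load_model", return_value=object()):
        diags = build_model_diagnostics(["deforestation", "ice_melt"])
    for entry in diags.values():
        assert set(entry) == {"loaded", "path", "error"}
        assert entry["loaded"] is True and entry["error"] is None


def test_health_includes_model_diagnostics_failure() -> None:
    with mock.patch.object(
        pipeline, "_load_model", side_effect=RuntimeError("No checkpoint found")
    ):
        diags = build_model_diagnostics(["ice_melt"])
    assert diags["ice_melt"] == {"loaded": False, "path": None, "error": "No checkpoint found"}
```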

Once the syntax is fixed and we have a public accessor + caching guard, I think the rest will be quick. Looking forward to v2.

@Goldokpa
Member

📢 Heads-up: repo history was rewritten today (2026-05-18)

We force-pushed a cleaned history across all branches to remove an internal directory from past commits. Your code and this PR are unaffected — only the commit SHAs underneath have shifted. GitHub will re-render the diff against the new base automatically.

If you have a local clone, please bring it back in sync before pushing anything else:

# Option A (simplest): fresh start
git clone https://github.com/Climate-Vision/ClimateVision.git

# Option B: rebase the existing PR branch in your fork
git fetch origin
git checkout <your-branch>
git rebase origin/main          # likely no conflicts
git push --force-with-lease

Do not git pull on an existing clone — it will produce a messy non-fast-forward state. Either re-clone, or rebase explicitly as above.

Apologies for the interruption — really appreciate your patience here. If anything looks off after rebasing, leave a comment and I'll help unblock right away. Thanks for contributing 🙏



Development

Successfully merging this pull request may close these issues.

[Good First Issue] Extend /api/health with model load diagnostics

5 participants