
feat(governance): add distributional drift detection (PSI, KS)#53

Open
Hopelynconsult wants to merge 21 commits into main from feature/governance-drift-detector

Conversation

@Hopelynconsult
Collaborator

Summary

  • Adds governance/drift_detector.py — distributional drift checks that compare recent prediction/input windows against a reference baseline.
  • Two non-parametric methods: Population Stability Index (PSI) and two-sample Kolmogorov-Smirnov.
  • Per-feature DriftResult rolled up into a DriftReport. Designed to plug into the prediction-history JSONL written by the anomaly detector (feat(governance): add anomaly detection for inference outputs #35).

Why this is distinct from the anomaly detector

The anomaly detector (#35) flags individual predictions whose features sit outside historical norms — point anomalies. A model can have zero point anomalies and still be silently drifting if the distribution of its predictions has shifted (e.g., post-monsoon Sentinel-2 statistics on a deforestation model). PSI/KS catch that distributional shift over a window.
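
To make the distinction concrete, here is a minimal pure-Python sketch of a two-sample KS check on a window where every value stays inside the reference range, so a range-based point check would pass, yet the distribution has clearly moved. The function names `ks_statistic` and `ks_pvalue`, the small-argument cutoff of 0.2, and the toy data are assumptions made for illustration; this is not the code in governance/drift_detector.py.

```python
import math


def ks_statistic(reference, current):
    """Largest absolute gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    n, m = len(ref), len(cur)
    if n == 0 or m == 0:
        raise ValueError("both windows must be non-empty")
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(ref[i], cur[j])
        # Step past every observation equal to x in both samples (handles ties).
        while i < n and ref[i] == x:
            i += 1
        while j < m and cur[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value: Q(t) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 t^2)."""
    t = d * math.sqrt(n * m / (n + m))
    if t < 0.2:
        # Assumed cutoff: Q(t) is numerically 1 here and the series converges slowly.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


# Toy data (illustrative only): the current window stays inside the reference
# range but is squeezed into its lower half.
reference = [0.60 + 0.30 * k / 199 for k in range(200)]
current = [0.60 + 0.15 * k / 199 for k in range(200)]

in_range = all(min(reference) <= v <= max(reference) for v in current)
d = ks_statistic(reference, current)
p = ks_pvalue(d, len(reference), len(current))
print(in_range, round(d, 3), p < 0.05)
```

The asymptotic p-value is only an approximation for small windows, which is one reason the significance default is worth a reviewer's attention.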

What's in the PR

  • DriftResult (per-feature) and DriftReport (multi-feature) dataclasses with JSON serialisation, mirroring governance.calibration.CalibrationReport style.
  • population_stability_index(): bins on the reference's quantiles (canonical PSI), with industry-standard severity bands (< 0.1 stable, 0.1 to < 0.25 moderate, >= 0.25 severe). Constant-reference fallback to a single bin.
  • kolmogorov_smirnov() — supremum CDF gap with an asymptotic p-value computed from the standard Kolmogorov series. Avoids pulling scipy into evaluation-time deps.
  • detect_drift() — one-shot entrypoint that takes feature-name → values dicts for both windows and runs PSI or KS per feature.
  • write_drift_report() — persistence alongside model cards / calibration reports.
  • 13 unit tests: identical vs shifted distributions, both methods, per-feature severity isolation, constant-reference edge case, validation (non-finite, empty, feature mismatch, unknown method), JSON round-trip.
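
For reviewers who want the PSI mechanics in front of them, the following is a rough standard-library sketch of quantile-binned PSI, using the usual definition PSI = Σ (q_i − p_i) · ln(q_i / p_i) over bin fractions p (reference) and q (current). The helper names, the 10-bin default, the epsilon floor for empty bins, and the way the constant-reference case collapses to one bin are all assumptions; the PR's population_stability_index() may differ in each of these.

```python
import bisect
import math

PSI_STABLE = 0.10    # default named in the test plan
PSI_MODERATE = 0.25  # default named in the test plan


def quantile_edges(reference, n_bins=10):
    """Interior bin edges at the reference quantiles, duplicates dropped."""
    ref = sorted(reference)
    if not ref:
        raise ValueError("reference window is empty")
    if ref[0] == ref[-1]:
        return []  # assumed reading of the constant-reference fallback: one bin
    edges = []
    for k in range(1, n_bins):
        edge = ref[min(len(ref) - 1, (k * len(ref)) // n_bins)]
        if not edges or edge > edges[-1]:
            edges.append(edge)
    return edges


def bin_fractions(values, edges, eps=1e-6):
    """Share of values per bin, floored at eps so empty bins avoid log(0)."""
    if not values:
        raise ValueError("window is empty")
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    return [max(c / len(values), eps) for c in counts]


def psi_from_quantile_bins(reference, current, n_bins=10):
    edges = quantile_edges(reference, n_bins)
    p = bin_fractions(reference, edges)
    q = bin_fractions(current, edges)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def psi_severity(psi):
    if psi < PSI_STABLE:
        return "stable"
    if psi < PSI_MODERATE:
        return "moderate"
    return "severe"


reference = [k / 1000 for k in range(1000)]
shifted = [0.5 + k / 2000 for k in range(1000)]  # everything lands in the upper half

psi_same = psi_from_quantile_bins(reference, reference)
psi_shifted = psi_from_quantile_bins(reference, shifted)
print(psi_severity(psi_same), psi_severity(psi_shifted))
```

Because the bin edges come from the reference window only, each reference bin holds roughly the same share of data, and the score reflects how the current window redistributes mass across those bins.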

Plugging into the existing pipeline

The anomaly detector already writes a JSONL of per-prediction features (mean_confidence, std_confidence, positive_fraction, entropy). Wiring PSI on those four features over a rolling window is a 30-line script — left as a follow-up so this PR stays focused.
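
As a starting point for that follow-up, the sketch below turns JSONL lines into the feature-name → values dicts that detect_drift() is described as taking. It assumes each line is a flat JSON object carrying the four feature keys at the top level and that the most recent rows come last; the real history.jsonl schema is not shown in this PR, so treat `load_feature_rows` and `split_windows` as hypothetical helpers.

```python
import json

FEATURES = ("mean_confidence", "std_confidence", "positive_fraction", "entropy")


def load_feature_rows(lines):
    """Parse JSONL lines, keeping only rows that carry all four features."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if all(name in record for name in FEATURES):
            rows.append({name: float(record[name]) for name in FEATURES})
    return rows


def to_columns(window):
    """Pivot a list of row dicts into feature-name -> list of values."""
    return {name: [row[name] for row in window] for name in FEATURES}


def split_windows(rows, current_size):
    """Treat the last current_size rows as 'current', the rest as reference."""
    if not 0 < current_size < len(rows):
        raise ValueError("current_size must leave a non-empty reference window")
    return to_columns(rows[:-current_size]), to_columns(rows[-current_size:])


# In-memory stand-in for the history file (values are made up for illustration).
sample_lines = [
    json.dumps({"mean_confidence": 0.80 + 0.01 * i, "std_confidence": 0.05,
                "positive_fraction": 0.30, "entropy": 0.40})
    for i in range(6)
]
sample_lines.insert(3, "")                               # blank lines are skipped
sample_lines.append(json.dumps({"note": "no features"}))  # incomplete rows are skipped

rows = load_feature_rows(sample_lines)
reference_window, current_window = split_windows(rows, current_size=2)
print(len(reference_window["entropy"]), len(current_window["entropy"]))
```

With a real file, `lines` would simply be the open file handle; choosing the baseline by row count rather than by timestamp is a simplification.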

Follow-ups (out of scope here)

  • A scheduled CI job that reads the last N days of outputs/anomalies/history.jsonl, picks a baseline window, and emits a drift report.
  • A drift_score threshold added to scripts/governance_ci_gate.py so a release fails if any monitored feature is in the severe band.
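
One possible shape for the gate check in the second follow-up, sketched under the assumption that the gate receives a plain mapping of feature name to PSI score. `drift_gate` is a hypothetical helper, not an existing function in scripts/governance_ci_gate.py, and the 0.25 cut-off is the PSI_MODERATE default quoted in this PR.

```python
PSI_MODERATE = 0.25  # scores at or above this fall in the severe band


def drift_gate(psi_by_feature, severe_threshold=PSI_MODERATE):
    """Return (exit_code, failing_features); exit code 1 fails the build."""
    failing = sorted(
        name for name, psi in psi_by_feature.items() if psi >= severe_threshold
    )
    return (1 if failing else 0), failing


# Made-up scores for illustration.
exit_code, failing = drift_gate(
    {"mean_confidence": 0.04, "entropy": 0.31, "positive_fraction": 0.12}
)
print(exit_code, failing)
```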

Test plan

  • pytest tests/test_drift_detector.py -q → 13 passed
  • Reviewer: confirm the severity bands (PSI_STABLE=0.10, PSI_MODERATE=0.25, KS significance 0.05) are the right defaults for our use case before we wire them into the CI gate.

🤖 Generated with Claude Code

Goldokpa and others added 18 commits May 3, 2026 12:11
#29)

- Add governance module with SHAPExplainer class
- Implement band-level and spatial attribution using DeepExplainer
- Add /api/explain endpoint for SHAP-based explanations
- Create 06_explainability.ipynb with visualization examples
- Add shap>=0.42.0 to requirements.txt

Closes #22

Co-authored-by: Linda Oraegbunam <obielinda@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* Add SECURITY.md for security policy and reporting

Added a security policy document outlining supported versions, vulnerability reporting, scope, and disclosure policy.

* chore: add PR template for contributor guidance

Add a pull request template to guide contributors.

* chore: add CODEOWNERS assigning @Goldokpa as default reviewer

Added CODEOWNERS file to define code ownership and review assignments.

* chore: add Dependabot config for pip, npm, and GitHub Actions

Configured Dependabot for Python, GitHub Actions, and npm dependencies with specified schedules and reviewers.

* chore: add CHANGELOG.md following Keep a Changelog format

Document notable changes, additions, modifications, and removals for ClimateVision.

* chore: add CITATION.cff to enable GitHub Cite this repository button

Added citation file for ClimateVision software.

* fix: replace #email placeholder in CODE_OF_CONDUCT with Security Advisory link

This change updates the Code of Conduct document by removing the original content and replacing it with a new version that includes various sections on community standards, enforcement responsibilities, and guidelines.

* chore: remove SETUP_COMPLETE.md (internal artifact not suited for public repo)

* chore: remove internal team_docs (Francis_Umo_Role.pdf) from public repo

* chore: remove internal team_docs (Olufemi_Taiwo_Role.pdf) from public repo
Adds BiomassRegressor — a wrapper around sklearn RandomForest and
xgboost.XGBRegressor that exposes a stable fit/predict/evaluate/save
API for ClimateVision pipelines. Default feature ordering matches the
spectral indices produced by the data preprocessor (NDVI, EVI, SAVI,
NDMI, NBR, R, G, B, NIR, SWIR1).

Also adds:
- biomass_to_carbon / biomass_to_co2e helpers using IPCC defaults
  (carbon fraction 0.47, 44/12 ratio for CO2e).
- evaluate_regression for RMSE, MAE, R^2, and MAPE.
- estimate_biomass_from_indices for inference over a dict of
  per-pixel index arrays.
- save() / load() round-trip via pickle.

Co-authored-by: Francis Umo <franchaise@users.noreply.github.com>
…ook (#47)

End-to-end notebook that loads (or simulates) a biomass-labelled
spectral dataset, trains a Random Forest BiomassRegressor, evaluates
RMSE/MAE/R^2/MAPE, converts biomass predictions to carbon and CO2e
using IPCC defaults, plots feature importances, and persists the
model + metrics for the analytics API and the model-card pipeline.

Falls back to a synthetic dataset when the labelled parquet file is
not present, so the notebook is runnable in CI.

Co-authored-by: Francis Umo <franchaise@users.noreply.github.com>
Validates segmentation predictions against reference masks and the
biomass regressor against held-out labels across Amazon, Congo, and
Southeast Asia. Computes IoU, F1, precision, recall, accuracy, and
the regression metrics RMSE/MAE/R^2/MAPE. Aggregates per-region and
mean values into a single benchmark_report.json that the governance
CI gate and the model-card generator consume directly.

Co-authored-by: Francis Umo <franchaise@users.noreply.github.com>
End-to-end impact reporting workflow: loads a deforestation mask,
runs estimate_carbon for tonnes-of-CO2e and confidence intervals,
computes a trend over the trailing 12 months, attaches validation
metrics from 04_model_validation.ipynb when available, calls
analytics.reporting.generate_report, and renders a stakeholder-ready
Markdown narrative with headline numbers, trend, and validation
section. Region/period/bbox are top-level constants so the same
notebook can be re-run for Amazon, Congo, or Southeast Asia.

Co-authored-by: Francis Umo <franchaise@users.noreply.github.com>
)

Adds BatchProcessor for running inference over a list of sources in
parallel. Tracks per-job state (queued -> running -> succeeded/failed),
records timing and attempt counts, and appends each terminal job to a
JSONL manifest as it finishes so long-running batches can be resumed
or audited without waiting on the whole queue.

Supports configurable retry on transient failures and an injectable
inference_fn for testing and for swapping in batch_predict
implementations later.
#41)

AlertGenerator matches inference results against per-organisation
subscriptions (region bbox + analysis_type + threshold) and fires
alerts when the measured value crosses the threshold. Per-subscription
cooldown windows deduplicate flapping signals. Severity is classified
medium/high/critical based on how far the value is past the threshold.

Channels are pluggable: register a delivery callable per channel name
('log', 'email', 'webhook', ...) and the dispatcher will route to all
the channels each subscription opted into. Persists every fired alert
to a JSONL log for replay and audit.
Adds api/admin.py — a self-contained APIRouter exposing two read-only
operational endpoints:

- GET /api/reports — data-quality KPIs (run count, error rate, mean
  confidence, positive-fraction mean, alert count) over a configurable
  window.
- GET /api/anomalies — list flagged anomaly/alert records, optionally
  filtered by severity and time window.

Both read from the JSONL logs written by the audit logger and the
alert generator. They never expose raw input payloads. Wired into the
FastAPI app via include_router() in api/main.py.
Hybrid detector that combines an Isolation Forest fitted on rolling
prediction history with a statistical fallback (z-score + IQR fences)
for the cold-start case. Persists feature history to JSONL and emits
anomaly reports for human review.
Append-only JSONL audit trail. Each entry records the model version,
SHA-256 hash of the input payload, summary statistics for the output,
and a prev_hash linking back through the chain. verify_chain() walks
the file and detects any tampered entry by recomputing the hash and
checking the link to its predecessor.
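
A minimal illustration of the hash-chain idea described in this commit message, assuming each entry stores its own `hash` next to `prev_hash` and that hashes are taken over a sorted-key JSON dump. The field names other than prev_hash, the all-zero genesis value, and the helper names are assumptions; the audit logger's actual verify_chain() is not shown here.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for the first entry's prev_hash


def entry_hash(entry):
    """SHA-256 over the entry's content, excluding its own stored hash."""
    body = {k: v for k, v in entry.items() if k != "hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(chain, payload):
    """Link a new entry to the current tail of the chain."""
    entry = dict(payload, prev_hash=chain[-1]["hash"] if chain else GENESIS)
    entry["hash"] = entry_hash(entry)
    chain.append(entry)


def first_broken_link(chain):
    """Index of the first tampered or mislinked entry, or None if intact."""
    prev = GENESIS
    for idx, entry in enumerate(chain):
        if entry.get("prev_hash") != prev or entry.get("hash") != entry_hash(entry):
            return idx
        prev = entry["hash"]
    return None


chain = []
for version in ("v1", "v1", "v2"):
    append_entry(chain, {"model_version": version, "mean_confidence": 0.9})

intact = first_broken_link(chain)
chain[1]["mean_confidence"] = 0.1  # simulate tampering with a stored entry
broken_at = first_broken_link(chain)
print(intact, broken_at)
```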
Builds Mitchell-style model cards from training config + evaluation
metrics, with optional fairness report attached. Renders to both
Markdown (for release notes / model registry) and JSON (for downstream
tooling). Ships a CLI entrypoint at scripts/generate_model_card.py for
the release CI pipeline.
Composes natural-language stakeholder reports from carbon analytics,
SHAP attributions, validation metrics, and fairness flags. A
deterministic template renderer is always available so the pipeline
never blocks on a missing LLM provider; when an LLM callable is
supplied (or CLIMATEVISION_LLM_PROVIDER is configured) it smooths the
template into prose using the structured data block as ground truth.

Includes a JSON sidecar so downstream tooling can ingest the report
without re-parsing Markdown.
…rity (#39)

scripts/governance_ci_gate.py reads evaluation metrics, an optional
fairness report, and an optional security scan, and decides whether a
release passes its governance thresholds. Exit code 1 fails the build.

Thresholds default to a sensible baseline (IoU>=0.70, F1>=0.75,
fairness score>=0.80, zero high/critical security findings) and can be
overridden via --thresholds JSON. Renders a Markdown summary suitable
for posting back to the PR.
…anner (#34)

* feat(security): add API security middleware

OWASP-aligned controls layered onto FastAPI:
- Per-API-key rate limiter (sliding window, configurable)
- Payload size and Content-Length checks
- bbox sanity validation (range, ordering, max area to block DoS-via-huge-GEE-queries)
- File upload validation by magic bytes + extension whitelist
- String input sanitisation against XSS, SQLi, template injection patterns
- Security response headers (X-Content-Type-Options, X-Frame-Options, X-XSS-Protection)

* feat(security): add inference pipeline guard for adversarial inputs

InputAnomalyDetector flags suspicious tiles before model forward pass:
- Out-of-range pixel values, NaN/Inf, suspicious uniformity
- Gradient analysis to catch noise injection or constant-image attacks

PipelineGuard wraps inference with input + output checks:
- Rejects predictions where one class dominates >99% (model failure or attack)
- Flags low mean confidence and uniform probability distributions
- Returns structured security metadata alongside predictions so the API
  can surface warnings to the caller.

* test(security): add security suite and OWASP scanner script

- tests/test_security.py: unit tests for the rate limiter, payload/bbox/file
  validators, sanitiser, pipeline guard input/output checks, and adversarial
  detection helpers.
- scripts/security_scan.py: external scanner that hits a running API and
  probes for OWASP-style misconfigurations (missing headers, unauthenticated
  POSTs, bbox over-area, oversized payloads). Outputs a JSON report.
* docs(team): add Adeolu role specification

Codifies Data Pipeline & GIS Lead responsibilities: real GEE tile downloads,
analysis-specific band mapping, SCL cloud masking at inference time, and the
synthetic-fallback guardrail.

* docs(data): document data pipeline modules and band contract

Single page covering each file in the data package, the analysis-type band
contract, the SCL cloud-masking rules, and the synthetic-fallback metadata
convention. Helps new contributors avoid hardcoding band lists.

* test(data): add band mapping smoke tests

Verifies the analysis-type → band contract holds:
- Sentinel-2 13-band canonical order
- Per-analysis band counts (4/4/3 for deforestation/ice/flood)
- SCL append-without-duplicate invariant
- Band index resolution and rejection of unknown bands
- Enabled vs disabled analysis types from config.yaml
#30)

- Add BiasAuditor class with demographic parity, equalized odds, and predictive parity metrics
- Implement run_bias_audit() for evaluating model fairness across regions
- Add check_fairness_gate() for CI/CD integration
- Create scripts/audit_model.py CLI tool for running audits
- Add notebooks/07_bias_audit.ipynb with visualization examples
- Support Amazon, Congo, Southeast Asia, and Boreal forest regions

Closes #23

Co-authored-by: Linda Oraegbunam <obielinda@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
ECE, MCE, Brier score, and reliability-bin computation for binary
segmentation outputs. Threshold-driven NGO alerts depend on calibrated
confidence: a model that says 0.9 should be right 90% of the time, and
miscalibration translates directly into missed events or false alarms.

The CalibrationReport dataclass slots into the existing model card
generator and release CI gate. Pure numpy at evaluation time, no torch.

- ReliabilityBin / CalibrationReport dataclasses with JSON serialisation
- evaluate_calibration() one-shot entrypoint
- write_calibration_report() for persistence alongside model cards
- 12 tests covering perfect/overconfident calibration, edge cases,
  input validation, and round-trip JSON
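
For context on the "0.9 should be right 90% of the time" point, here is a small standard-library sketch of expected calibration error over equal-width confidence bins, treating each confidence as the predicted probability of the positive class. The function name, the 10-bin default, and the binning rule are assumptions and may not match evaluate_calibration().

```python
def expected_calibration_error(confidences, labels, n_bins=10):
    """Weighted mean of |average confidence - observed positive rate| per bin."""
    if not confidences or len(confidences) != len(labels):
        raise ValueError("need equally sized, non-empty inputs")
    bins = [[] for _ in range(n_bins)]
    for conf, label in zip(confidences, labels):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, label))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        positive_rate = sum(l for _, l in members) / len(members)
        ece += (len(members) / len(confidences)) * abs(avg_conf - positive_rate)
    return ece


# Ten pixels predicted at 0.9: nine true positives is well calibrated,
# five true positives is overconfident.
calibrated = expected_calibration_error([0.9] * 10, [1] * 9 + [0])
overconfident = expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5)
print(round(calibrated, 6), round(overconfident, 6))
```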
@Hopelynconsult Hopelynconsult requested a review from Goldokpa as a code owner May 7, 2026 16:21
…tion

feat(governance): add calibration metrics for segmentation confidence
Hopelynconsult and others added 2 commits May 18, 2026 00:05
Complement to the per-point anomaly detector (#35): the anomaly detector
flags individual predictions whose features fall outside historical
norms; this module compares the *distribution* of recent predictions (or
inputs) against a reference baseline and flags drift even when no single
prediction is anomalous.

Two non-parametric tests:
- Population Stability Index over reference quantile bins. PSI < 0.1
  stable, 0.1-0.25 moderate, > 0.25 severe (industry-standard rule of
  thumb).
- Two-sample Kolmogorov-Smirnov, with the asymptotic p-value computed
  from the standard Kolmogorov series so we don't pull in scipy at
  evaluation time.

Both run per-feature; a DriftReport aggregates per-feature DriftResults
so callers (CI gate, monitoring dashboards) decide their own aggregation
policy. Designed to plug into the prediction-history JSONL emitted by
the anomaly detector so drift can run as a scheduled CI step over the
last N days of production predictions.

- DriftResult / DriftReport dataclasses with JSON serialisation
- detect_drift() one-shot entrypoint covering both methods
- write_drift_report() for persistence alongside model cards
- 13 tests covering identical/shifted distributions, both methods,
  per-feature severity, edge cases (constant reference, non-finite,
  empty windows), feature mismatch validation, and JSON round-trip
@Hopelynconsult Hopelynconsult force-pushed the feature/governance-drift-detector branch from 4fdb979 to 61a8348 Compare May 17, 2026 21:05
@Hopelynconsult
Collaborator Author

Rebased onto current develop to clear the conflict in governance/__init__.py (the calibration PR #51 landed in the meantime). Imports + __all__ now carry both calibration and drift_detector exports side-by-side. No code changes to drift_detector.py or the tests themselves — just the merged module index.

Ready for review.

@Goldokpa Goldokpa force-pushed the feature/governance-drift-detector branch from 61a8348 to 3820d02 Compare May 17, 2026 21:42
@Goldokpa
Member

📢 Heads-up: repo history was rewritten today (2026-05-18)

We force-pushed a cleaned history across all branches to remove an internal directory from past commits. Your code and this PR are unaffected — only the commit SHAs underneath have shifted. GitHub will re-render the diff against the new base automatically.

If you have a local clone, please bring it back in sync before pushing anything else:

# Option A (simplest): fresh start
git clone https://github.com/Climate-Vision/ClimateVision.git

# Option B: rebase the existing PR branch in your fork
git fetch origin
git checkout <your-branch>
git rebase origin/develop       # likely no conflicts
git push --force-with-lease

Do not git pull on an existing clone — it will produce a messy non-fast-forward state. Either re-clone, or rebase explicitly as above.

Apologies for the interruption — really appreciate your patience here. If anything looks off after rebasing, leave a comment and I'll help unblock right away. Thanks for contributing 🙏

@Goldokpa Goldokpa changed the base branch from develop to main May 17, 2026 22:21

6 participants