feat(governance): add Gebru-style datasheets for training datasets by Hopelynconsult · Pull Request #52 · Climate-Vision/ClimateVision

Hopelynconsult · 2026-05-07T16:19:59Z

Summary

Adds governance/datasheet.py — Gebru et al. 2018, "Datasheets for Datasets". Companion to the Mitchell-style model cards from feat(governance): add automated model card generator #37.
Adds scripts/generate_datasheet.py mirroring scripts/generate_model_card.py so the release CI pipeline can run them in sequence.
Public API matches the model_card module (build_datasheet / render_markdown / write_datasheet / generate) so contributors only learn one pattern.

Why we need this

A model card describes the model. A datasheet describes the dataset that trained the model. The two artifacts answer different questions, and the responsible-AI literature is clear that releases need both. For ClimateVision specifically, the dataset choices (which biomes, which years, which label sources) drive the geographic-bias risk that #30 and the bias-audit framework are trying to surface, so we need a structured place to document those choices alongside every release.

What's in the PR

Sections covered: motivation, composition, collection process, preprocessing, uses (intended + inappropriate), distribution, maintenance.

Datasheet dataclass with JSON serialisation, mirroring ModelCard
REQUIRED_QUESTIONS schema enforces minimum answers (purpose, creators, instances, labels, splits, source, timeframe, intended_uses, inappropriate_uses) — anything else is free-form so the schema can grow without code changes
Sensible defaults for inappropriate_uses and maintenance so existing dataset manifests don't have to be exhaustive on day one
12 unit tests covering: build, override behaviour, all required-field validations, markdown rendering shape, JSON round-trip, and YAML manifest loading
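
For reviewers weighing the REQUIRED_QUESTIONS bar, the validation behaviour described above can be sketched roughly as follows. This is an illustration, not the code in governance/datasheet.py: the field names come from the list above, while the `answers` dict layout, the default text, and the function signature are assumptions.

```python
from dataclasses import dataclass, field, asdict
import json

# Field names taken from the PR description; the real schema may be structured differently.
REQUIRED_QUESTIONS = (
    "purpose", "creators", "instances", "labels", "splits",
    "source", "timeframe", "intended_uses", "inappropriate_uses",
)

# Hypothetical default text, for illustration only.
DEFAULT_INAPPROPRIATE_USES = ["Uses not reviewed by the dataset maintainers."]


@dataclass
class Datasheet:
    name: str
    answers: dict = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Datasheet":
        return cls(**json.loads(text))


def build_datasheet(name: str, answers: dict) -> Datasheet:
    """Apply defaults, then reject manifests that leave a required answer empty."""
    merged = {"inappropriate_uses": DEFAULT_INAPPROPRIATE_USES, **answers}
    missing = [q for q in REQUIRED_QUESTIONS if not merged.get(q)]
    if missing:
        raise ValueError(f"datasheet '{name}' is missing required answers: {missing}")
    # Keys outside REQUIRED_QUESTIONS pass through untouched (free-form answers).
    return Datasheet(name=name, answers=merged)
```

Because defaults are merged before the check, a manifest that omits inappropriate_uses still validates, which matches the "not exhaustive on day one" goal stated above.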

Follow-ups (out of scope here)

Author the first concrete datasheets for the Sentinel-2 deforestation, ice-melt, and flood training datasets (separate doc-PR per dataset).
Wire scripts/generate_datasheet.py into the release CI alongside generate_model_card.py.
Cross-link datasheet → model card in governance/model_card.py so each card includes a pointer to its training datasheet.

Test plan

pytest tests/test_datasheet.py -q → 12 passed
Reviewer: confirm REQUIRED_QUESTIONS is the right minimum bar — happy to relax/tighten before we author the first real datasheet.

🤖 Generated with Claude Code

#29)
- Add governance module with SHAPExplainer class
- Implement band-level and spatial attribution using DeepExplainer
- Add /api/explain endpoint for SHAP-based explanations
- Create 06_explainability.ipynb with visualization examples
- Add shap>=0.42.0 to requirements.txt

Closes #22

Co-authored-by: Linda Oraegbunam <obielinda@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

@Goldokpa

* Add SECURITY.md for security policy and reporting
  Added a security policy document outlining supported versions, vulnerability reporting, scope, and disclosure policy.
* chore: add PR template for contributor guidance
  Add a pull request template to guide contributors.
* chore: add CODEOWNERS assigning @Goldokpa as default reviewer
  Added CODEOWNERS file to define code ownership and review assignments.
* chore: add Dependabot config for pip, npm, and GitHub Actions
  Configured Dependabot for Python, GitHub Actions, and npm dependencies with specified schedules and reviewers.
* chore: add CHANGELOG.md following Keep a Changelog format
  Document notable changes, additions, modifications, and removals for ClimateVision.
* chore: add CITATION.cff to enable GitHub Cite this repository button
  Added citation file for ClimateVision software.
* fix: replace #email placeholder in CODE_OF_CONDUCT with Security Advisory link
  This change updates the Code of Conduct document by removing the original content and replacing it with a new version that includes various sections on community standards, enforcement responsibilities, and guidelines.
* chore: remove SETUP_COMPLETE.md (internal artifact not suited for public repo)
* chore: remove internal team_docs (Francis_Umo_Role.pdf) from public repo
* chore: remove internal team_docs (Olufemi_Taiwo_Role.pdf) from public repo

Adds BiomassRegressor, a wrapper around sklearn RandomForest and xgboost.XGBRegressor that exposes a stable fit/predict/evaluate/save API for ClimateVision pipelines. Default feature ordering matches the spectral indices produced by the data preprocessor (NDVI, EVI, SAVI, NDMI, NBR, R, G, B, NIR, SWIR1).

Also adds:
- biomass_to_carbon / biomass_to_co2e helpers using IPCC defaults (carbon fraction 0.47, 44/12 ratio for CO2e).
- evaluate_regression for RMSE, MAE, R^2, and MAPE.
- estimate_biomass_from_indices for inference over a dict of per-pixel index arrays.
- save() / load() round-trip via pickle.

Co-authored-by: Francis Umo <franchaise@users.noreply.github.com>
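
The carbon conversions named above reduce to two multiplications. A minimal sketch follows, assuming scalar inputs in tonnes of biomass; the helper names and constants come from the commit message, while the signatures are guesses and may not match the real module.

```python
CARBON_FRACTION = 0.47        # IPCC default carbon fraction quoted in the commit message
CO2_TO_C_RATIO = 44.0 / 12.0  # molecular weight of CO2 over atomic weight of carbon


def biomass_to_carbon(biomass_t, carbon_fraction=CARBON_FRACTION):
    """Tonnes of biomass -> tonnes of carbon."""
    return biomass_t * carbon_fraction


def biomass_to_co2e(biomass_t, carbon_fraction=CARBON_FRACTION):
    """Tonnes of biomass -> tonnes of CO2-equivalent."""
    return biomass_to_carbon(biomass_t, carbon_fraction) * CO2_TO_C_RATIO
```

Under these defaults, 100 t of biomass corresponds to 47 t of carbon and about 172.3 t of CO2e.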

…ook (#47) End-to-end notebook that loads (or simulates) a biomass-labelled spectral dataset, trains a Random Forest BiomassRegressor, evaluates RMSE/MAE/R^2/MAPE, converts biomass predictions to carbon and CO2e using IPCC defaults, plots feature importances, and persists the model + metrics for the analytics API and the model-card pipeline. Falls back to a synthetic dataset when the labelled parquet file is not present, so the notebook is runnable in CI. Co-authored-by: Francis Umo <franchaise@users.noreply.github.com>

Validates segmentation predictions against reference masks and the biomass regressor against held-out labels across Amazon, Congo, and Southeast Asia. Computes IoU, F1, precision, recall, accuracy, and the regression metrics RMSE/MAE/R^2/MAPE. Aggregates per-region and mean values into a single benchmark_report.json that the governance CI gate and the model-card generator consume directly. Co-authored-by: Francis Umo <franchaise@users.noreply.github.com>
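
As a reference for the segmentation metrics listed above, the sketch below derives them from confusion counts over flattened binary masks. It is a plain-Python illustration; the function name and the dict keys are assumptions, and the actual validation code presumably works on arrays.

```python
def segmentation_metrics(pred, ref):
    """Compute IoU, F1, precision, recall, and accuracy for binary masks.

    pred and ref are equal-length sequences of 0/1 values (flattened masks).
    """
    tp = sum(1 for p, r in zip(pred, ref) if p == 1 and r == 1)
    fp = sum(1 for p, r in zip(pred, ref) if p == 1 and r == 0)
    fn = sum(1 for p, r in zip(pred, ref) if p == 0 and r == 1)
    tn = sum(1 for p, r in zip(pred, ref) if p == 0 and r == 0)

    def safe_div(num, den):
        # Return 0.0 for empty denominators instead of raising.
        return num / den if den else 0.0

    return {
        "iou": safe_div(tp, tp + fp + fn),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
    }
```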

End-to-end impact reporting workflow: loads a deforestation mask, runs estimate_carbon for tonnes-of-CO2e and confidence intervals, computes a trend over the trailing 12 months, attaches validation metrics from 04_model_validation.ipynb when available, calls analytics.reporting.generate_report, and renders a stakeholder-ready Markdown narrative with headline numbers, trend, and validation section. Region/period/bbox are top-level constants so the same notebook can be re-run for Amazon, Congo, or Southeast Asia. Co-authored-by: Francis Umo <franchaise@users.noreply.github.com>

) Adds BatchProcessor for running inference over a list of sources in parallel. Tracks per-job state (queued -> running -> succeeded/failed), records timing and attempt counts, and appends each terminal job to a JSONL manifest as it finishes so long-running batches can be resumed or audited without waiting on the whole queue. Supports configurable retry on transient failures and an injectable inference_fn for testing and for swapping in batch_predict implementations later.
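
The retry-and-manifest pattern described above might look roughly like the following. This is a simplified sketch, not the BatchProcessor API: `run_batch` and its parameters are hypothetical, every exception is treated as retryable, the initial queued state is omitted, and results are assumed to be JSON-serialisable.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor
from threading import Lock


def run_batch(sources, inference_fn, manifest_path, max_attempts=3, workers=4):
    """Run inference_fn over sources in parallel, logging each finished job."""
    manifest_lock = Lock()

    def run_one(source):
        job = {"source": source, "state": "running", "attempts": 0}
        started = time.monotonic()
        while job["attempts"] < max_attempts:
            job["attempts"] += 1
            try:
                job["result"] = inference_fn(source)
                job["state"] = "succeeded"
                job.pop("error", None)  # drop the error left by an earlier attempt
                break
            except Exception as exc:  # this sketch treats every exception as retryable
                job["state"] = "failed"
                job["error"] = str(exc)
        job["seconds"] = time.monotonic() - started
        # Append the terminal record right away so finished jobs survive a crash.
        with manifest_lock, open(manifest_path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(job) + "\n")
        return job

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, sources))
```

Passing the inference function as an argument is what makes the loop testable with a stub that fails on demand.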

#41) AlertGenerator matches inference results against per-organisation subscriptions (region bbox + analysis_type + threshold) and fires alerts when the measured value crosses the threshold. Per-subscription cooldown windows deduplicate flapping signals. Severity is classified medium/high/critical based on how far the value is past the threshold. Channels are pluggable: register a delivery callable per channel name ('log', 'email', 'webhook', ...) and the dispatcher will route to all the channels each subscription opted into. Persists every fired alert to a JSONL log for replay and audit.
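
To make the threshold, cooldown, and severity logic above concrete, here is a small sketch. The `Subscription` fields, the `maybe_fire` helper, and the severity cut-offs are all hypothetical (the commit message does not state the real ones), bbox matching is left out, and thresholds are assumed to be positive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subscription:
    org: str
    analysis_type: str
    threshold: float          # assumed > 0
    cooldown_s: float
    last_fired: Optional[float] = None


def classify_severity(value, threshold):
    # Hypothetical cut-offs based on relative excess over the threshold.
    excess = (value - threshold) / threshold
    if excess >= 0.5:
        return "critical"
    if excess >= 0.2:
        return "high"
    return "medium"


def maybe_fire(sub, analysis_type, value, now):
    """Return an alert dict, or None if nothing should fire."""
    if analysis_type != sub.analysis_type or value < sub.threshold:
        return None
    if sub.last_fired is not None and now - sub.last_fired < sub.cooldown_s:
        return None  # still inside the cooldown window: suppress the repeat
    sub.last_fired = now
    return {
        "org": sub.org,
        "value": value,
        "severity": classify_severity(value, sub.threshold),
    }
```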

Adds api/admin.py — a self-contained APIRouter exposing two read-only operational endpoints: - GET /api/reports — data-quality KPIs (run count, error rate, mean confidence, positive-fraction mean, alert count) over a configurable window. - GET /api/anomalies — list flagged anomaly/alert records, optionally filtered by severity and time window. Both read from the JSONL logs written by the audit logger and the alert generator. They never expose raw input payloads. Wired into the FastAPI app via include_router() in api/main.py.

Hybrid detector that combines an Isolation Forest fitted on rolling prediction history with a statistical fallback (z-score + IQR fences) for the cold-start case. Persists feature history to JSONL and emits anomaly reports for human review.
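
The statistical fallback for the cold-start case can be illustrated with the standard library alone. In this sketch the function name, the limits (3 standard deviations, 1.5 × IQR), and the choice to flag when either test trips are assumptions; the Isolation Forest path is not shown.

```python
import statistics


def statistical_anomaly(history, value, z_limit=3.0, iqr_k=1.5):
    """Flag value against history using a z-score test and IQR fences.

    history needs at least two points for the quartiles to be defined.
    """
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    z_score = abs(value - mean) / stdev if stdev > 0 else 0.0

    q1, _, q3 = statistics.quantiles(history, n=4)
    iqr = q3 - q1
    low_fence = q1 - iqr_k * iqr
    high_fence = q3 + iqr_k * iqr

    z_flag = z_score > z_limit
    iqr_flag = value < low_fence or value > high_fence
    return {
        "z_score": z_score,
        "z_flag": z_flag,
        "iqr_flag": iqr_flag,
        "anomalous": z_flag or iqr_flag,  # assumed combination rule
    }
```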

Append-only JSONL audit trail. Each entry records the model version, SHA-256 hash of the input payload, summary statistics for the output, and a prev_hash linking back through the chain. verify_chain() walks the file and detects any tampered entry by recomputing the hash and checking the link to its predecessor.
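
The hash-chain idea above can be shown in a few lines. This sketch keeps the chain in an in-memory list rather than a JSONL file, and the entry field names and helper signatures are assumptions; only the general scheme (hash each entry, link it to its predecessor, recompute on verification) follows the commit message.

```python
import hashlib
import json


def _entry_hash(body):
    # Canonical JSON so the same body always hashes to the same digest.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(chain, model_version, payload, output_summary):
    """Append an audit entry; payload is the raw input as bytes."""
    body = {
        "model_version": model_version,
        "input_sha256": hashlib.sha256(payload).hexdigest(),
        "output_summary": output_summary,
        "prev_hash": chain[-1]["hash"] if chain else "",
    }
    chain.append({**body, "hash": _entry_hash(body)})


def verify_chain(chain):
    """Return True only if every hash and every back-link is intact."""
    prev = ""
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev or entry["hash"] != _entry_hash(body):
            return False
        prev = entry["hash"]
    return True
```

Editing any earlier entry changes its recomputed hash, so the check fails at that entry; rewriting its stored hash to match would instead break the link from the next entry.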

Builds Mitchell-style model cards from training config + evaluation metrics, with optional fairness report attached. Renders to both Markdown (for release notes / model registry) and JSON (for downstream tooling). Ships a CLI entrypoint at scripts/generate_model_card.py for the release CI pipeline.

Composes natural-language stakeholder reports from carbon analytics, SHAP attributions, validation metrics, and fairness flags. A deterministic template renderer is always available so the pipeline never blocks on a missing LLM provider; when an LLM callable is supplied (or CLIMATEVISION_LLM_PROVIDER is configured) it smooths the template into prose using the structured data block as ground truth. Includes a JSON sidecar so downstream tooling can ingest the report without re-parsing Markdown.

…rity (#39) scripts/governance_ci_gate.py reads evaluation metrics, an optional fairness report, and an optional security scan, and decides whether a release passes its governance thresholds. Exit code 1 fails the build. Thresholds default to a sensible baseline (IoU>=0.70, F1>=0.75, fairness score>=0.80, zero high/critical security findings) and can be overridden via --thresholds JSON. Renders a Markdown summary suitable for posting back to the PR.
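
A rough sketch of the gate decision described above, using the default thresholds quoted in the commit message. The input shapes (`metrics["iou"]`, `fairness["score"]`, `security["findings"]`) and the function name are assumptions, not the script's real interface.

```python
DEFAULT_THRESHOLDS = {
    "iou": 0.70,
    "f1": 0.75,
    "fairness_score": 0.80,
    "max_high_critical_findings": 0,
}


def evaluate_gate(metrics, fairness=None, security=None, thresholds=None):
    """Return a list of failure messages; an empty list means the gate passes."""
    limits = {**DEFAULT_THRESHOLDS, **(thresholds or {})}
    failures = []

    for key in ("iou", "f1"):
        value = metrics.get(key, 0.0)
        if value < limits[key]:
            failures.append(f"{key}={value:.2f} is below {limits[key]:.2f}")

    # Fairness and security reports are optional inputs.
    if fairness is not None and fairness.get("score", 0.0) < limits["fairness_score"]:
        failures.append("fairness score is below threshold")

    if security is not None:
        serious = sum(
            1 for f in security.get("findings", [])
            if f.get("severity") in ("high", "critical")
        )
        if serious > limits["max_high_critical_findings"]:
            failures.append(f"{serious} high/critical security finding(s)")

    return failures


def exit_code(failures):
    # A non-zero exit code is what fails the CI build.
    return 1 if failures else 0
```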

…anner (#34)

* feat(security): add API security middleware
  OWASP-aligned controls layered onto FastAPI:
  - Per-API-key rate limiter (sliding window, configurable)
  - Payload size and Content-Length checks
  - bbox sanity validation (range, ordering, max area to block DoS-via-huge-GEE-queries)
  - File upload validation by magic bytes + extension whitelist
  - String input sanitisation against XSS, SQLi, template injection patterns
  - Security response headers (X-Content-Type-Options, X-Frame-Options, X-XSS-Protection)
* feat(security): add inference pipeline guard for adversarial inputs
  InputAnomalyDetector flags suspicious tiles before model forward pass:
  - Out-of-range pixel values, NaN/Inf, suspicious uniformity
  - Gradient analysis to catch noise injection or constant-image attacks
  PipelineGuard wraps inference with input + output checks:
  - Rejects predictions where one class dominates >99% (model failure or attack)
  - Flags low mean confidence and uniform probability distributions
  - Returns structured security metadata alongside predictions so the API can surface warnings to the caller.
* test(security): add security suite and OWASP scanner script
  - tests/test_security.py: unit tests for the rate limiter, payload/bbox/file validators, sanitiser, pipeline guard input/output checks, and adversarial detection helpers.
  - scripts/security_scan.py: external scanner that hits a running API and probes for OWASP-style misconfigurations (missing headers, unauthenticated POSTs, bbox over-area, oversized payloads). Outputs a JSON report.

* docs(team): add Adeolu role specification
  Codifies Data Pipeline & GIS Lead responsibilities: real GEE tile downloads, analysis-specific band mapping, SCL cloud masking at inference time, and the synthetic-fallback guardrail.
* docs(data): document data pipeline modules and band contract
  Single page covering each file in the data package, the analysis-type band contract, the SCL cloud-masking rules, and the synthetic-fallback metadata convention. Helps new contributors avoid hardcoding band lists.
* test(data): add band mapping smoke tests
  Verifies the analysis-type → band contract holds:
  - Sentinel-2 13-band canonical order
  - Per-analysis band counts (4/4/3 for deforestation/ice/flood)
  - SCL append-without-duplicate invariant
  - Band index resolution and rejection of unknown bands
  - Enabled vs disabled analysis types from config.yaml

#30)
- Add BiasAuditor class with demographic parity, equalized odds, and predictive parity metrics
- Implement run_bias_audit() for evaluating model fairness across regions
- Add check_fairness_gate() for CI/CD integration
- Create scripts/audit_model.py CLI tool for running audits
- Add notebooks/07_bias_audit.ipynb with visualization examples
- Support Amazon, Congo, Southeast Asia, and Boreal forest regions

Closes #23

Co-authored-by: Linda Oraegbunam <obielinda@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

ECE, MCE, Brier score, and reliability-bin computation for binary segmentation outputs. Threshold-driven NGO alerts depend on calibrated confidence: a model that says 0.9 should be right 90% of the time, and miscalibration translates directly into missed events or false alarms. The CalibrationReport dataclass slots into the existing model card generator and release CI gate. Pure numpy at evaluation time, no torch.

- ReliabilityBin / CalibrationReport dataclasses with JSON serialisation
- evaluate_calibration() one-shot entrypoint
- write_calibration_report() for persistence alongside model cards
- 12 tests covering perfect/overconfident calibration, edge cases, input validation, and round-trip JSON
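
For readers unfamiliar with the calibration metrics above, the sketch below computes them in plain Python (the module itself is described as numpy-based). It assumes the reliability-diagram convention: bin by predicted positive probability, then compare each bin's mean probability with its observed positive rate. The function name, equal-width binning, and return keys are assumptions.

```python
def calibration_summary(probs, labels, n_bins=10):
    """Return ECE, MCE, and Brier score for binary probabilities.

    probs are predicted positive-class probabilities in [0, 1];
    labels are the matching 0/1 ground-truth values.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))

    n = len(probs)
    ece, mce = 0.0, 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        positive_rate = sum(y for _, y in members) / len(members)
        gap = abs(positive_rate - mean_prob)
        ece += (len(members) / n) * gap   # gap weighted by bin population
        mce = max(mce, gap)               # worst single-bin gap

    brier = sum((p - y) ** 2 for p, y in zip(probs, labels)) / n
    return {"ece": ece, "mce": mce, "brier": brier}
```

A model that predicts 0.9 everywhere but is right only half the time gets an ECE of about 0.4, which is the overconfident case the tests mention.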

…tion feat(governance): add calibration metrics for segmentation confidence

Companion to the Mitchell-style model card generator (#37): where a model card describes the model, a datasheet describes the dataset that trained it (Gebru et al., 2018, "Datasheets for Datasets"). Both artifacts now ship with every release. The public API mirrors model_card.py (build / render / write / generate) so contributors only learn one pattern and the release CI pipeline calls them in sequence.

Sections covered: motivation, composition, collection process, preprocessing, uses (intended + inappropriate), distribution, maintenance. A REQUIRED_QUESTIONS schema enforces the bare minimum a release datasheet must answer.

- Datasheet dataclass with JSON serialisation
- build_datasheet() / write_datasheet() / generate() / render_markdown()
- scripts/generate_datasheet.py CLI wired the same way as the model card CLI for the release CI pipeline
- 12 tests covering build, validation, defaults, markdown rendering, JSON round-trip, and YAML manifest loading

Hopelynconsult · 2026-05-17T21:26:08Z

Rebased onto current develop to clear the conflict in governance/__init__.py (PR #51 calibration landed in the meantime). Imports + __all__ now carry both calibration and datasheet exports side-by-side. No changes to datasheet.py, the script, or tests — just the merged module index.

Ready for review.

Goldokpa · 2026-05-17T22:06:45Z

📢 Heads-up: repo history was rewritten today (2026-05-18)

We force-pushed a cleaned history across all branches to remove an internal directory from past commits. Your code and this PR are unaffected — only the commit SHAs underneath have shifted. GitHub will re-render the diff against the new base automatically.

If you have a local clone, please bring it back in sync before pushing anything else:

# Option A (simplest): fresh start
git clone https://github.com/Climate-Vision/ClimateVision.git

# Option B: rebase the existing PR branch in your fork
git fetch origin
git checkout <your-branch>
git rebase origin/develop       # likely no conflicts
git push --force-with-lease

Do not git pull on an existing clone — it will produce a messy non-fast-forward state. Either re-clone, or rebase explicitly as above.

Apologies for the interruption — really appreciate your patience here. If anything looks off after rebasing, leave a comment and I'll help unblock right away. Thanks for contributing 🙏

Goldokpa and others added 18 commits May 3, 2026 12:11

Hopelynconsult requested a review from Goldokpa as a code owner May 7, 2026 16:20

Merge pull request #51 from Climate-Vision/feature/governance-calibra…

62499fb

…tion feat(governance): add calibration metrics for segmentation confidence

Goldokpa mentioned this pull request May 7, 2026

governance/datasheet: _PROJECT_ROOT path index is off-by-one (parents[4] → parents[3]) #54

Open

Hopelynconsult and others added 2 commits May 18, 2026 00:25

fix(governance): correct _PROJECT_ROOT parents index in datasheet.py

4811f60
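
For context on the off-by-one in #54: `Path.parents` is zero-indexed from the containing directory, so counting directory levels from 1 overshoots by one. The sketch below uses a hypothetical checkout layout (the real location of datasheet.py may differ) in which `parents[3]` is the project root and `parents[4]` is one level too high.

```python
from pathlib import PurePosixPath

# Hypothetical location of the module inside a checkout rooted at /repo.
module_file = PurePosixPath("/repo/src/climatevision/governance/datasheet.py")

# parents[0] is the containing directory; each further index climbs one level.
containing_dir = module_file.parents[0]   # /repo/src/climatevision/governance
project_root = module_file.parents[3]     # /repo
one_too_far = module_file.parents[4]      # / (the filesystem root in this layout)
```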

Hopelynconsult force-pushed the feature/governance-datasheet branch from e75dedc to e82de16 Compare May 17, 2026 21:25

Goldokpa force-pushed the feature/governance-datasheet branch from e82de16 to 4811f60 Compare May 17, 2026 21:42

Goldokpa force-pushed the develop branch from 2d7f271 to 62499fb Compare May 17, 2026 21:42

Goldokpa changed the base branch from develop to main May 17, 2026 22:21

Goldokpa mentioned this pull request May 17, 2026

chore: consolidate develop into main (single-trunk) #63

Open

2 tasks


feat(governance): add Gebru-style datasheets for training datasets #52
Hopelynconsult wants to merge 21 commits into main from feature/governance-datasheet


6 participants
