src/state_triage/ Python package (pipeline, models, calibration, evaluation, …)
conf/ Experiment and feature configuration (YAML)
scripts/ Shell helpers for pipeline stages and Docker experiments
docker/ docker-compose.yml + Prometheus config for TiDB v7/v8 validation
tests/ Pytest suite
artifacts/ Pipeline outputs (datasets, models, explainability cases)
Requires Python ≥ 3.12.
Run the full synthetic benchmark pipeline end-to-end:
make all # equivalent to: collect → label → features → train → eval → report
Or run individual stages:
make collect # generate synthetic distributed-DB telemetry and faults
make label # rerun-based flaky labeling
make features # strict-causal feature extraction
make train # model training (LR, LightGBM, RF, …)
make eval # evaluation (ranking, calibration, decision cost, rerun correction)
make report # paper-ready figures and tables
make smoke # quick sanity check
make replay # online triage replay simulation
make test # pytest suite
Run the real-service experiment against TiDB v7 and v8 with Prometheus telemetry and controlled fault injection:
make docker-exp
This will:
- Start TiDB v8, TiDB v7, and Prometheus via `docker compose`
- Execute workloads with periodic TiKV-restart fault injection
- Collect run / label / telemetry artifacts
- Generate report assets under `paper/assets/` (gitignored)
- Tear down containers and volumes
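The steps listed above follow a set-up, run, tear-down pattern in which the teardown should happen even when an intermediate step fails. The following is a minimal Python sketch of that pattern, not the repository's actual `make docker-exp` implementation. The step callables are hypothetical placeholders, and the compose file path is assumed from the `docker/` entry in the repository layout.

```python
import subprocess
from typing import Callable, Sequence

# Assumed location, based on the docker/ entry in the repository layout.
COMPOSE_FILE = "docker/docker-compose.yml"


def compose_cmd(*args: str) -> list[str]:
    """Build a `docker compose` command line for the assumed compose file."""
    return ["docker", "compose", "-f", COMPOSE_FILE, *args]


def _run_checked(cmd: list[str]) -> None:
    """Run a command and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)


def run_experiment(
    steps: Sequence[Callable[[], None]],
    run: Callable[[list[str]], object] = _run_checked,
) -> None:
    """Start the services, run each step, and always tear down afterwards.

    `steps` are hypothetical callables (workload, collection, reporting);
    `run` is injectable so the control flow can be exercised without Docker.
    """
    try:
        run(compose_cmd("up", "-d"))
        for step in steps:
            step()
    finally:
        # Remove containers and volumes even if start-up or a step failed.
        run(compose_cmd("down", "--volumes"))
```

Passing a recording function as `run` (for example `calls.append`) makes it possible to check the command order without starting any containers.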
Collect and merge public CI metadata traces from distributed-database repositories. See REPRO.md for detailed collection commands.
| Path | Contents |
|---|---|
| `artifacts/*.parquet` | Curated datasets (telemetry, labels, features) |
| `artifacts/models/` | Trained model artifacts |
| `artifacts/explainability/` | SHAP summaries and case analyses |
| `artifacts/public_proxy/` | Schema for the public proxy dataset |
| `paper/assets/` | Generated figures and tables (gitignored) |
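To check which of these outputs exist after a pipeline run, a small standard-library sketch such as the following may help. It assumes only the directory names listed in the table and is run from the repository root. The function name and the grouping keys are illustrative and not part of the `state_triage` package.

```python
from pathlib import Path


def summarize_artifacts(root: Path = Path("artifacts")) -> dict[str, list[str]]:
    """Return the files found under `root`, grouped by the table's paths."""

    def files_in(directory: Path) -> list[str]:
        # A missing directory simply means that stage has not produced output.
        if not directory.is_dir():
            return []
        return sorted(
            p.relative_to(root).as_posix()
            for p in directory.rglob("*")
            if p.is_file()
        )

    return {
        # Top-level Parquet datasets (telemetry, labels, features).
        "datasets": sorted(p.name for p in root.glob("*.parquet") if p.is_file()),
        "models": files_in(root / "models"),
        "explainability": files_in(root / "explainability"),
        "public_proxy": files_in(root / "public_proxy"),
    }


if __name__ == "__main__":
    for group, names in summarize_artifacts().items():
        print(f"{group}: {len(names)} file(s)")
```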
uv run state-triage --help
See REPRO.md for the full reproducibility checklist, including Docker prerequisites and the GitHub Actions collection pipeline.