DASE-DASLab/scout

State-Aware Flaky Failure Triage in Distributed Database CI

Repository Layout

src/state_triage/       Python package (pipeline, models, calibration, evaluation, …)
conf/                   Experiment and feature configuration (YAML)
scripts/                Shell helpers for pipeline stages and Docker experiments
docker/                 docker-compose.yml + Prometheus config for TiDB v7/v8 validation
tests/                  Pytest suite
artifacts/              Pipeline outputs (datasets, models, explainability cases)

Environment

Requires Python ≥ 3.12.

Pipeline Commands

Run the full synthetic benchmark pipeline end-to-end:

make all          # equivalent to: collect → label → features → train → eval → report
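The stage order behind `make all` can be sketched as a small Python driver. This is illustrative only: it assumes each stage is exposed as a Makefile target under the names used in this README, and `stage_commands` and `run_pipeline` are hypothetical helpers, not part of `state_triage`.

```python
import subprocess

# Stage order as documented for `make all`.
STAGES = ["collect", "label", "features", "train", "eval", "report"]


def stage_commands(start="collect", stop="report"):
    """Return the `make` invocations for stages from start to stop, inclusive."""
    first, last = STAGES.index(start), STAGES.index(stop)
    if first > last:
        raise ValueError(f"{start!r} comes after {stop!r} in the pipeline")
    return [["make", stage] for stage in STAGES[first:last + 1]]


def run_pipeline(start="collect", stop="report"):
    """Run each stage in order, stopping at the first failing stage."""
    for cmd in stage_commands(start, stop):
        # check=True raises CalledProcessError if the stage exits non-zero.
        subprocess.run(cmd, check=True)
```

For example, `run_pipeline("features", "eval")` would re-run only feature extraction, training, and evaluation, which may be useful after changing a feature configuration.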

Or run individual stages:

make collect      # generate synthetic distributed-DB telemetry and faults
make label        # rerun-based flaky labeling
make features     # strict-causal feature extraction
make train        # model training (LR, LightGBM, RF, …)
make eval         # evaluation (ranking, calibration, decision cost, rerun correction)
make report       # paper-ready figures and tables
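As a rough illustration of the `label` and `features` stages, the sketch below shows one common reading of "rerun-based flaky labeling" and "strict-causal" extraction. Both rules are assumptions made for this example, not the repository's confirmed logic: a failure is treated as flaky if any run of the same test on the same commit passed, and features may only use telemetry observed strictly before the failure time. The names `Run`, `label_failures`, and `causal_window` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    test_id: str
    commit: str
    passed: bool


def label_failures(runs):
    """Label each failing (test_id, commit) pair as 'flaky' or 'consistent'.

    Assumed rule: a failure is flaky if any run of the same test on the
    same commit passed; otherwise every attempt failed.
    """
    outcomes = {}
    for run in runs:
        outcomes.setdefault((run.test_id, run.commit), []).append(run.passed)
    return {
        key: ("flaky" if any(results) else "consistent")
        for key, results in outcomes.items()
        if not all(results)  # keep only pairs with at least one failure
    }


def causal_window(samples, failure_ts, lookback):
    """Keep (ts, value) samples with failure_ts - lookback <= ts < failure_ts.

    Assumed reading of "strict-causal": nothing observed at or after the
    failure time may leak into the features.
    """
    return [(ts, value) for ts, value in samples
            if failure_ts - lookback <= ts < failure_ts]
```

The half-open window is the important design choice here: a sample stamped exactly at the failure time is excluded, so a model trained on these features cannot peek at the outcome it is meant to predict.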

Validation

make smoke        # quick sanity check
make replay       # online triage replay simulation
make test         # pytest suite

Docker TiDB Validation (Real Services)

Run the real-service experiment against TiDB v7 and v8 with Prometheus telemetry and controlled fault injection:

make docker-exp

This will:

  1. Start TiDB v8, TiDB v7, and Prometheus via docker compose
  2. Execute workloads with periodic TiKV-restart fault injection
  3. Collect run / label / telemetry artifacts
  4. Generate report assets under paper/assets/ (gitignored)
  5. Tear down containers and volumes
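The start and teardown steps above can be sketched as follows. This is a minimal sketch, not the actual `make docker-exp` recipe: it assumes the compose file lives at `docker/docker-compose.yml` (per the repository layout) and that a single `up` starts all services. `compose_cmd` and `run_experiment` are hypothetical helpers, and `workload` stands in for steps 2 to 4.

```python
import subprocess

# Assumed location, based on the Repository Layout section.
COMPOSE_FILE = "docker/docker-compose.yml"


def compose_cmd(*args):
    """Build a `docker compose` command line for the project's compose file."""
    return ["docker", "compose", "-f", COMPOSE_FILE, *args]


def run_experiment(workload):
    """Start the services, run workload(), and always tear down afterwards."""
    subprocess.run(compose_cmd("up", "-d"), check=True)
    try:
        workload()  # placeholder for workloads, fault injection, collection
    finally:
        # --volumes also removes named volumes, matching step 5.
        subprocess.run(compose_cmd("down", "--volumes"), check=True)
```

Wrapping the workload in `try`/`finally` means containers and volumes are removed even when the workload raises, so a failed experiment does not leave services running.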

GitHub Actions External Dataset (Metadata-Only)

Collect and merge public CI metadata traces from distributed-database repositories. See REPRO.md for detailed collection commands.

Key Outputs

Path                        Contents
artifacts/*.parquet         Curated datasets (telemetry, labels, features)
artifacts/models/           Trained model artifacts
artifacts/explainability/   SHAP summaries and case analyses
artifacts/public_proxy/     Schema for the public proxy dataset
paper/assets/               Generated figures and tables (gitignored)
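To check which curated datasets a pipeline run produced, a short standard-library sketch such as the one below may help. It assumes the Parquet files sit directly under `artifacts/`, as the pattern in the table suggests; `list_parquet_outputs` is a hypothetical helper, not part of the package.

```python
from pathlib import Path


def list_parquet_outputs(root="artifacts"):
    """Return the sorted names of Parquet files directly under root."""
    return sorted(path.name for path in Path(root).glob("*.parquet"))


if __name__ == "__main__":
    for name in list_parquet_outputs():
        print(name)
```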

CLI

uv run state-triage --help

Reproducibility

See REPRO.md for the full reproducibility checklist, including Docker prerequisites and the GitHub Actions collection pipeline.
