Deterministic CI root-cause analysis for failed CI runs.
From the curated MVP benchmark (13 cases):
- Classification accuracy: `100%` (13/13)
- Baseline classification accuracy: `69.23%` (9/13)
- Improvement: `+30.77` percentage points (about `44.4%` relative lift vs baseline)
- Top-1 root-cause accuracy: `100%` (12/12 applicable cases)
- Agentic proposal validity: `100%` (6/6 exercised cases)
- Guarded validation pass rate: `50%` (3/6; three good fixes passed and three bad fixes were blocked)
- Artifact hash reproducibility: `100%`
- Confidence reproducibility: `100%`
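The improvement figures follow directly from the two accuracy counts. A minimal sketch of that arithmetic is shown here; the function name `lift` is illustrative and not part of the project.

```python
def lift(correct: int, baseline_correct: int, total: int) -> tuple[float, float]:
    """Return (absolute lift in percentage points, relative lift in percent)."""
    accuracy = 100.0 * correct / total
    baseline = 100.0 * baseline_correct / total
    absolute = accuracy - baseline
    relative = 100.0 * absolute / baseline
    return absolute, relative


# 13/13 correct for ci-rootcause versus 9/13 for the baseline.
absolute, relative = lift(correct=13, baseline_correct=9, total=13)
print(f"{absolute:.2f} percentage points, {relative:.1f}% relative")
# prints "30.77 percentage points, 44.4% relative"
```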
Benchmark source:
- Reproduce locally:
  `python scripts/run_benchmark.py --suite fixtures/benchmarks/mvp-suite.json --output-root artifacts/benchmark-mvp --report-json docs/reports/mvp-benchmark-report.json --report-md docs/reports/mvp-benchmark-report.md`
- Compare local Ollama against the fixture-backed benchmark:
  `python scripts/run_ollama_comparison.py --suite fixtures/benchmarks/mvp-suite.json --llm-model qwen2.5-coder:7b --report-json artifacts/ollama-comparison/latest.json`
- `docs/reports/mvp-benchmark-report.md`
- `docs/reports/mvp-benchmark-report.json`
- `docs/limitations.md`
Primary path (recommended):
- Install the GitHub App for your target repository.
- Configure app runtime with safe defaults:
  - `enabled=true`
  - `post_comment=true`
  - `enable_pr_mode=false`
  - `create_fix_pr=false`
- Trigger a failed `workflow_run` and verify:
  - RCA comment appears on PR/commit context
  - `ci-rca.json` and `ci-rca.md` paths are returned
  - Outcome status/reason codes are machine-readable
Setup references:
- `docs/app-first-mvp.md`
- `docs/app-config-contract.md`
- `docs/app-outcome-codes.md`
- `docs/app-operations.md`
- `docs/migration-action-to-app.md`
Reference artifact examples:
- `artifacts/benchmark-mvp/case-typecheck-ts2345/ci-rca.json`
- `artifacts/benchmark-mvp/case-typecheck-ts2345/ci-rca.md`
Recommended default for new users: deterministic.
| Mode | Autonomy | Key requirement | Cost profile | Risk profile |
|---|---|---|---|---|
| `deterministic` | Rule-based only | None | Lowest | Lowest |
| `agentic_assist` | LLM proposes candidate fix steps, deterministic pipeline validates/falls back | Hosted providers require API key; local does not | Medium | Low-medium |
| `agentic_full` | Highest autonomy path (explicit opt-in gate required) | Hosted providers require API key; local does not | Highest | Highest |
Provider support:
- Hosted: `openai`, `gemini`, `anthropic` (require `provider_api_key` in agentic modes).
- Local: `local` (Ollama endpoint compatible, no paid vendor API key required).
Action secret examples:
- `provider: openai` + `provider_api_key: ${{ secrets.OPENAI_API_KEY }}`
- `provider: gemini` + `provider_api_key: ${{ secrets.GEMINI_API_KEY }}`
- `provider: anthropic` + `provider_api_key: ${{ secrets.ANTHROPIC_API_KEY }}`
- `provider: local` + no `provider_api_key`
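The key requirement in the mode table and provider list can be read as one small rule: a key is needed only when an agentic mode is combined with a hosted provider. The sketch below encodes that reading. The function names are hypothetical, and the project's actual validation logic may differ.

```python
from typing import Optional

HOSTED_PROVIDERS = {"openai", "gemini", "anthropic"}
AGENTIC_MODES = {"agentic_assist", "agentic_full"}


def requires_api_key(mode: str, provider: str) -> bool:
    """True when the mode/provider combination needs provider_api_key."""
    return mode in AGENTIC_MODES and provider in HOSTED_PROVIDERS


def check_config(mode: str, provider: str, provider_api_key: Optional[str]) -> list[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    if requires_api_key(mode, provider) and not provider_api_key:
        problems.append(f"{provider} needs provider_api_key in {mode} mode")
    return problems


print(check_config("agentic_assist", "local", None))   # prints []
print(check_config("agentic_full", "gemini", None))    # one problem reported
```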
ci-rootcause analyzes CI failures and produces:
- Structured failure graph
- Deterministic root-cause ranking
- Deterministic confidence score
- Evidence-backed fix plan
- Deterministic patch plan operations (`modify`/`create`/`delete`/`rename`)
- Optional guarded fix PR (never auto-merged)
- `ci-rca.json` and `ci-rca.md` artifacts
- `ci-rca-observability.json` run telemetry artifact (trace/timing/failure taxonomy)
Primary runtime target is GitHub Actions. Provider adapter defaults support GitHub Actions and GitLab CI metadata resolution.
Ideal use cases:
- CI failed and you need deterministic root-cause ranking with evidence, not just a generic summary.
- You want machine-readable RCA artifacts (`ci-rca.json`) for automation/reporting.
- You want safe, guardrailed fix PR proposals with explicit confidence thresholds.
- You need consistent behavior across repeated runs on the same inputs.
Not a fit (non-goals):
- Running arbitrary autonomous repo-wide refactors.
- Replacing your normal test/lint/build workflows.
- Auto-merging remediation changes without human review.
Comparison with formatter-only autofix workflows:
| Capability | ci-rootcause | Formatter/Linter autofix flow |
|---|---|---|
| Works from failed CI logs + diff | Yes | Usually no |
| Root-cause classification | Yes | No |
| Ranked RCA with confidence | Yes | No |
| Structured RCA artifact (`ci-rca.json`) | Yes | No |
| Guardrailed optional fix PRs | Yes | Yes (tool-dependent) |
| Designed for deterministic replay | Yes | Varies |
flowchart LR
A[CI Logs + Diff] --> B[Log Ingest Agent]
A --> C[Diff Analysis Agent]
B --> D[Failure Classification Agent]
C --> E[Root Cause Ranker Agent]
D --> E
E --> F[Fix Planner Agent]
E --> G[Reporter Agent]
F --> H[PR Creation Agent]
G --> I[Artifacts ci-rca.json + ci-rca.md]
H --> J[Guarded Fix PR]
Requirements:
- Python 3.11+
Install tools:
python -m pip install --upgrade pip
pip install -r requirements.txt
pre-commit install
Run checks:
ruff check .
ruff format --check .
pytest
- Install dependencies:
pip install -r requirements.txt
- Run the local pipeline once:
ci-rootcause \
--log-path fixtures/ci-logs/github-actions-python-failure.log \
--diff-path fixtures/diffs/refactor-only.diff \
--output-dir artifacts \
--timestamp 2026-02-21T00:00:00Z \
--commit abc123 \
--run-id gha_quickstart_1 \
--base-commit abc122 \
--head-commit abc123 \
--repository owner/repo
- Inspect generated artifacts:
  - `artifacts/ci-rca.json`
  - `artifacts/ci-rca.md`
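For a first look at the JSON artifact from Python, a small loader along these lines may help. It assumes only that `ci-rca.json` holds a JSON object at the top level; no field names are assumed, and `top_level_keys` is an illustrative helper rather than a project API.

```python
import json
from pathlib import Path


def top_level_keys(artifact_path: str) -> list[str]:
    """Load a JSON artifact and return its top-level keys, sorted."""
    data = json.loads(Path(artifact_path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        # Assumption: the artifact is a JSON object, not a list or scalar.
        raise ValueError(f"{artifact_path} does not contain a JSON object")
    return sorted(data)


if __name__ == "__main__":
    path = Path("artifacts/ci-rca.json")
    if path.exists():
        print(top_level_keys(str(path)))
```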
Run three reproducible demo scenarios:
for case in \
fixtures/demos/01-dependency-lockfile-drift \
fixtures/demos/02-typecheck-ts2345 \
fixtures/demos/03-infra-timeout
do
name="$(basename "$case")"
ci-rootcause \
--log-path "$case/ci.log" \
--diff-path "$case/change.diff" \
--output-dir "artifacts/demo/$name" \
--timestamp 2026-02-21T00:00:00Z \
--commit abc123 \
--run-id "demo_${name}" \
--base-commit abc122 \
--head-commit abc123 \
--repository owner/repo
done
Demo fixture pack:
- `fixtures/demos/README.md`
- `fixtures/demos/01-dependency-lockfile-drift`
- `fixtures/demos/02-typecheck-ts2345`
- `fixtures/demos/03-infra-timeout`
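Because the demo runs pin `--timestamp`, `--commit`, and `--run-id`, one way to check reproducibility yourself is to run a scenario into two output directories and compare artifact hashes. The sketch below does that with SHA-256. Treating byte-identical files as the criterion is an assumption; the benchmark's own hash method is not described here, and `same_artifact` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def same_artifact(first_dir: str, second_dir: str, name: str = "ci-rca.json") -> bool:
    """True when both output directories hold a byte-identical artifact."""
    return sha256_of(Path(first_dir) / name) == sha256_of(Path(second_dir) / name)
```

For example, `same_artifact("artifacts/demo/run-a", "artifacts/demo/run-b")` compares two runs of the same scenario; both directory names are placeholders.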
Run end-to-end deterministic analysis locally:
ci-rootcause \
--log-path fixtures/ci-logs/github-actions-python-failure.log \
--diff-path fixtures/diffs/refactor-only.diff \
--historical-runs-path fixtures/classification/historical-runs.sample.json \
--output-dir artifacts \
--timestamp 2026-02-20T00:00:00Z \
--commit abc123 \
--run-id gha_local_1 \
--base-commit abc122 \
--head-commit abc123 \
--repository owner/repo
CLI behavior:
- Writes `ci-rca.json` and `ci-rca.md` into `--output-dir`
- Prints a machine-readable JSON summary to stdout
- Exits `0` for `completed`/`partial` analysis runs, `2` for runtime/input errors
- Supports optional deterministic flaky-test detection via `--historical-runs-path`
- Supports local `--config-path` (simple `key: value`) and single-stream stdin input via `-`
- Supports `--offline-only` to force no remote PR creation/network calls
- Supports rollout profile `--profile safe-github-rollout` (enforces min PR confidence >= `0.90`)
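When wrapping the CLI in automation, the exit codes and the stdout summary can be handled together. The sketch below assumes that stdout contains only the JSON summary on success and makes no assumption about what is printed on errors; `interpret_run` is an illustrative name, not a project function.

```python
import json


def interpret_run(returncode: int, stdout: str) -> dict:
    """Map a ci-rootcause exit code and stdout to a small result record."""
    if returncode == 0:
        # Exit code 0 covers both completed and partial analysis runs.
        return {"ok": True, "summary": json.loads(stdout)}
    if returncode == 2:
        # Exit code 2 signals a runtime or input error.
        return {"ok": False, "summary": None}
    # Any other code is not documented, so surface it explicitly.
    raise RuntimeError(f"unexpected exit code {returncode}")
```

This pairs naturally with `subprocess.run([...], capture_output=True, text=True)`, passing the resulting `returncode` and `stdout` to `interpret_run`.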
Runtime mode:
- Uses Google ADK runtime orchestration by default when `google-adk` is installed
- Falls back to deterministic local orchestration if ADK runtime initialization fails
- Uses deterministic local orchestration when `--fail-fast` is enabled
Execution order is deterministic and fixed:
1. `log_ingest`
2. `diff_analysis`
3. `failure_classification`
4. `root_cause_ranker`
5. `fix_planner`
6. `reporter`
7. `pr_creation`
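A fixed execution order like this can be pictured as a loop over a constant tuple of stage names, which is also a valid linearization of the dependencies in the flowchart. The sketch below uses placeholder stages that only record their names; `run_pipeline`, `make_stage`, and the shared state dict are hypothetical and do not reflect the project's internal interfaces.

```python
from typing import Callable

STAGE_ORDER = (
    "log_ingest",
    "diff_analysis",
    "failure_classification",
    "root_cause_ranker",
    "fix_planner",
    "reporter",
    "pr_creation",
)

Stage = Callable[[dict], dict]


def run_pipeline(stages: dict[str, Stage], state: dict) -> dict:
    """Run every stage in STAGE_ORDER, threading one state dict through them."""
    for name in STAGE_ORDER:
        state = stages[name](state)
    return state


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that appends its own name to a trace list."""
    def stage(state: dict) -> dict:
        return {**state, "trace": [*state.get("trace", []), name]}
    return stage


result = run_pipeline({name: make_stage(name) for name in STAGE_ORDER}, {})
print(result["trace"] == list(STAGE_ORDER))  # prints True
```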
Runtime behavior:
- ADK runtime is used by default when available.
- Deterministic local fallback executes on ADK initialization/runtime failure.
- `fail_fast` uses deterministic local orchestration to preserve exception behavior.
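The three runtime rules amount to a small selection-and-fallback pattern. The following is a sketch of that pattern only: `orchestrate`, `adk_runner`, and `local_runner` are hypothetical names, and passing `None` for the ADK runner stands in for "ADK is not available".

```python
from typing import Callable, Optional

Runner = Callable[[], dict]


def orchestrate(adk_runner: Optional[Runner], local_runner: Runner, fail_fast: bool = False) -> dict:
    """Pick a runner following the three runtime rules."""
    if fail_fast or adk_runner is None:
        # fail_fast: run locally so exceptions propagate unchanged.
        # adk_runner is None: ADK is not available in this environment.
        return local_runner()
    try:
        return adk_runner()
    except Exception:
        # ADK initialization/runtime failure: fall back to the local path.
        return local_runner()
```

Note that under `fail_fast` the local runner is called outside any `try` block, so an exception raised there reaches the caller directly.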
Live PR creation/idempotency validation is available as an opt-in integration test:
scripts/run_live_github_test.sh \
--repo-path /path/to/disposable/repo \
--repository owner/repo \
--token ghp_xxx \
--target-branch main
Notes:
- Test is skipped unless `CI_ROOTCAUSE_LIVE_GITHUB=1`.
- Use a disposable repository with push + PR permissions.
- Script prints a cleanup checklist after the test run.
- Benchmark report JSON: `docs/reports/mvp-benchmark-report.json`
- Benchmark report summary: `docs/reports/mvp-benchmark-report.md`
- Release checklist: `docs/release-checklist-v0.1.1.md`
- Agentic release plan + thresholds: `docs/agentic-release-plan.md`
- Benchmark metrics include classification/primary RCA accuracy, confidence reproducibility, artifact-hash reproducibility, timing distribution (`mean`/`median`/`p95`), and deterministic lift against `basic-log-summarizer-v1` baseline classification accuracy.
- Release notes: `docs/release-notes-v0.1.0.md`
- Known limitations: `docs/limitations.md`
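As an illustration of the timing distribution fields (`mean`/`median`/`p95`), the sketch below summarizes a list of durations. The nearest-rank percentile definition is an assumption, since the benchmark's exact p95 method is not stated, and the sample durations are made up for the example.

```python
import statistics


def timing_summary(durations_ms: list[float]) -> dict[str, float]:
    """Summarize run durations as mean, median, and nearest-rank p95."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    # Nearest-rank percentile: ceil(0.95 * n), computed with integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
    }


# Made-up durations in milliseconds, for illustration only.
print(timing_summary([10.0, 12.0, 11.0, 50.0]))
```

With small samples the nearest-rank p95 is simply the largest value, which is one reason such timing fields are best read as indicative metadata.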
- Current curated benchmark corpus is intentionally small (MVP scope).
- Classification coverage is deterministic-rule based and pattern limited.
- Timing metrics are runtime-derived and marked as nondeterministic metadata.
- Automated fix generation is guardrailed and intentionally conservative.
- No automatic merge or branch-protection bypass is supported.
- No CI rerun orchestration is included in MVP.
Contribution standards are documented in CONTRIBUTING.md.