Skip to content

fix(reviewer): require CI *settled* before declaring green (root cause of #269 merging red)#272

Merged
ProtocolWarden merged 1 commit into
mainfrom
fix/reviewer-require-ci-settled
Jun 12, 2026
Merged

fix(reviewer): require CI *settled* before declaring green (root cause of #269 merging red)#272
ProtocolWarden merged 1 commit into
mainfrom
fix/reviewer-require-ci-settled

Conversation

@ProtocolWarden

Copy link
Copy Markdown
Owner

The bug

The reviewer's auto-merge gate declared CI green whenever get_failed_checks returned an empty list. But an empty failure list only means "nothing has failed yet" — a check still queued/in_progress has conclusion=None and is invisible to get_failed_checks, which only looks at terminal conclusions (failure/timed_out/cancelled).

So the sequence that merged #269 red:

  1. PR pushed → CI starts (test jobs take ~2-3 min).
  2. Reviewer polls: get_failed_checks[] (jobs still running, none failed yet).
  3. Gate declares green → runs self-review (~1 min) → LGTMmerges.
  4. The test jobs finish after the merge and fail → main goes red.

#269 merged with 4 failing checks this way and held main's Test (pytest) + Flaky test detection jobs red for ~5 hours.

The fix

  • New GitHubPRClient.get_incomplete_checks — returns names of check runs whose status != "completed" (i.e. queued/in_progress), with the same latest-run dedup and ignored-checks handling as get_failed_checks.
  • All three CI-evaluation sites in the reviewer now require zero failed AND zero pending before treating CI as green:
    • the primary self-review gate (_phase1),
    • the WO-3 CI-green retraction path,
    • the WO-3 no-progress direct-merge path.
  • A new ci_never_settled escalation fires if checks never settle within the existing _MAX_CI_WAIT_CYCLES bound — so a genuinely stuck CI escalates to a human rather than merging on incomplete checks.

This makes "no failure observed" insufficient for merge; CI must have settled (every run terminal) and be passing.

Tests

  • Adapter unit tests for get_incomplete_checks: running/queued flagged, completed & ignored excluded, latest-run dedup beats a stale in-progress run, no-sha / exception fallbacks.
  • Integration test: with get_failed_checks=[] but a check still in progress, the gate defers — no self-review, no merge, ci_wait_cycles advanced.
  • Mock factories default get_incomplete_checks to [] (settled).

All reviewer + adapter suites pass (247 tests). ruff clean.

Note

_merge_and_done itself does not re-verify CI (it's reached only after these gates), so it's transitively protected; a belt-and-suspenders check there could be a future hardening.

🤖 Generated with Claude Code

@ProtocolWarden

ProtocolWarden commented Jun 12, 2026

Copy link
Copy Markdown
Owner Author

Resolved: new push — automated review resumed

Needs human attention (reason=reviewer_backend_unavailable). Left open — not merged (unresolved) and not closed (work preserved).

reviewer process exited with rc=1 for state_key=OperationsCenter-272 (stdout_tail="You've hit your session limit · resets 4:10pm (America/New_York)")

#269)

The auto-merge gate declared CI "green" whenever get_failed_checks returned an
empty list. But an empty failure list only means "nothing has failed yet" — a
check still queued or in_progress has conclusion=None and is invisible to
get_failed_checks. So the reviewer could complete its ~1min self-review and merge
on an LGTM verdict *before* the ~2-3min test jobs finished, turning the base
branch red. This is exactly how #269 merged with 4 failing checks and held main's
CI red for ~5 hours.

Fix:
- New GitHubPRClient.get_incomplete_checks — names of check runs whose status is
  not "completed" (queued/in_progress), with the same dedup-by-latest and
  ignored-checks handling as get_failed_checks.
- All three CI-evaluation sites in the reviewer now require zero failed AND zero
  pending before treating CI as green: the primary self-review gate, the WO-3
  CI-green retraction, and the WO-3 no-progress direct-merge path.
- A new "ci_never_settled" escalation fires if checks never settle within the
  existing _MAX_CI_WAIT_CYCLES bound (never merge on incomplete CI).

Tests: adapter unit tests for get_incomplete_checks (running/queued flagged,
completed/ignored excluded, latest-run dedup, error/no-sha fallbacks) and an
integration test that the gate defers (no self-review, no merge, wait counter
advanced) when checks are still running. Mock factories default
get_incomplete_checks to [] (settled).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@ProtocolWarden ProtocolWarden force-pushed the fix/reviewer-require-ci-settled branch from dadb371 to e8f1246 Compare June 12, 2026 20:26
@ProtocolWarden ProtocolWarden merged commit f4327ff into main Jun 12, 2026
17 checks passed
@ProtocolWarden ProtocolWarden deleted the fix/reviewer-require-ci-settled branch June 12, 2026 20:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant