fix(reviewer): require CI *settled* before declaring green (root cause of #269 merging red)#272
Merged
Merged
Conversation
Owner
Author
reviewer process exited with rc=1 for state_key=OperationsCenter-272 (stdout_tail="You've hit your session limit · resets 4:10pm (America/New_York)") |
#269) The auto-merge gate declared CI "green" whenever get_failed_checks returned an empty list. But an empty failure list only means "nothing has failed yet" — a check still queued or in_progress has conclusion=None and is invisible to get_failed_checks. So the reviewer could complete its ~1min self-review and merge on an LGTM verdict *before* the ~2-3min test jobs finished, turning the base branch red. This is exactly how #269 merged with 4 failing checks and held main's CI red for ~5 hours. Fix: - New GitHubPRClient.get_incomplete_checks — names of check runs whose status is not "completed" (queued/in_progress), with the same dedup-by-latest and ignored-checks handling as get_failed_checks. - All three CI-evaluation sites in the reviewer now require zero failed AND zero pending before treating CI as green: the primary self-review gate, the WO-3 CI-green retraction, and the WO-3 no-progress direct-merge path. - A new "ci_never_settled" escalation fires if checks never settle within the existing _MAX_CI_WAIT_CYCLES bound (never merge on incomplete CI). Tests: adapter unit tests for get_incomplete_checks (running/queued flagged, completed/ignored excluded, latest-run dedup, error/no-sha fallbacks) and an integration test that the gate defers (no self-review, no merge, wait counter advanced) when checks are still running. Mock factories default get_incomplete_checks to [] (settled). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
dadb371 to
e8f1246
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The bug
The reviewer's auto-merge gate declared CI green whenever
get_failed_checksreturned an empty list. But an empty failure list only means "nothing has failed yet" — a check stillqueued/in_progresshasconclusion=Noneand is invisible toget_failed_checks, which only looks at terminal conclusions (failure/timed_out/cancelled).So the sequence that merged #269 red:
get_failed_checks→[](jobs still running, none failed yet).#269 merged with 4 failing checks this way and held main's
Test (pytest)+Flaky test detectionjobs red for ~5 hours.The fix
GitHubPRClient.get_incomplete_checks— returns names of check runs whosestatus != "completed"(i.e.queued/in_progress), with the same latest-run dedup and ignored-checks handling asget_failed_checks._phase1),ci_never_settledescalation fires if checks never settle within the existing_MAX_CI_WAIT_CYCLESbound — so a genuinely stuck CI escalates to a human rather than merging on incomplete checks.This makes "no failure observed" insufficient for merge; CI must have settled (every run terminal) and be passing.
Tests
get_incomplete_checks: running/queued flagged, completed & ignored excluded, latest-run dedup beats a stale in-progress run, no-sha / exception fallbacks.get_failed_checks=[]but a check still in progress, the gate defers — no self-review, no merge,ci_wait_cyclesadvanced.get_incomplete_checksto[](settled).All reviewer + adapter suites pass (247 tests).
ruffclean.Note
_merge_and_doneitself does not re-verify CI (it's reached only after these gates), so it's transitively protected; a belt-and-suspenders check there could be a future hardening.🤖 Generated with Claude Code