ProtocolWarden · ProtocolWarden · Jun 12, 2026 · Jun 12, 2026
diff --git a/.console/log.md b/.console/log.md
@@ -1,3 +1,14 @@
+## 2026-06-12 — fix(reviewer): require CI *settled* before declaring green (root cause of #269 merging red)
+
+The merge gate declared CI green whenever get_failed_checks returned [] — but that only means
+"nothing has failed yet"; a check still queued/in_progress has conclusion=None and is invisible to
+get_failed_checks. So the reviewer could self-review (~1min) and merge on LGTM before the ~2-3min
+test jobs finished, turning main red. This is how #269 merged with 4 red checks and held main red ~5h.
+Fix: new GitHubPRClient.get_incomplete_checks (status != "completed"); all three CI-evaluation sites
+(primary self-review gate, WO-3 retraction, WO-3 no-progress direct-merge) now require zero failed
+AND zero pending before proceeding. New "ci_never_settled" escalation if checks never settle within
+the existing wait bound. +tests (adapter + gate defers-on-pending) + mock defaults updated.
+
 ## 2026-06-12 — #270 rescoped to the query layer (clean on reverted main)
 
 After reverting #269 (b82b944d), #270 is rebuilt as green-main + the genuinely-new flaky-test
@@ -1978,197 +1989,5 @@ corrected the stale "five watcher lanes" wording to the actual set
 pipeline lanes; `tools/loop/controller.py` (loop-*) = the separate external
 dev-loop controller. They start/stop independently; full pause needs both.
 
-## 2026-06-04 — Reconcile `.console/` (reconcile/console branch)
-
-Ran the `.console/` reconciliation pass (PlatformManifest console-reconciliation-spec).
-Authored `.console/reconcile.yaml` (untracked) classifying every backlog item as
-done/partial/incomplete with an owner; cross-repo rows route to CxRP / SwitchBoard /
-Warehouse / PlatformManifest / a private downstream repo / Custodian. Filled doc
-homes for every owned done item so `cl reconcile check` is GREEN with zero DOC GAPs.
-Scrubbed the remaining scrub-target names from tracked `docs/` (genericized to a
-private downstream repo; numbered detector IDs left intact). Ran
-`cl reconcile prune --apply`: completed log+backlog history moved to the private
-archive, source trimmed to active sections + recent-N + an archive pointer
-(log 3144→132, backlog 622→368 lines). A second `--apply` is a no-op. Flipped
-`audit.reconcile_enforce: true` in `.custodian/config.yaml`. Tracked `.console/` +
-`docs/` are now scrub-target clean (R2 / boundary I2).
-
-## 2026-06-03 — Reapply OC-venv ruff fallback lost in PR #236 merge
-
-Root cause: PR #236 (coverage 95.75% → 90% gate) overwrote commit 554b55bd which
-added the three-tier ruff lookup (target venv → system PATH → OC root .venv/bin/ruff).
-Without it, _phase0_ci_fix falls back to bare "ruff" causing FileNotFoundError for
-repos without their own ruff binary (e.g. PlatformManifest). Re-applied on
-oc-watchdog/20260603-0647-reapply-ruff-fallback.
-
-Also this cycle: resolved PR #235 merge conflict + custodian T4/T8 violations
-(goal/ba5d9a46) to unblock OPEN_PR_GATE holding task #192.
-
-## 2026-06-02 — Reviewer: CI-green is a precondition, not an auto-merge (operator-directed)
-
-**Status**: ✅ Implemented on `feat/ci-green-requires-lgtm`. Closes the bypass left
-by the verdict-gate work (#224): every managed repo has
-`auto_merge_on_ci_green: true`, which merged autonomy PRs the instant CI was
-green — *before* the new verdict gate ran. Green CI ≠ complete (missing docs etc.
-pass CI), so PRs could still ship half-finished.
-
-**Change** (`pr_review_watcher/main.py _phase1` fast path): CI-green is now a
-PRECONDITION. While CI is red the PR defers (no expensive self-review). Once CI
-is green it falls through to the verdict-gated self-review — LGTM is still the
-only merge path. Stale `operations_center.example.yaml` reviewer docs updated
-(removed human-review phase, surfaced `max_fix_attempts`, documented the
-precondition). Tests: ci-green-requires-LGTM + ci-red-defers-without-review.
-108 passed; ruff clean.
-
----
-
-## 2026-06-02 — Probe-and-clear for stale worker-backend cooldowns
-
-Worker-backend cooldowns carry an *estimated* `reset_at` and were never retracted
-on their own — only expiring when `reset_at` passed. When a limit lifted early
-(e.g. sonnet recovered before its guessed weekly reset), the cooldown lingered:
-status surfaces showed the model cooling, and when every model looked cooling the
-board_unblock gate deferred dispatch for no reason.
-
-Added a probe-and-clear path:
-- `UsageStore.clear_worker_backend_cooldown(worker_backend, model, ..., include_account_wide)`
-  retracts a model's active `model_weekly` cooldown (and, on request, account-wide
-  cooldowns — one model running disproves an all-models block); appends a
-  `worker_backend_cooldown_cleared` audit event.
-- `backends/worker_backend_probe.py` — `probe_model` runs a cheap `claude -p`/`codex
-  exec` against a model (mirrors the controller's invocation); `ok` only on exit 0
-  with no limit signal. `refresh_cooldowns` probes each *cooling* model and clears
-  the ones proven runnable. Probes never record cooldowns — a flaky probe can only
-  fail to clear, never falsely block.
-- New entrypoint `operations-center-worker-backend-probe` + `worker-backend-probe`
-  subcommand (safe to run on a schedule / cron).
-- Wired as a self-heal into `board_unblock._dispatch_cooldown_reason`: when every
-  allowed backend looks cooling, probe + re-read before deferring — turning a
-  would-be stale-cooldown deadlock into a self-heal. Injected for offline tests.
-
-Plus three hardening fixes:
-- Periodic self-heal: the watchdog hourly loop now runs `worker-backend-probe`
-  (--timeout 30) so stale cooldowns clear even when the board is idle (no-op when
-  nothing is cooling).
-- `record_worker_backend_cooldown` coalesces duplicates — drops any still-active
-  cooldown for the same (worker_backend, limit_kind, model) before appending, so
-  re-recording the same limit each cycle no longer piles up identical events
-  (observed: 12 identical sonnet rows).
-- The board_unblock gate bounds its probe to `_GATE_PROBE_TIMEOUT_SECONDS` (20s)
-  so a hung probe can't stall a board cycle; the standalone CLI/cron keeps the
-  90s default.
-
-Tests: clear primitive (per-model / account-wide / no-op), dedup-on-record,
-probe module (fake runner: ok/limit-signal/nonzero/timeout; refresh clears only
-runnable models; account-wide cleared on first success; no-op when nothing
-cooling), CLI smoke, and the board_unblock self-heal. Verified end-to-end against
-the live claude CLI.
-
-## 2026-05-30 — controller: make opus fallback reachable
-
-_backend_available checked _command_available(backend) with the raw name, so _command_available("opus") always failed (opus has no binary; it uses the claude CLI). The sonnet→opus→codex fallback was therefore dead code — opus could never be selected. Resolve the cli ("claude" for opus) so opus is reachable. Also repaired 3 parse_rate_limit_reset tests left broken by the earlier (reset, log_text) tuple-return change and added opus/priority/global-limit selection tests. 15 passed.
-
----
-
-## 2026-05-28 — P6 follow-up: fixed 10 pre-existing ty errors exposed by ty==0.0.40 pin
-
-## 2026-05-28 — Operator: work order 0009 — execution hygiene
-
-6 execution quality problems documented and assigned. See ADR 0009.
-P1/P5: stop polluting .console/ truth files; P2: delete STAGE_*.md; P3: open-PR gate;
-P4: squash stage commits; P6: pin tool versions.
-
----
-
-## 2026-05-28 — Operator: re-rebase PR #180 onto new main (post #181 merge)
-
-Resolved conftest.py conflict: took PR #180 tmp_path refactor, ruff auto-fixed unused import.
-All 3609 tests pass.
-
----
-
-## 2026-05-28 — Loop controller: robustly resolve `cl` (CL_HOME fallback)
-
-The loop controller resolved `claude`/`codex` robustly via `_resolve_command`
-(PATH + `~/.local/bin` fallbacks) but invoked `cl` as a bare `["cl", ...]`,
-relying solely on PATH. That works when the loop is launched `nohup` from an
-interactive shell (whose `~/.bashrc` puts `$CL_HOME/bin` on PATH) but fails
-silently under cron/systemd/clean shells — `cl` not found → no anchor → loop
-runs unanchored → ContextGuard blocks claude. Mirrors the OperatorConsole pane
-bug just fixed.
-
-Added a `cl` branch to `_fallback_command_candidates` (uses `CL_HOME`) and
-routed all four `cl` calls (session start/end, hydrate, capture) through
-`_resolve_command`. Verified: with `cl` off PATH but `CL_HOME` set, the
-controller resolves it and anchors at PlatformManifest.
-
-## 2026-05-25
-
-- Fixed the pre-existing repo-wide pytest collection blocker by renaming the duplicate hardening module to `tests/observer/test_collectors_hardening/test_execution_health_hardening.py`, avoiding the `test_execution_health` import collision.
-- Restored observer test consistency around dependency drift and execution health artifacts:
-  - `ExecutionOutcomeValidator` now accepts the retained artifact statuses `no_op` and `error` in addition to `executed`, `failed`, `timeout`, and `unknown`.
-  - `DependencyDriftCollector` now returns `not_available` consistently so `ObservationCoverageDeriver` can detect persistent missing coverage correctly.
-- Fixed malformed-payload alert handling to normalize naive timestamps to UTC before lookback comparisons in `observer/security_logging.py`.
-- Added OC→CxRP backend normalization in `contracts/cxrp_mapper.py` so OC executor backends like `team_executor`, `dag_executor`, and `critique_executor` serialize onto the current CxRP backend enum without failing mapper tests.
-- Validation:
-  - `python -m pytest` → `3536 passed, 7 skipped`
-  - `python -m pytest -m integration` → `3 passed`
-
-## 2026-05-25
-
-- Added executor worker-backend observability end to end: the `team_executor`, `dag_executor`, and `critique_executor` adapters now expose `execute_and_capture()` with `observed_runtime` showing preferred backend, selected backend, fallback usage, and backend cooldown snapshot.
-- Added a live operator status surface for worker-backend cooldowns via `operations-center-worker-backend-status` and `./scripts/operations-center.sh worker-backend-status`, backed by a new `UsageStore.current_worker_backend_cooldowns()` summary API.
-- Extended retained trace visibility so `operations-center-run-show <run_id>` prints the `Observed runtime` block, making actual `claude_code` vs `codex_cli` selection visible per run without re-reading raw record metadata.
-- Validation: focused pytest slices passed (`68 passed`) and targeted Ruff checks passed. Repo-wide `python -m pytest` and `python -m pytest -m integration` are still blocked by the pre-existing duplicate-module import mismatch between `tests/test_execution_health.py` and `tests/observer/test_collectors_hardening/test_execution_health.py`.
-
-## Archived
-
-_Archived completed history → `/home/dev/Documents/GitHub/PrivateManifest/archive/console/OperationsCenter/log-2026-06-04.md`_
-
-
-## 2026-06-08 — Review goal-text: explicit read-only constraint
-
-Added "TASK TYPE: Read-only code review / SINGLE REQUIRED ACTION: Write verdict.json" 
-header to review goal_text. Root cause: budget team coordinator (Haiku effort=low) was
-decomposing the review task into implementation sub-stages that tried to modify source
-files rather than just writing verdict.json. PR #253 had 7 consecutive no_verdict failures.
-New phrasing prevents the coordinator from creating non-verdict-writing stages.
-Also cleared PR #253 escalation for one more retry cycle.
-
-## 2026-06-08 — fix(tests): loosen snapshot performance timing bounds
-
-Flaky CI failure: 0.1s limit failed with 0.177s on shared runners.
-Raised to 1.0s — still catches catastrophic regression (10x+).
-
-## 2026-06-08 — WO-1 close-with-receipt invariant hardened
-
-## 2026-06-08 — fix(controller): persisted Claude cooldowns fall through to Codex
-
-Loop controller now seeds Sonnet/Opus/Codex cooldowns from the persisted usage
-ledger on restart and reselects after chained backend limits, so exhausted
-Claude weekly quotas fall through to Codex instead of sleeping until reset.
-
-## 2026-06-08 — fix(controller): Claude weekly cooldown is account-wide
-
-Bare Claude Code weekly-limit messages now classify as `global_weekly` and cool
-both Claude controller lanes so status surfaces do not leave Haiku looking runnable.
-
-Controller startup also normalizes matching persisted Sonnet+Opus weekly resets
-to account-wide metadata so `loop_controller_state.json` reports the same scope.
-
-## 2026-06-10 — fix(reviewer): make no-progress detection reliable + preserve external escalation
-
-Root cause: no-progress check required AI concern summaries to match exactly (text comparison),
-but LLM output varies. Also: TOCTOU race where reviewer overwrote watchdog's escalation after
-fix pass. Fixed both; 88 reviewer tests pass.
-
-## 2026-06-10 — fix(tests): use dynamic dates in flaky storage cleanup tests
-
-Hardcoded 2026-06-07 "recent" date fell behind the 3-day retention window causing CI failures.
-
-## 2026-06-12 — fix(observer): restore ty: ignore suppression for boto3/requests
 
-Commit 5f763c99 updated mypy error codes on TYPE_CHECKING-guarded imports in
-snapshot_repository.py but dropped the ty-specific `# ty: ignore[unresolved-import]`
-comments. The ty CI check then failed with unresolved-import on lines 24–25.
-Restored both suppression annotations so mypy and ty both pass.
+<!-- log GC: 20 oldest entries trimmed to keep .console/log.md under the 100KB R2 budget; full history in git. -->
diff --git a/src/operations_center/adapters/github_pr.py b/src/operations_center/adapters/github_pr.py
@@ -255,6 +255,52 @@ def get_failed_checks(
                 failed.append(f"{name}: {summary}")
         return failed
 
+    def get_incomplete_checks(
+        self,
+        owner: str,
+        repo: str,
+        pr_number: int,
+        *,
+        pr_data: dict | None = None,
+        ignored_checks: list[str] | None = None,
+    ) -> list[str]:
+        """Return names of checks not yet in a terminal state for the PR head.
+
+        A check is "incomplete" when its ``status`` is anything other than
+        ``completed`` (e.g. ``queued`` / ``in_progress``) — such a run has no
+        ``conclusion`` yet, so :meth:`get_failed_checks` cannot see it.
+
+        Callers gating a merge on green CI MUST treat a non-empty result as
+        "not green yet": an empty failure list means only "nothing has failed
+        *so far*", which is true while CI is still running. Declaring green in
+        that window is how a PR can be merged before its tests finish and then
+        turn the base branch red.
+        """
+        if pr_data is None:
+            pr_data = self.get_pr(owner, repo, pr_number)
+        head_sha = (pr_data.get("head") or {}).get("sha", "")
+        if not head_sha:
+            return []
+        try:
+            check_runs = self.get_check_runs(owner, repo, head_sha)
+        except Exception:
+            return []
+        ignored = [s.lower() for s in (ignored_checks or [])]
+        # Dedupe by name (keep the newest run) — same rationale as get_failed_checks.
+        latest: dict[str, dict] = {}
+        for cr in check_runs:
+            name = cr.get("name", "unknown")
+            if cr.get("id", 0) > latest.get(name, {}).get("id", 0):
+                latest[name] = cr
+        pending = []
+        for cr in latest.values():
+            if cr.get("status") != "completed":
+                name = cr.get("name", "unknown")
+                if ignored and any(pat in name.lower() for pat in ignored):
+                    continue
+                pending.append(name)
+        return pending
+
     def list_open_prs(self, owner: str, repo: str) -> list[dict]:
         resp = self._request(
             "GET",