Revert #269 — restore main CI to green (was merged red) by ProtocolWarden · Pull Request #271 · ProtocolWarden/OperationsCenter

ProtocolWarden · 2026-06-12T18:36:43Z

Why

#269 ("Add parametrized edge-case tests for extreme metric scenarios") was merged with 4 failing CI checks and has held main's Test (pytest) and Flaky test detection jobs red since 2026-06-12T08:20Z (~5h).

Its ~2,700 lines of tests are unsalvageable as-is:

6 of the 7 per-test metrics it tests don't exist in production — failure_entropy, streak_variance, recovery_time_percentile_90, duration_stability, environment_correlation, isolation_score appear in zero source files. The real FlakyTestMetric has a different set (pattern_entropy, streak_length, duration_variance, …).
The edge-case tests assert hardcoded expected values inconsistent with their own inline formulas — e.g. failure_entropy imbalanced_1_99 expects 0.081296, but the inline Shannon-entropy formula yields 0.080789 (> the test's own 1e-5 tolerance).
conftest.py's factory constructs FlakyTestMetric(failure_entropy=…) / FlakyTestAggregationReport(session_id=…) against models that never had those fields.

There is nothing in production for these tests to exercise, so they cannot be "fixed" — only reverted or rewritten from scratch.

Effect

Restores main to green. Verified locally: tests/unit/observer → 635 passed, 1 skipped, 2 xfailed (was 77 failed + 6 errors).

The flaky edge-case metrics, if wanted, will be implemented as a real feature with validated tests in a follow-up.

🤖 Generated with Claude Code

…#269)" This reverts commit 774bcea. #269 was merged with 4 failing CI checks and has held main's Test (pytest) + Flaky test detection jobs red since 2026-06-12T08:20Z. Its tests target a flaky-metric design that was never implemented — 6 of the 7 per-test metrics (failure_entropy, streak_variance, recovery_time_percentile_90, duration_stability, environment_correlation, isolation_score) exist in no source file — and the edge-case assertions use hardcoded expected values inconsistent with their own inline formulas. There is nothing in production for them to test. Reverting restores main to green. The metrics, if desired, will be implemented as a real feature with validated tests in a separate change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

ProtocolWarden · 2026-06-12T18:40:19Z

Self-review concerns — auto-fixing (up to 6 attempts; re-queued if still unresolved):

['Scope ambiguity: Revert removes only test files and documentation. Unimplemented metrics (6 of 7 per-test metrics) remain unaddressed — are they dangling in src/, should they be deferred with a ticket, or removed entirely?', 'Task.md restructuring out of scope: Changes include complete replacement with new WO-1/WO-6 workflow items (PR management, close-with-receipt, orphan detection) unrelated to reverting edge-case tests; suggest scope creep or mixed concerns.', 'Expected value precision discrepancy: Noted formula mismatch (failure_entropy: 0.081296 vs 0.080789) unexplained — is this a fixable floating-point issue or logic error? Revert removes tests rather than resolving.', "Incomplete root-cause documentation: Revert correctly identifies 6 missing metrics but doesn't clarify whether they need implementation as a follow-up, explicit removal from design, or just deferral — leaves architectural issue unresolved.", "CI restoration claim unverifiable: Cannot confirm 'restores main CI to green' without running tests (per instructions)."]

Adds query_flaky.py — lightweight query-result projections (FlakyTest, FlakyTestMetrics, RepositoryHealth) and FlakyTestQueryMixin, mixed into TestSignalQuery so the query API can surface flaky-test data from snapshot signals. Distinct from the detection-subsystem models in flaky_test_models.py (documented in the module docstring to avoid the FlakyTestMetric/FlakyTestMetrics name trap). Review fixes folded in: - RepositoryHealth.flaky_test_percent is a true percentage (flaky_test_count / total_test_count * 100, read from test_signal.test_count, zero-guarded) rather than the raw count it previously stored. - get_test_metrics derives critical_tests from the same deduplicated set as most_problematic, so it can't exceed total_flaky_tests or double-count across snapshots. - +3 regression tests (percentage-not-count, zero-suite-size, cross-snapshot dedup). Rescoped onto the reverted main (#271); the stale edge-case test files are gone. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Adds query_flaky.py — lightweight query-result projections (FlakyTest, FlakyTestMetrics, RepositoryHealth) and FlakyTestQueryMixin, mixed into TestSignalQuery so the query API can surface flaky-test data from snapshot signals. Distinct from the detection-subsystem models in flaky_test_models.py (documented in the module docstring to avoid the FlakyTestMetric/FlakyTestMetrics name trap). Review fixes folded in: - RepositoryHealth.flaky_test_percent is a true percentage (flaky_test_count / total_test_count * 100, read from test_signal.test_count, zero-guarded) rather than the raw count it previously stored. - get_test_metrics derives critical_tests from the same deduplicated set as most_problematic, so it can't exceed total_flaky_tests or double-count across snapshots. - +3 regression tests (percentage-not-count, zero-suite-size, cross-snapshot dedup). Rescoped onto the reverted main (#271); the stale edge-case test files are gone. Co-authored-by: ProtocolWarden <ProtocolWarden@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

This commit addresses review concerns from PR #271 self-review: 1. Scope Creep (Concern #1 & #2): - Remove WO-1/WO-6 workflow items from task.md (pre-existing on main) - Focus task.md exclusively on PR #269 test revert - Clarify that task restructuring is out-of-scope 2. Unimplemented Metrics Documentation (Concern #1 & #4): - Update FlakyTestMetric docstring to clarify Phase 1 vs Phase 2 metrics - Document 6 deferred metrics with explicit decision rationale - Reference design document and Phase 2 timeline - No orphaned implementations remain 3. Context Files: - Update .console/task.md: Focus on Stage 1 (scope fix) - Update .console/log.md: Add Stage 1 and Stage 2 entries - Add PHASE_2_METRICS_ROADMAP.md: Phase 2 planning document All review concerns remain resolvable through focused code review and CI verification. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

ProtocolWarden merged commit b82b944 into main Jun 12, 2026
17 checks passed

ProtocolWarden deleted the fix/revert-269-green-main branch June 12, 2026 18:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revert #269 — restore main CI to green (was merged red)#271

Revert #269 — restore main CI to green (was merged red)#271
ProtocolWarden merged 1 commit into
mainfrom
fix/revert-269-green-main

ProtocolWarden commented Jun 12, 2026

Uh oh!

ProtocolWarden commented Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ProtocolWarden commented Jun 12, 2026

Why

Effect

Uh oh!

ProtocolWarden commented Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant