From a maxiter loop exit (8/8) where all ACs were ultimately met but COMPLETE was never emitted. Methodology-only; fully sanitized (no project-specific information).
Context
A sequential-plan task ran the full 8-round budget and exited on maxiter, not COMPLETE, even though every acceptance criterion was satisfied. Execution was sound: accurate reviews, no false positives, no drift, no genuine stagnation. The failure to emit COMPLETE was structural to the methodology, not a performance problem.
Suggestions
-
Split the round verdict from the objective verdict. Early rounds received NOT COMPLETE only because downstream plan stages could not start yet (strict dependency chain). The reviewer's own "ADVANCED" line was the honest signal. Promote a first-class ROUND-COMPLETE (this round's contracted scope is fully/correctly done) distinct from objective IN-PROGRESS; reserve terminal COMPLETE for the objective.
-
Close the verification off-by-one. The rhythm "done pending verification" this round → "verified" next round means the final round of work can never be certified before the budget ends, so a budget that ends when the work ends cannot end in COMPLETE. Allow same-pass verify-and-close when a completion request has no new mainline gaps, or reserve the final iteration as a dedicated close-out round.
-
Front-load per-artifact quality checklists. One proof artifact took ~4 rounds because each round met the stated bar and the reviewer then raised a new, legitimate, previously-unstated bar (labeling → reproducibility → exit-gating → mechanical derivation). Ship standardized review rubrics keyed to artifact type so the full bar is cleared in one round.
-
Pre-completion full-sweep audit at the first COMPLETE request. The last ~4 rounds each resolved a single independent shrinking defect (down to one false sentence in a provenance file). Trigger an exhaustive audit of all artifacts against the consistency checklist at the first completion request and return the complete defect list at once, instead of one-defect-per-round discovery. (A mid-loop "full review round" is less valuable than one positioned at the first completion request.)
-
Break-glass for corrupt locked artifacts + validate at init. A setup tool truncated the immutable, authoritative goal section at initialization; a guardrail forbade both the doer and the reviewer from repairing it for all 8 rounds, so it generated a recurring reminder with no resolution. Validate the integrity of auto-generated immutable artifacts before locking them, and provide a sanctioned, logged repair path when both parties agree the locked content is tool damage rather than a goal change.
-
Convert defaulted external decisions out of the active list. A few decisions needing a human answer were given conservative defaults and execution proceeded, but they were re-surfaced as "PENDING" every round as boilerplate noise. Allow converting them to "ASSUMED (default X), revisit only if answered" and move them to a separate assumptions ledger.
What to preserve
Review quality was the session's strongest feature — specific, evidence-cited feedback with essentially no false positives, catching at least two easy-to-miss substantive defects. These suggestions target completion semantics and review cadence/batching, not review correctness; the latter should be left untouched.
From a maxiter loop exit (8/8) where all ACs were ultimately met but COMPLETE was never emitted. Methodology-only; fully sanitized (no project-specific information).
Context
A sequential-plan task ran the full 8-round budget and exited on maxiter, not COMPLETE, even though every acceptance criterion was satisfied. Execution was sound: accurate reviews, no false positives, no drift, no genuine stagnation. The failure to emit COMPLETE was structural to the methodology, not a performance problem.
Suggestions
Split the round verdict from the objective verdict. Early rounds received NOT COMPLETE only because downstream plan stages could not start yet (strict dependency chain). The reviewer's own "ADVANCED" line was the honest signal. Promote a first-class ROUND-COMPLETE (this round's contracted scope is fully/correctly done) distinct from objective IN-PROGRESS; reserve terminal COMPLETE for the objective.
Close the verification off-by-one. The rhythm "done pending verification" this round → "verified" next round means the final round of work can never be certified before the budget ends, so a budget that ends when the work ends cannot end in COMPLETE. Allow same-pass verify-and-close when a completion request has no new mainline gaps, or reserve the final iteration as a dedicated close-out round.
Front-load per-artifact quality checklists. One proof artifact took ~4 rounds because each round met the stated bar and the reviewer then raised a new, legitimate, previously-unstated bar (labeling → reproducibility → exit-gating → mechanical derivation). Ship standardized review rubrics keyed to artifact type so the full bar is cleared in one round.
Pre-completion full-sweep audit at the first COMPLETE request. The last ~4 rounds each resolved a single independent shrinking defect (down to one false sentence in a provenance file). Trigger an exhaustive audit of all artifacts against the consistency checklist at the first completion request and return the complete defect list at once, instead of one-defect-per-round discovery. (A mid-loop "full review round" is less valuable than one positioned at the first completion request.)
Break-glass for corrupt locked artifacts + validate at init. A setup tool truncated the immutable, authoritative goal section at initialization; a guardrail forbade both the doer and the reviewer from repairing it for all 8 rounds, so it generated a recurring reminder with no resolution. Validate the integrity of auto-generated immutable artifacts before locking them, and provide a sanctioned, logged repair path when both parties agree the locked content is tool damage rather than a goal change.
Convert defaulted external decisions out of the active list. A few decisions needing a human answer were given conservative defaults and execution proceeded, but they were re-surfaced as "PENDING" every round as boilerplate noise. Allow converting them to "ASSUMED (default X), revisit only if answered" and move them to a separate assumptions ledger.
What to preserve
Review quality was the session's strongest feature — specific, evidence-cited feedback with essentially no false positives, catching at least two easy-to-miss substantive defects. These suggestions target completion semantics and review cadence/batching, not review correctness; the latter should be left untouched.