docs(instructions): ground before skipping — a failing test/example is a signal#33
Conversation
…e skipping Codifies the discipline that was missing when a single Extended Verification run got mistaken for a proven arm64 QEMU limitation and "fixed" by arch-gating the rollback example (held in #32, real grounding tracked in #31). The anti-masking rules already existed but were siloed in testing.instructions.md and framed only around timeout-bumps; the most dangerous masking move — skip / os-gate / arch-gate — was never named, and nothing required reproducing a claimed platform limitation locally (which we can: arm64 CHR runs under TCG on Intel). - examples.instructions.md: examples are canaries; a failing example is a quickchr bug until a LOCAL repro proves otherwise; never gate a failure as the first move; don't record an unproven cause in DESIGN/API docs/BACKLOG/issues/skills. - testing.instructions.md: name skip/gating as the worst masking move (worse than a timeout-bump); "platform limitation" is a hypothesis until reproduced locally; one CI failure must not cascade into doc/skill edits. - ci.instructions.md: one CI run is a signal, not a fact — reproduce locally, don't let one red run cascade. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Warning Review limit reached
More reviews will be available in 26 minutes and 23 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Plus Run ID: 📒 Files selected for processing (3)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
Updates the repository’s contribution instructions to treat failing tests/examples and red CI runs as signals to investigate, explicitly discouraging “masking” fixes (skip / OS-gating / arch-gating) until a local reproduction confirms an unfixable platform limitation.
Changes:
- Adds a new “failing test is a signal” subsection to testing instructions, extending the existing “don’t bump timeouts first” discipline to skips/gates.
- Adds an “examples are canaries” section to examples instructions, framing example failures as potential quickchr bugs until locally reproduced otherwise.
- Adds a top-level CI principle: a single red run is a signal, and should not trigger cascading code/doc/skill edits or masking gates without grounding.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| .github/instructions/testing.instructions.md | Extends anti-masking guidance from timeout bumps to skips/gates; emphasizes local repro before calling something a platform limit. |
| .github/instructions/examples.instructions.md | Establishes examples as canaries; forbids skip/os/arch gating as a first reaction to failures. |
| .github/instructions/ci.instructions.md | Adds a top-of-file principle to treat one CI run as a signal and avoid cascaded, ungrounded changes. |
Why
In the #22 close-out I mistook a single Extended Verification run for a proven QEMU limitation ("snapshots don't work on aarch64") and "fixed" it by
arch-gating therollbackexample — without ever reproducing it locally, even though quickchr runs arm64 CHR under TCG on this Intel host. That masked a real signal: the example was doing its job (a canary catching what unit/integration tests miss) and I silenced it, then spread the unproven cause across DESIGN.md, API docs, BACKLOG, an issue, and nearly into a shared SKILL.That masking PR is on hold (#32, draft); actually grounding the snapshot behavior is #31. This PR fixes the instructions so the class of error is harder to repeat.
What was missing
The anti-masking discipline already existed (
testing.instructions.md: "Never increase a timeout as a first fix. It masks the root cause") — but it was siloed totest/**, framed only around timeout-bumps, and never named the most destructive masking move: skip /os-gate /arch-gate, which removes the signal permanently. Nor did anything require reproducing a claimed platform limitation locally before recording it.Changes (instructions only — no code)
examples.instructions.md— new section: examples are canaries; a failing example is a quickchr bug until a local repro proves otherwise; never gate a failure as the first move; don't write an unproven cause into DESIGN/API docs/BACKLOG/issues/skills.testing.instructions.md— new subsection beside the timeout rule: skip/gating is the worst masking move (a bumped timeout still runs the test; a skip deletes it — and filing an issue alongside doesn't make it safe); "platform limitation" is a hypothesis until reproduced locally (we boot arm64 under TCG here); one CI failure must not cascade.ci.instructions.md— top-of-file principle: one CI run is a signal, not a fact — reproduce locally, don't let one red run cascade.The three cross-reference each other.
bun run checkgreen (cspell, markdownlint).🤖 Generated with Claude Code