chore: AI-check Exclusions tightening — 2-week advisory review #139
Open
rafacm wants to merge 1 commit into
AI Checks summary: ❌ 1 fail · ✅ 4 pass · ⏭️ 0 skip
❌ Feature PR Documentation Bundle
The PR does not contain the required documentation files for a feature bundle.
Details: Missing documentation files: There are no new files added at
Other checks (4 passing · 0 skipped)
Overview
Two-week post-merge review of the nine AI checks introduced in PR #118 (closes #117). The checks were migrated from `.continue/checks/` to `.ai-checks/` and wired to GitHub Actions in PR #122 (2026-05-01); path-scoping was added in PR #130.
Data window: PRs #122–#136, 2026-05-01 → 2026-05-14. Excludes PR #127 (the deliberate planted-violations validation run). Six real PRs carried check results.
Per-check signal table
* Feature PR Documentation Bundle, PR #134 fire: technically correct (docs were absent), but it revealed a gap in the Exclusions: CI/CD tooling PRs were not listed. After this PR's fix the same diff would pass.
False-positive patterns and Exclusions added
1. Branching & PR Strategy — 2 FPs (PRs #129, #133)
Two distinct hallucination patterns:
Pattern A — GHA synthetic merge commit (PR #129):
`actions/checkout@v4` checks out `refs/pull/N/merge`, a synthetic two-parent commit GitHub creates for CI. Its message is `Merge <sha>... into <sha>...`. The driver sees this merge-shaped HEAD and concludes "branch not rebased", and it does so on every clean rebase. Tracked in #125 (runner fix: pin checkout to `${{ github.event.pull_request.head.sha }}`). The Exclusion added here teaches the LLM to ignore that specific pattern until the runner fix lands.
Pattern B — invented source branch (PR #133):
The PR came from `rafacm/download-show-name-fix`; the model stated "PR is from `main`" with no diff evidence. Added: "Do not infer the source branch from title, description, or diff — read it from explicit metadata."
2. Comment Discipline — 1 FP (PR #131)
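The deleted-line guard added to this check could also be applied before the model ever sees the diff. The following is a minimal sketch, assuming the driver holds the unified diff as a string; `added_lines` is a hypothetical helper and not part of the repository.

```python
def added_lines(unified_diff: str) -> list[str]:
    """Return only the lines a unified diff adds, without the leading '+'.

    Hypothetical helper for illustration. Deleted ('-') and context (' ')
    lines are dropped, so a comment on a removed line can never be flagged.
    """
    kept = []
    for line in unified_diff.splitlines():
        # '+++ b/file' and '--- a/file' are file headers, not content.
        # Simplification: an added line whose own text starts with '++'
        # would also be skipped by this test.
        if line.startswith("+++") or line.startswith("---"):
            continue
        if line.startswith("+"):
            kept.append(line[1:])
    return kept
```

Feeding only these lines to a comment check would make the `-` prefix rule structural rather than something the prompt has to enforce.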
The model flagged a comment that appeared in the diff as a `-` (deleted) line, that is, code this PR was removing. Added: "Lines with a `-` prefix are deleted. Do not flag them."
It also flagged `# Decorators must register before DBOS.launch(); enqueues against late-registered workflows silently fail.` as a WHAT-comment. It's a load-bearing WHY comment (silent failure mode). The existing "Keep load-bearing WHY comments" section covers this; the deleted-lines rule is the actual new guard.
3. RAGTIME_* Env Var Sync — 1 FP (PR #133)
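The distinction this check's new Exclusion draws, between actual env var reads and names that merely resemble one, can be illustrated with a pattern match. This is a rough sketch only: the regular expression and the `env_vars_read` helper are assumptions made for illustration, not the check's real implementation (the check itself is an LLM prompt).

```python
import re

# Matches the three read forms named in the Exclusion:
# os.getenv("RAGTIME_X"), os.environ["RAGTIME_X"] / os.environ.get("RAGTIME_X"),
# and settings.RAGTIME_X.
ENV_READ = re.compile(
    r"""os\.getenv\(\s*["'](RAGTIME_[A-Z0-9_]+)["']"""
    r"""|os\.environ(?:\.get\(\s*|\[\s*)["'](RAGTIME_[A-Z0-9_]+)["']"""
    r"""|settings\.(RAGTIME_[A-Z0-9_]+)"""
)


def env_vars_read(line: str) -> set[str]:
    """Return the RAGTIME_* names that a line of code actually reads."""
    names = set()
    for match in ENV_READ.finditer(line):
        # Exactly one of the three groups is populated per match.
        names.update(group for group in match.groups() if group)
    return names
```

Under this sketch a model attribute such as `episode.show_name` yields no names, and neither does an `@override_settings(...)` keyword argument, since neither is one of the three read forms.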
The model invented `RAGTIME_SHOW_NAME` by tokenising `Episode.show_name` (a Django model field) as if it were an env var name. The diff contained `@override_settings(RAGTIME_PODCAST_AGGREGATORS="")` in a test; that was the only `RAGTIME_*` reference, and it is a pre-existing var. Added: "Only flag `os.getenv(...)` / `os.environ` / `settings.RAGTIME_*` reads — not model field or attribute names."
4. Pipeline Step Documentation Sync — 1 FP (PR #133)
Two-layer hallucination: (a) claimed `doc/README.md` was not updated (it was, at lines 39 and 55); (b) stretched "changing a pipeline step" to mean "any change that touches a file used by a step." AGENTS.md's rule is about the step list changing, not behaviour inside a step. Added: "Changes inside an existing step that don't alter `PIPELINE_STEPS` or the `@DBOS.step()` list are not pipeline structural changes." Also clarified that `doc/README.md` per-behaviour updates are compliant even if `README.md`'s summary table is unchanged.
5. Feature PR Documentation Bundle — 1 fire (PR #134)
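The Exclusion added for this check amounts to a path test on the changed files. A minimal sketch under stated assumptions: `TOOLING_PREFIXES` holds only the two directories the Exclusion names, and `is_tooling_only` is a hypothetical helper, not code from the repository.

```python
# Only the two prefixes named in the Exclusion; the "similar non-runtime
# paths" it also mentions are not enumerated in the PR, so none are guessed.
TOOLING_PREFIXES = (".github/", ".ai-checks/")


def is_tooling_only(changed_paths: list[str]) -> bool:
    """True when every changed file sits under a tooling directory.

    An empty change list returns False so that a missing diff is never
    mistaken for an exempt PR.
    """
    return bool(changed_paths) and all(
        path.startswith(TOOLING_PREFIXES) for path in changed_paths
    )
```

A PR for which this returns True would be treated as chore/tooling and would not need the documentation bundle.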
The check correctly classified PR #134 (sticky PR comment feature) as a significant change missing its documentation bundle. The author manually overrode it ("CI/CD, not the app"). The rule's Exclusions had no entry for CI/CD tooling PRs. Added: "Changes confined to `.github/`, `.ai-checks/`, or similar non-runtime paths are chore/tooling — no bundle needed. Precedent: PRs #118, #122, #134."
Zero-fire checks — notes
These four checks fired zero times on real PRs. They are path-scoped (filtered at matrix emission when their `paths:` globs don't match the diff) or their domain simply wasn't touched in this window. None are candidates for removal: their planted violations in PR #127 all fired correctly (with the one exception below).
`manage.py runserver` for a `chat/views.py` change. The check passed. This is a false-negative risk: the check relies on evidence in the PR description / test plan text, which it may fail to locate when the test plan is embedded in a checklist rather than a prose paragraph. No Exclusion change is needed here (it's an under-firing issue, not over-firing), but it is flagged for awareness. May need a `What to Check` prompt refinement in a follow-up.
Recommend promote-to-required
After this PR lands:
`vector_store.py` PR is next.
Defer promotion to required until tightened Exclusions are validated on the next PR batch:
Hallucination tracking
Three distinct driver hallucination patterns from this window are tracked in #137 alongside the underlying runner issue (#125). This PR addresses the rule-side mitigations; driver hardening (system-prompt improvements) is deferred to #137.
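Until the runner fix in #125 lands, the synthetic merge commit from Pattern A could also be recognised deterministically instead of relying on the prompt alone. The sketch below assumes the commit subject has the `Merge <sha> into <sha>` shape described above; `is_synthetic_merge` is a hypothetical helper and the accepted SHA length (7 to 40 hex characters) is an assumption.

```python
import re

# Subject line of the commit GitHub creates at refs/pull/N/merge.
# Full or abbreviated hex SHAs are accepted (assumed range: 7-40 characters).
SYNTHETIC_MERGE = re.compile(r"^Merge [0-9a-f]{7,40} into [0-9a-f]{7,40}$")


def is_synthetic_merge(subject: str) -> bool:
    """True if a commit subject looks like the CI-only merge commit."""
    return SYNTHETIC_MERGE.match(subject.strip()) is not None
```

A driver could drop a HEAD commit that matches before building the prompt, so an ordinary `Merge branch 'main' into ...` commit would still be visible to the Branching & PR Strategy check.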
Files changed
`.ai-checks/branching-and-pr-strategy.md`
`.ai-checks/comment-discipline.md`
`.ai-checks/env-var-sync.md`
`.ai-checks/pipeline-step-sync.md`
`.ai-checks/feature-pr-docs.md`
References: issue #117, PR #118 (original checks), PR #122 (migration to `.ai-checks/` + GHA runner), PR #130 (path-scoping), PR #134 (sticky comments), #125 (runner fix), #137 (driver hardening).
Generated by Claude Code