chore: AI-check Exclusions tightening — 2-week advisory review #139

Open
rafacm wants to merge 1 commit into main from chore/continue-checks-2week-review

rafacm commented May 14, 2026

Overview

Two-week post-merge review of the nine AI checks introduced in PR #118 (closes #117). The checks were migrated from .continue/checks/ to .ai-checks/ and wired to GitHub Actions in PR #122 (2026-05-01); path-scoping was added in PR #130.

Data window: PRs #122–#136, 2026-05-01 → 2026-05-14. Excludes PR #127 (the deliberate planted-violations validation run). Six real PRs carried check results.

Note for reviewer: The check files live in .ai-checks/, not .continue/checks/ — they were migrated verbatim by PR #122.


Per-check signal table

| Check | Real fires | TP | FP | FP% | Recommendation |
|---|---|---|---|---|---|
| Branching & PR Strategy | 2 (#129, #133) | 0 | 2 | 100% | Tighten ✓ (this PR) |
| Comment Discipline | 1 (#131) | 0 | 1 | 100% | Tighten ✓ (this PR) |
| RAGTIME_* Env Var Sync | 1 (#133) | 0 | 1 | 100% | Tighten ✓ (this PR) |
| Pipeline Step Documentation Sync | 1 (#133) | 0 | 1 | 100% | Tighten ✓ (this PR) |
| Feature PR Documentation Bundle | 1 (#134) | 1* | 0 | 0% | Tighten ✓ (this PR) + promote-to-required candidate |
| Slim Qdrant Payload Discipline | 0 | | | | Keep advisory (path-scoped; zero real PRs hit vector_store.py) |
| Entity Creation Race Safety | 0 | | | | Keep advisory (path-scoped; ran and passed on PRs #131, #133) |
| gh api Shell Escaping & Endpoints | 0 | | | | Keep advisory (semantic; correctly passed on all 6 PRs) |
| ASGI vs WSGI Awareness for Scott | 0 | | | | Keep advisory ⚠️ (path-scoped; missed planted violation in #127; false-negative risk noted) |

* The Feature PR Documentation Bundle fire on PR #134 was technically correct (docs were absent), but it revealed a gap in the Exclusions: CI/CD tooling PRs were not listed. After this PR's fix, the same diff would pass.


False-positive patterns and Exclusions added

1. Branching & PR Strategy — 2 FPs (PRs #129, #133)

Two distinct hallucination patterns:

Pattern A — GHA synthetic merge commit (PR #129): actions/checkout@v4 checks out refs/pull/N/merge, a synthetic two-parent commit GitHub creates for CI. Its message is Merge <sha>... into <sha>.... The driver sees this merge-shaped HEAD and concludes "branch not rebased" — on every clean rebase. Tracked in #125 (runner fix: pin checkout to ${{ github.event.pull_request.head.sha }}). The Exclusion added here teaches the LLM to ignore that specific pattern until the runner fix lands.

Pattern B — invented source branch (PR #133): The PR came from rafacm/download-show-name-fix; the model stated "PR is from main" with no diff evidence. Added: "Do not infer the source branch from title, description, or diff — read it from explicit metadata."
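To make Pattern A concrete, here is a minimal sketch of how the synthetic merge commit could be recognised from its subject line. It assumes the message has exactly the shape quoted above (Merge <sha> into <sha>, with hex SHAs); the function name is hypothetical and is not part of the runner or the rule file.

```python
import re

# Assumption: the synthetic refs/pull/N/merge commit has a subject of the form
# "Merge <head sha> into <base sha>", with abbreviated or full hex SHAs.
SYNTHETIC_MERGE_RE = re.compile(r"^Merge [0-9a-f]{7,40} into [0-9a-f]{7,40}$")


def is_synthetic_merge_subject(subject: str) -> bool:
    """Return True if a commit subject looks like GitHub's synthetic PR merge commit."""
    return SYNTHETIC_MERGE_RE.match(subject.strip()) is not None
```

An ordinary merge such as "Merge branch 'main' into feature-x" does not match, because the branch names are not hex SHAs. That is the distinction the new Exclusion asks the model to make.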

2. Comment Discipline — 1 FP (PR #131)

The model flagged a comment that appeared in the diff as a - (deleted) line — code this PR was removing. Added: "Lines with a - prefix are deleted. Do not flag them."

It also flagged # Decorators must register before DBOS.launch(); enqueues against late-registered workflows silently fail. as a WHAT-comment. It's a load-bearing WHY comment (silent failure mode). The existing "Keep load-bearing WHY comments" section covers this; the deleted-lines rule is the actual new guard.
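The deleted-lines rule can be illustrated with a small sketch that keeps only the lines a unified diff adds. This is an illustration of the rule's intent, not the driver's code; the function name is hypothetical.

```python
def added_lines(diff_text: str) -> list[str]:
    """Return the content of lines a unified diff adds, skipping deletions and headers."""
    result = []
    for line in diff_text.splitlines():
        # "---" / "+++" are file headers, not content changes.
        # (Edge case not handled: an added line whose own text starts with "++".)
        if line.startswith("+++") or line.startswith("---"):
            continue
        if line.startswith("+"):
            result.append(line[1:])
        # Lines starting with "-" are deletions and are never collected.
    return result
```

Only the returned lines would be candidates for a comment-discipline finding; a comment on a - line is code the PR is removing.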

3. RAGTIME_* Env Var Sync — 1 FP (PR #133)

The model invented RAGTIME_SHOW_NAME by tokenising Episode.show_name (a Django model field) as if it were an env var name. The diff contained @override_settings(RAGTIME_PODCAST_AGGREGATORS="") in a test — the only RAGTIME_* reference, and a pre-existing var. Added: "Only flag os.getenv(...) / os.environ / settings.RAGTIME_* reads — not model field or attribute names."
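As a rough sketch of what the new Exclusion treats as an env var read, the patterns below match only os.getenv(...), os.environ access, and settings.RAGTIME_* attributes. The regexes are an approximation written for this illustration; RAGTIME_FOO and RAGTIME_BAR in the usage note are hypothetical names.

```python
import re

# Assumption: these three textual forms approximate "reads" as the rule words them.
READ_PATTERNS = [
    re.compile(r"""os\.getenv\(\s*["'](RAGTIME_[A-Z0-9_]+)["']"""),
    re.compile(r"""os\.environ(?:\.get\(|\[)\s*["'](RAGTIME_[A-Z0-9_]+)["']"""),
    re.compile(r"""\bsettings\.(RAGTIME_[A-Z0-9_]+)"""),
]


def env_var_reads(source: str) -> set[str]:
    """Collect RAGTIME_* names that the source text reads via env or settings."""
    names: set[str] = set()
    for pattern in READ_PATTERNS:
        names.update(pattern.findall(source))
    return names
```

Under this sketch, Episode.show_name yields nothing, and so does the @override_settings(RAGTIME_PODCAST_AGGREGATORS="") test decorator, while os.getenv("RAGTIME_FOO") or settings.RAGTIME_BAR would be picked up.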

4. Pipeline Step Documentation Sync — 1 FP (PR #133)

Two-layer hallucination: (a) claimed doc/README.md was not updated — it was, at lines 39 and 55; (b) stretched "changing a pipeline step" to mean "any change that touches a file used by a step." AGENTS.md's rule is about the step list changing, not behaviour inside a step. Added: "Changes inside an existing step that don't alter PIPELINE_STEPS or the @DBOS.step() list are not pipeline structural changes." Also clarified that doc/README.md per-behaviour updates are compliant even if README.md's summary table is unchanged.
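One way to picture the "step list" distinction is to compare the set of @DBOS.step()-decorated functions before and after a change. The sketch below does that with Python's ast module; it is an assumption-laden illustration (it ignores PIPELINE_STEPS, and the step names in the usage note are hypothetical), not how the check is implemented.

```python
import ast


def dbos_step_names(source: str) -> set[str]:
    """Return names of functions decorated with @DBOS.step or @DBOS.step(...)."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for dec in node.decorator_list:
                # Unwrap the call form @DBOS.step() to the attribute DBOS.step.
                target = dec.func if isinstance(dec, ast.Call) else dec
                if (
                    isinstance(target, ast.Attribute)
                    and target.attr == "step"
                    and isinstance(target.value, ast.Name)
                    and target.value.id == "DBOS"
                ):
                    names.add(node.name)
    return names


def step_list_changed(before: str, after: str) -> bool:
    """True only when the set of decorated step functions differs."""
    return dbos_step_names(before) != dbos_step_names(after)
```

Editing the body of an existing step (say, a hypothetical transcribe function) leaves the set unchanged, so it would not count as a structural change; adding or removing a decorated function would.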

5. Feature PR Documentation Bundle — 1 fire (PR #134)

The check correctly classified PR #134 (sticky PR comment feature) as a significant change missing its documentation bundle. The author manually overrode it ("CI/CD, not the app"). The rule's Exclusions had no entry for CI/CD tooling PRs. Added: "Changes confined to .github/, .ai-checks/, or similar non-runtime paths are chore/tooling — no bundle needed. Precedent: PRs #118, #122, #134."
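The new Exclusion amounts to a path-prefix test over the changed files. A minimal sketch, assuming only the two directories named above count as non-runtime (the rule's "or similar" is left out), with a hypothetical function name:

```python
# Assumption: only the two prefixes named in the Exclusion are treated as tooling.
TOOLING_PREFIXES = (".github/", ".ai-checks/")


def is_tooling_only(changed_paths: list[str]) -> bool:
    """True when every changed file sits under a non-runtime tooling directory."""
    return bool(changed_paths) and all(
        path.startswith(TOOLING_PREFIXES) for path in changed_paths
    )
```

A diff that touches any file outside those prefixes would still be evaluated for the documentation bundle.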


Zero-fire checks — notes

These four checks fired zero times on real PRs. They are path-scoped (filtered at matrix emission when their paths: globs don't match the diff) or their domain simply wasn't touched in this window. None are candidates for removal — their planted violations in PR #127 all fired correctly (with the one exception below).

⚠️ ASGI vs WSGI Awareness for Scott — missed planted violation in #127. The planted violation was a test plan that mentioned only manage.py runserver for a chat/views.py change. The check passed. This is a false-negative risk — the check relies on evidence in the PR description / test plan text, which it may fail to locate when the test plan is embedded in a checklist rather than a prose paragraph. No Exclusion change is needed here (it's an under-firing issue, not over-firing), but flagged for awareness. May need a What to Check prompt refinement in a follow-up.
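For readers unfamiliar with the path-scoping mentioned above, the following sketch shows the general idea of filtering checks by their paths: globs before they run. The glob syntax, the treatment of checks without globs, and the example directory are all assumptions for illustration; the real matrix emission may differ.

```python
from fnmatch import fnmatch


def checks_to_run(check_globs: dict[str, list[str]], changed_paths: list[str]) -> list[str]:
    """Select checks whose globs match at least one changed path."""
    selected = []
    for name, globs in check_globs.items():
        # Assumption: a check with no globs is not path-scoped and always runs.
        if not globs or any(fnmatch(p, g) for p in changed_paths for g in globs):
            selected.append(name)
    return selected
```

With a hypothetical glob of *vector_store.py for Slim Qdrant Payload Discipline, a PR that changes only chat/views.py would skip that check, which is consistent with it recording zero real fires in this window.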


Recommend promote-to-required

After this PR lands:

  • Feature PR Documentation Bundle — high signal, low noise, correct classification on all real PRs. Exclusions now cover CI/CD tooling. Ready to require.
  • Slim Qdrant Payload Discipline — zero real fires but correctly caught planted violation; path-scoped so won't block unrelated PRs. Promotable when a vector_store.py PR is next.
  • Entity Creation Race Safety — same as above; path-scoped.

Defer promotion to required until the tightened Exclusions are validated on the next PR batch:

  • Branching & PR Strategy
  • Comment Discipline
  • RAGTIME_* Env Var Sync
  • Pipeline Step Documentation Sync
  • ASGI vs WSGI Awareness for Scott (pending false-negative investigation)

Hallucination tracking

Three distinct driver hallucination patterns from this window are tracked in #137 alongside the underlying runner issue (#125). This PR addresses the rule-side mitigations; driver hardening (system-prompt improvements) is deferred to #137.


Files changed

| File | Change |
|---|---|
| .ai-checks/branching-and-pr-strategy.md | Added GHA synthetic-merge-commit exclusion + don't-infer-branch-from-title exclusion |
| .ai-checks/comment-discipline.md | Added don't-flag-deleted-lines exclusion |
| .ai-checks/env-var-sync.md | Added model-field-≠-env-var exclusion |
| .ai-checks/pipeline-step-sync.md | Added scope-to-step-list exclusion + per-behaviour doc updates are compliant |
| .ai-checks/feature-pr-docs.md | Added CI/CD tooling PRs to no-bundle-needed exclusions |

References: issue #117, PR #118 (original checks), PR #122 (migration to .ai-checks/ + GHA runner), PR #130 (path-scoping), PR #134 (sticky comments), #125 (runner fix), #137 (driver hardening).


Generated by Claude Code

Addresses five recurring false-positive patterns surfaced across PRs
#129, #131, #133, and #134 during the first ~2 weeks of the self-hosted
AI checks workflow (introduced in PR #122).

https://claude.ai/code/session_01TWxW11tGiSwySi3gCvZk8L
@github-actions

AI Checks summary

❌ 1 fail · ✅ 4 pass · ⏭️ 0 skip

❌ Feature PR Documentation Bundle

Non-trivial feature PRs must ship the plan, feature doc, both session transcripts, and a changelog entry — with correct metadata format.

The PR does not contain the required documentation files for a feature bundle.

Details

Missing documentation files: There are no new files added at doc/plans/, doc/features/, or doc/sessions/. The PR appears to be a feature/change but does not provide the documentation as mandated by AGENTS.md.


Other checks (4 passing · 0 skipped)

  • Branching & PR Strategy — Rule does not apply.
  • Comment Discipline — Rule does not apply.
  • RAGTIME_* Env Var Sync — Rule does not apply.
  • gh api Shell Escaping & Endpoints — Rule does not apply.

Development

Successfully merging this pull request may close these issues.

Add .continue/checks/ for AI-powered PR review
