Path-scope AI checks at matrix emission (closes #124)#130
Conversation
rafacm
left a comment
There was a problem hiding this comment.
Blocking as written: the matrix emitter mostly works for the current rule syntax, but the workflow gate is placed where GitHub Actions cannot reliably see matrix.applies, so the main acceptance criterion is not met.
Path-scoping correctness
The current single-line comma syntax works. .github/scripts/list_ai_checks.py:62-.github/scripts/list_ai_checks.py:69 parses paths: by splitting on commas, and a local smoke test confirms the real frontmatter values do match the expected repo-root paths: episodes/foo/bar.py matches episodes/**, chat/sub/x.py matches chat/**, frontend/src/App.tsx matches frontend/**, and README.md matches README.md.
A few caveats:
.github/scripts/list_ai_checks.py:32-.github/scripts/list_ai_checks.py:42is not YAML parsing; quoted globs and YAML lists do not work. For example,paths: "episodes/**", "README.md"produces patterns including the quote characters, and multilinepaths:\n - episodes/**is treated as empty and therefore applies to every diff. This is acceptable only if the supported format is explicitly "single-line, comma-separated, unquoted".- Python
fnmatchis broader than path-segment globbing:doc/*.excalidrawmatchesdoc/nested/foo.excalidrawbecause*can cross/. That does not break the four current rules, but it is worth documenting because users may expect shell-like segment behavior. .github/scripts/list_ai_checks.py:53-.github/scripts/list_ai_checks.py:54silently converts anygit difffailure into[]. That makes every path-scoped rule non-applicable and skipped. I would rather fail closed here, because a bad base ref or fetch problem should not suppress checks.
For the first-commit/base-ref concern, the workflow does checkout with fetch-depth: 0 before the explicit base fetch (.github/workflows/ai-checks.yml:17-.github/workflows/ai-checks.yml:28), then runs the list step with BASE_REF=origin/<base> (.github/workflows/ai-checks.yml:29-.github/workflows/ai-checks.yml:32). The ordering is fine; the silent fallback on diff failure is the weak part.
For empty diffs, the chosen behavior is path-scoped rules become applies: false and semantic rules remain true. That is defensible, but it should be explicit because it means an empty merge commit still runs the semantic rules.
Workflow YAML
This is the blocking issue. .github/workflows/ai-checks.yml:40 uses:
if: ${{ fromJson(needs.list.outputs.matrix).include[0] != null && matrix.applies }}GitHub's workflow syntax docs state that jobs.<job_id>.if is evaluated before jobs.<job_id>.strategy.matrix is applied, so matrix.applies is not a valid per-shard gate at this level. This is not just a bool/string coercion problem; the matrix context is not available at the time this if is evaluated. The current head already has an immediate failed AI Checks workflow run (25307264491, 0s, no job logs), which is consistent with a workflow-definition/evaluation failure.
So the design goal, "non-applicable rules become real GHA skipped jobs," is not achieved by this YAML. Step-level if: would see matrix.applies, but it would not give the gray skipped job rows the issue is asking for. This needs a different shape before merge.
The JSON boolean itself is fine: .github/scripts/list_ai_checks.py:91 emits JSON true/false, and strategy.matrix: ${{ fromJson(...) }} at .github/workflows/ai-checks.yml:45 would preserve those as booleans after matrix expansion. The problem is only the location of the job-level if:.
The no-applies edge case is also affected by this. Because the matrix still includes all rules with applies: false, strategy.matrix would have shards if job-level matrix gating were supported. Since it is not, "all rules non-applicable" does not cleanly produce all-gray shards with this implementation.
Skip-enum removal
skip is removed from the prompt and schema: .github/scripts/run_ai_check.py:64-.github/scripts/run_ai_check.py:67 defines only pass/fail semantics, and .github/scripts/run_ai_check.py:93-.github/scripts/run_ai_check.py:96 limits the tool enum to pass and fail.
If a model still returns skip anyway, the driver fails closed rather than passing: .github/scripts/run_ai_check.py:270 indexes {"pass": "PASS", "fail": "FAIL"} and will raise KeyError for skip. That produces a failing job/crash, not a green pass. A cleaner explicit validation error would be nicer, but the safety direction is right.
The empty-diff and direct-CLI path-scope safety net now return pass with a non-applicability explanation (.github/scripts/run_ai_check.py:150-.github/scripts/run_ai_check.py:173). That matches the new semantic contract.
Per-rule frontmatter
The four added scopes match the issue table:
.ai-checks/pipeline-step-sync.md:4includesepisodes/models.py,episodes/workflows.py,README.md,doc/README.md, anddoc/*.excalidraw. A diff touching justREADME.mdwill trigger it because paths are repo-root-relative..ai-checks/asgi-wsgi-scott.md:4coversragtime/asgi.py,chat/**, andfrontend/**; with Pythonfnmatch, nested files underchat/andfrontend/do match..ai-checks/qdrant-payload-slim.md:4is tightly scoped toepisodes/vector_store.py; that seems right..ai-checks/entity-creation-race-safety.md:4usesepisodes/**, which includes migrations and unrelated episode tests/admin files. That is broader than the rule's actual risk surface, but it is conservative. The cost is more model calls, not missed failures.
Documentation
The five required artifacts exist: plan, feature doc, planning transcript, implementation transcript, and changelog entry.
There are two documentation compliance issues:
doc/plans/2026-05-04-ai-checks-skipped.md:1starts with## Plan..., but AGENTS.md requires every doc file to start with a# Titleon the first line.- The session transcripts do not include verbatim user messages.
doc/sessions/2026-05-04-ai-checks-skipped-planning-session.md:15summarizes the issue body, anddoc/sessions/2026-05-04-ai-checks-skipped-implementation-session.md:15summarizes a parent-agent prompt. AGENTS.md requires user messages to be verbatim and unedited.
Also, both transcripts use Session ID: unavailable (doc/sessions/2026-05-04-ai-checks-skipped-planning-session.md:5, doc/sessions/2026-05-04-ai-checks-skipped-implementation-session.md:5). That is allowed only when the UUID cannot be recovered, but the transcript should ideally say what was attempted.
Other findings
There is no unit test coverage for .github/scripts/list_ai_checks.py. This is plain Python and now holds the important applicability logic. The parser/glob behavior above is exactly the kind of thing a small unit test would catch.
I did not find a problem with the current real fnmatch values failing to match nested paths; the implementation's assumption that episodes/** matches nested files is true in Python fnmatch.
Suggested follow-ups
- Fix the workflow design so per-rule non-applicability can actually become gray skipped rows under GitHub Actions' matrix evaluation rules.
- Add unit tests for
parse_frontmatter,applies_for, quoted/list frontmatter behavior, empty changed-file lists, and diff-command failures. - Replace the silent
changed_files()fallback with an explicit failure unless there is a deliberately documented reason to skip path-scoped checks on diff errors. - Document the supported
paths:syntax as single-line comma-separated globs, or switch to a real YAML parser. - Tighten
entity-creation-race-safetylater if the broadepisodes/**scope creates too much noise.
— Codex review (independent second-opinion pass)
42ba1f0 to
f610b6a
Compare
…owup) Addresses the blocker from codex review on PR #130: the previous design relied on `jobs.check.if: ${{ matrix.applies }}` to render non-applicable rules as gray "skipped" rows, but GitHub Actions evaluates job-level `if:` *before* expanding `strategy.matrix`, so `matrix.applies` is not visible at that scope and the workflow failed to expand any jobs at all (0-second workflow-validation failure, run 25307264491). Pivot to option 1B: filter non-applicable rules out of the matrix entirely at emission time. They simply don't appear in the PR Checks tab. This isn't the gray-skipped icon the issue originally asked for, but it does fix the original "indistinguishable from green pass" complaint and ships in a workflow that actually runs. True gray-skipped via the GitHub Checks API is tracked separately as #135 for future work. Other codex follow-ups addressed in the same commit: - `changed_files()` now fails closed (`sys.exit(1)`) on `git diff` errors instead of silently returning `[]`. - New unit tests in `.github/scripts/tests/test_list_ai_checks.py` (24 tests, stdlib-only) lock in parser, glob, fail-closed, and end-to-end matrix-emission behavior. The codex-flagged unsupported cases (quoted globs, YAML list form for `paths:`) are explicitly locked in by tests so a future "fix" doesn't silently change them. - Supported `paths:` syntax documented in docstrings on `parse_frontmatter` / `applies_for` and in the feature doc. Plan, feature doc, and implementation session transcript all carry a "Revision after codex review" section. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Codex review followupAddressing the blocker from the codex review (id Design pivot: option 1B (matrix-filter)The codex review correctly flagged that Pivoted to option 1B: filter non-applicable rules out of the matrix entirely at emission time in Did not go with option 1C (publish gray rows via the GitHub Checks API from the Other codex follow-ups addressed
Verification
Punted (per the parent task instructions)
|
AGENTS.md's transcript policy assumed the human-with-Claude workflow where every session has real `### User` messages. Conductor's parallel- launch model breaks that assumption: the implementation session has no direct user, only an agent-orchestrated launching prompt. Add a new "Agent-orchestrated sessions" subsection to AGENTS.md and a matching note to the Feature PR Documentation Bundle AI check. The new convention: use `### Parent agent (orchestrator)` headings instead of `### User`, reproduce the parent-agent prompt verbatim, and declare the session as agent-orchestrated at the top of `## Detailed conversation`. Same verbatim rule applies — summarized parent prompts are still rejected. This eliminates a recurring false-negative on the docs check that was raised on PRs #129, #130, #131, #133. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Move per-rule applicability from the runner to the matrix-emitter `list` job so non-applicable shards skip at the GHA level (gray icon) rather than running and exiting 0 (green). Drop `skip` from the verdict enum; semantic non-applicability is now `pass` with a "rule does not apply" note. Closes #124. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…owup) Addresses the blocker from codex review on PR #130: the previous design relied on `jobs.check.if: ${{ matrix.applies }}` to render non-applicable rules as gray "skipped" rows, but GitHub Actions evaluates job-level `if:` *before* expanding `strategy.matrix`, so `matrix.applies` is not visible at that scope and the workflow failed to expand any jobs at all (0-second workflow-validation failure, run 25307264491). Pivot to option 1B: filter non-applicable rules out of the matrix entirely at emission time. They simply don't appear in the PR Checks tab. This isn't the gray-skipped icon the issue originally asked for, but it does fix the original "indistinguishable from green pass" complaint and ships in a workflow that actually runs. True gray-skipped via the GitHub Checks API is tracked separately as #135 for future work. Other codex follow-ups addressed in the same commit: - `changed_files()` now fails closed (`sys.exit(1)`) on `git diff` errors instead of silently returning `[]`. - New unit tests in `.github/scripts/tests/test_list_ai_checks.py` (24 tests, stdlib-only) lock in parser, glob, fail-closed, and end-to-end matrix-emission behavior. The codex-flagged unsupported cases (quoted globs, YAML list form for `paths:`) are explicitly locked in by tests so a future "fix" doesn't silently change them. - Supported `paths:` syntax documented in docstrings on `parse_frontmatter` / `applies_for` and in the feature doc. Plan, feature doc, and implementation session transcript all carry a "Revision after codex review" section. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
15a4017 to
16cae99
Compare
Summary
Path-scoped AI checks no longer fire on PRs whose changed files don't match the rule's
paths:frontmatter.list_ai_checks.pyreads the frontmatter, runsgit diff --name-only $BASE_REF...HEAD, and filters non-applicable rules out of the matrix entirely — they don't appear as a check at all rather than rendering as a (false) green pass. Closes #124. Future Checks-API rendering work to deliver true gray "skipped" icons is tracked in #135.Details
paths:frontmatter:pipeline-step-sync,asgi-wsgi-scott,qdrant-payload-slim,entity-creation-race-safety. The other five stay semantic.skipfrom its enum. Semantic non-applicability is nowpasswithsummary: "Rule does not apply."and a one-linedetailsexplanation.list_ai_checks.pyfails closed ongit differrors instead of silently treating every path-scoped rule as non-applicable.parse_frontmatter,applies_for,matches_paths,changed_files, and the end-to-endmain()behavior.Why filter at emission rather than render gray
The PR's first commit attempted job-level
if: matrix.appliesto deliver true gray-skipped icons. GHA's evaluation order with dynamic matrices broke that approach (the workflow ran in 0 seconds with zero jobs expanded — codex review caught it). The 1B follow-up commit pivoted to filtering at emission. The Checks-API alternative for true gray rendering is captured in #135.Behaviour on this PR
BASE_REF=main python .github/scripts/list_ai_checks.pyon this branch emits 5 applicable rules and filters out 4 non-applicable ones. The four filtered rules —ASGI vs WSGI Awareness for Scott,Entity Creation Race Safety,Pipeline Step Documentation Sync,Slim Qdrant Payload Discipline— don't appear in the Checks tab at all.Test plan
AI Checksworkflow.skipis removed from the verdict enum so the model cannot return it.Documentation
doc/plans/2026-05-04-ai-checks-skipped.mddoc/features/2026-05-04-ai-checks-skipped.mddoc/sessions/2026-05-04-ai-checks-skipped-{planning,implementation}-session.md## 2026-05-04→### ChangedGenerated with Claude Code