Skip to content

Path-scope AI checks at matrix emission (closes #124)#130

Merged
rafacm merged 2 commits intomainfrom
rafacm/ai-checks-skipped
May 4, 2026
Merged

Path-scope AI checks at matrix emission (closes #124)#130
rafacm merged 2 commits intomainfrom
rafacm/ai-checks-skipped

Conversation

@rafacm
Copy link
Copy Markdown
Owner

@rafacm rafacm commented May 4, 2026

Summary

Path-scoped AI checks no longer fire on PRs whose changed files don't match the rule's paths: frontmatter. list_ai_checks.py reads the frontmatter, runs git diff --name-only $BASE_REF...HEAD, and filters non-applicable rules out of the matrix entirely — they don't appear as a check at all rather than rendering as a (false) green pass. Closes #124. Future Checks-API rendering work to deliver true gray "skipped" icons is tracked in #135.

Details

  • Four rules gain paths: frontmatter: pipeline-step-sync, asgi-wsgi-scott, qdrant-payload-slim, entity-creation-race-safety. The other five stay semantic.
  • Driver verdict tool drops skip from its enum. Semantic non-applicability is now pass with summary: "Rule does not apply." and a one-line details explanation.
  • list_ai_checks.py fails closed on git diff errors instead of silently treating every path-scoped rule as non-applicable.
  • 24 unit tests added for parse_frontmatter, applies_for, matches_paths, changed_files, and the end-to-end main() behavior.
  • Rule frontmatter syntax: single-line, comma-separated, unquoted globs only. Documented in the parser docstring and feature doc; multiline YAML lists and quoted patterns are explicitly not supported.

Why filter at emission rather than render gray

The PR's first commit attempted job-level if: matrix.applies to deliver true gray-skipped icons. GHA's evaluation order with dynamic matrices broke that approach (the workflow ran in 0 seconds with zero jobs expanded — codex review caught it). The 1B follow-up commit pivoted to filtering at emission. The Checks-API alternative for true gray rendering is captured in #135.

Behaviour on this PR

BASE_REF=main python .github/scripts/list_ai_checks.py on this branch emits 5 applicable rules and filters out 4 non-applicable ones. The four filtered rules — ASGI vs WSGI Awareness for Scott, Entity Creation Race Safety, Pipeline Step Documentation Sync, Slim Qdrant Payload Discipline — don't appear in the Checks tab at all.

Test plan

  • Push triggers the AI Checks workflow.
  • In the PR Checks tab, the four non-applicable rules do not appear (filtered out at emission). True gray-skipped icon rendering deferred to AI checks: render non-applicable rules as gray 'skipped' via the GitHub Checks API #135.
  • The other five rules render green (or red, if any flags a violation): Branching & PR Strategy, Comment Discipline, Feature PR Documentation Bundle, RAGTIME_* Env Var Sync, gh api Shell Escaping & Endpoints.
  • No green icons say "skip" in the step summary — skip is removed from the verdict enum so the model cannot return it.

Documentation

  • Plan: doc/plans/2026-05-04-ai-checks-skipped.md
  • Feature doc: doc/features/2026-05-04-ai-checks-skipped.md
  • Session transcripts: doc/sessions/2026-05-04-ai-checks-skipped-{planning,implementation}-session.md
  • Changelog entry under ## 2026-05-04### Changed

Generated with Claude Code

Copy link
Copy Markdown
Owner Author

@rafacm rafacm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking as written: the matrix emitter mostly works for the current rule syntax, but the workflow gate is placed where GitHub Actions cannot reliably see matrix.applies, so the main acceptance criterion is not met.

Path-scoping correctness

The current single-line comma syntax works. .github/scripts/list_ai_checks.py:62-.github/scripts/list_ai_checks.py:69 parses paths: by splitting on commas, and a local smoke test confirms the real frontmatter values do match the expected repo-root paths: episodes/foo/bar.py matches episodes/**, chat/sub/x.py matches chat/**, frontend/src/App.tsx matches frontend/**, and README.md matches README.md.

A few caveats:

  • .github/scripts/list_ai_checks.py:32-.github/scripts/list_ai_checks.py:42 is not YAML parsing; quoted globs and YAML lists do not work. For example, paths: "episodes/**", "README.md" produces patterns including the quote characters, and multiline paths:\n - episodes/** is treated as empty and therefore applies to every diff. This is acceptable only if the supported format is explicitly "single-line, comma-separated, unquoted".
  • Python fnmatch is broader than path-segment globbing: doc/*.excalidraw matches doc/nested/foo.excalidraw because * can cross /. That does not break the four current rules, but it is worth documenting because users may expect shell-like segment behavior.
  • .github/scripts/list_ai_checks.py:53-.github/scripts/list_ai_checks.py:54 silently converts any git diff failure into []. That makes every path-scoped rule non-applicable and skipped. I would rather fail closed here, because a bad base ref or fetch problem should not suppress checks.

For the first-commit/base-ref concern, the workflow does checkout with fetch-depth: 0 before the explicit base fetch (.github/workflows/ai-checks.yml:17-.github/workflows/ai-checks.yml:28), then runs the list step with BASE_REF=origin/<base> (.github/workflows/ai-checks.yml:29-.github/workflows/ai-checks.yml:32). The ordering is fine; the silent fallback on diff failure is the weak part.

For empty diffs, the chosen behavior is path-scoped rules become applies: false and semantic rules remain true. That is defensible, but it should be explicit because it means an empty merge commit still runs the semantic rules.

Workflow YAML

This is the blocking issue. .github/workflows/ai-checks.yml:40 uses:

if: ${{ fromJson(needs.list.outputs.matrix).include[0] != null && matrix.applies }}

GitHub's workflow syntax docs state that jobs.<job_id>.if is evaluated before jobs.<job_id>.strategy.matrix is applied, so matrix.applies is not a valid per-shard gate at this level. This is not just a bool/string coercion problem; the matrix context is not available at the time this if is evaluated. The current head already has an immediate failed AI Checks workflow run (25307264491, 0s, no job logs), which is consistent with a workflow-definition/evaluation failure.

So the design goal, "non-applicable rules become real GHA skipped jobs," is not achieved by this YAML. Step-level if: would see matrix.applies, but it would not give the gray skipped job rows the issue is asking for. This needs a different shape before merge.

The JSON boolean itself is fine: .github/scripts/list_ai_checks.py:91 emits JSON true/false, and strategy.matrix: ${{ fromJson(...) }} at .github/workflows/ai-checks.yml:45 would preserve those as booleans after matrix expansion. The problem is only the location of the job-level if:.

The no-applies edge case is also affected by this. Because the matrix still includes all rules with applies: false, strategy.matrix would have shards if job-level matrix gating were supported. Since it is not, "all rules non-applicable" does not cleanly produce all-gray shards with this implementation.

Skip-enum removal

skip is removed from the prompt and schema: .github/scripts/run_ai_check.py:64-.github/scripts/run_ai_check.py:67 defines only pass/fail semantics, and .github/scripts/run_ai_check.py:93-.github/scripts/run_ai_check.py:96 limits the tool enum to pass and fail.

If a model still returns skip anyway, the driver fails closed rather than passing: .github/scripts/run_ai_check.py:270 indexes {"pass": "PASS", "fail": "FAIL"} and will raise KeyError for skip. That produces a failing job/crash, not a green pass. A cleaner explicit validation error would be nicer, but the safety direction is right.

The empty-diff and direct-CLI path-scope safety net now return pass with a non-applicability explanation (.github/scripts/run_ai_check.py:150-.github/scripts/run_ai_check.py:173). That matches the new semantic contract.

Per-rule frontmatter

The four added scopes match the issue table:

  • .ai-checks/pipeline-step-sync.md:4 includes episodes/models.py, episodes/workflows.py, README.md, doc/README.md, and doc/*.excalidraw. A diff touching just README.md will trigger it because paths are repo-root-relative.
  • .ai-checks/asgi-wsgi-scott.md:4 covers ragtime/asgi.py, chat/**, and frontend/**; with Python fnmatch, nested files under chat/ and frontend/ do match.
  • .ai-checks/qdrant-payload-slim.md:4 is tightly scoped to episodes/vector_store.py; that seems right.
  • .ai-checks/entity-creation-race-safety.md:4 uses episodes/**, which includes migrations and unrelated episode tests/admin files. That is broader than the rule's actual risk surface, but it is conservative. The cost is more model calls, not missed failures.

Documentation

The five required artifacts exist: plan, feature doc, planning transcript, implementation transcript, and changelog entry.

There are two documentation compliance issues:

  • doc/plans/2026-05-04-ai-checks-skipped.md:1 starts with ## Plan..., but AGENTS.md requires every doc file to start with a # Title on the first line.
  • The session transcripts do not include verbatim user messages. doc/sessions/2026-05-04-ai-checks-skipped-planning-session.md:15 summarizes the issue body, and doc/sessions/2026-05-04-ai-checks-skipped-implementation-session.md:15 summarizes a parent-agent prompt. AGENTS.md requires user messages to be verbatim and unedited.

Also, both transcripts use Session ID: unavailable (doc/sessions/2026-05-04-ai-checks-skipped-planning-session.md:5, doc/sessions/2026-05-04-ai-checks-skipped-implementation-session.md:5). That is allowed only when the UUID cannot be recovered, but the transcript should ideally say what was attempted.

Other findings

There is no unit test coverage for .github/scripts/list_ai_checks.py. This is plain Python and now holds the important applicability logic. The parser/glob behavior above is exactly the kind of thing a small unit test would catch.

I did not find a problem with the current real fnmatch values failing to match nested paths; the implementation's assumption that episodes/** matches nested files is true in Python fnmatch.

Suggested follow-ups

  • Fix the workflow design so per-rule non-applicability can actually become gray skipped rows under GitHub Actions' matrix evaluation rules.
  • Add unit tests for parse_frontmatter, applies_for, quoted/list frontmatter behavior, empty changed-file lists, and diff-command failures.
  • Replace the silent changed_files() fallback with an explicit failure unless there is a deliberately documented reason to skip path-scoped checks on diff errors.
  • Document the supported paths: syntax as single-line comma-separated globs, or switch to a real YAML parser.
  • Tighten entity-creation-race-safety later if the broad episodes/** scope creates too much noise.

— Codex review (independent second-opinion pass)

@rafacm rafacm force-pushed the rafacm/ai-checks-skipped branch from 42ba1f0 to f610b6a Compare May 4, 2026 08:58
rafacm added a commit that referenced this pull request May 4, 2026
…owup)

Addresses the blocker from codex review on PR #130: the previous design
relied on `jobs.check.if: ${{ matrix.applies }}` to render non-applicable
rules as gray "skipped" rows, but GitHub Actions evaluates job-level
`if:` *before* expanding `strategy.matrix`, so `matrix.applies` is not
visible at that scope and the workflow failed to expand any jobs at all
(0-second workflow-validation failure, run 25307264491).

Pivot to option 1B: filter non-applicable rules out of the matrix
entirely at emission time. They simply don't appear in the PR Checks
tab. This isn't the gray-skipped icon the issue originally asked for,
but it does fix the original "indistinguishable from green pass"
complaint and ships in a workflow that actually runs.

True gray-skipped via the GitHub Checks API is tracked separately as
#135 for future work.

Other codex follow-ups addressed in the same commit:

- `changed_files()` now fails closed (`sys.exit(1)`) on `git diff`
  errors instead of silently returning `[]`.
- New unit tests in `.github/scripts/tests/test_list_ai_checks.py`
  (24 tests, stdlib-only) lock in parser, glob, fail-closed, and
  end-to-end matrix-emission behavior. The codex-flagged unsupported
  cases (quoted globs, YAML list form for `paths:`) are explicitly
  locked in by tests so a future "fix" doesn't silently change them.
- Supported `paths:` syntax documented in docstrings on
  `parse_frontmatter` / `applies_for` and in the feature doc.

Plan, feature doc, and implementation session transcript all carry a
"Revision after codex review" section.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@rafacm
Copy link
Copy Markdown
Owner Author

rafacm commented May 4, 2026

Codex review followup

Addressing the blocker from the codex review (id 4218279237).

Design pivot: option 1B (matrix-filter)

The codex review correctly flagged that jobs.check.if: ${{ matrix.applies }} does not work — GitHub Actions evaluates job-level if: before expanding strategy.matrix, so matrix.applies is not visible at that scope. Empirical confirmation: workflow run 25307264491 failed in 0 seconds with no jobs expanded.

Pivoted to option 1B: filter non-applicable rules out of the matrix entirely at emission time in .github/scripts/list_ai_checks.py. Non-applicable rules simply don't appear in the PR Checks tab. This isn't the gray-skipped icon the issue originally asked for, but it does fix the original "indistinguishable from green pass" complaint and ships in a workflow that actually runs.

Did not go with option 1C (publish gray rows via the GitHub Checks API from the list job) because it's a meaningfully different shape — Checks API auth, conclusion publishing, decoupling from strategy.matrix. Tracked separately as #135 for future work.

Other codex follow-ups addressed

  1. Fail closed on git diff errors. changed_files() now sys.exit(1)s with a clear stderr message ("Refusing to silently skip path-scoped rules") instead of silently returning []. The list job fails loudly on a bad base ref, which is the right blast radius.
  2. Unit tests. New .github/scripts/tests/test_list_ai_checks.py — 24 stdlib-only unittest tests covering parse_frontmatter, applies_for, matches_paths, changed_files() (success + fail-closed), and main() end-to-end. The codex-flagged unsupported cases (quoted globs, multiline YAML list form) are explicitly locked in by tests so a future "fix" doesn't silently change them. Run with python -m unittest discover .github/scripts/tests -v.
  3. paths: syntax documentation. Docstrings on parse_frontmatter and applies_for document the supported syntax (single-line, comma-separated, unquoted, Python fnmatch semantics — * crosses /, ** is not segment-aware). Same constraints in a new "Supported paths: syntax" section in the feature doc.
  4. Plan + transcript revision. Both carry a "Revision after codex review" section explaining the pivot.

Verification

  • All 24 unit tests pass locally.
  • Latest workflow run on this branch (25311186716) expanded 5 jobs matching the 5 applicable rules — no 0-second failure. The list step confirms only applicable rules are emitted, no applies field.

Punted (per the parent task instructions)

  • Verdict-enum changes (already correct).
  • paths: frontmatter on the 4 rules (already correct).
  • Tightening entity-creation-race-safety scope (deferred per codex).
  • The ## Plan... first-line nit and the verbatim-user-messages nit on the original session transcripts (left to address separately).

rafacm added a commit that referenced this pull request May 4, 2026
AGENTS.md's transcript policy assumed the human-with-Claude workflow
where every session has real `### User` messages. Conductor's parallel-
launch model breaks that assumption: the implementation session has no
direct user, only an agent-orchestrated launching prompt.

Add a new "Agent-orchestrated sessions" subsection to AGENTS.md and a
matching note to the Feature PR Documentation Bundle AI check. The new
convention: use `### Parent agent (orchestrator)` headings instead of
`### User`, reproduce the parent-agent prompt verbatim, and declare the
session as agent-orchestrated at the top of `## Detailed conversation`.

Same verbatim rule applies — summarized parent prompts are still
rejected. This eliminates a recurring false-negative on the docs check
that was raised on PRs #129, #130, #131, #133.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
rafacm and others added 2 commits May 4, 2026 15:03
Move per-rule applicability from the runner to the matrix-emitter `list`
job so non-applicable shards skip at the GHA level (gray icon) rather
than running and exiting 0 (green). Drop `skip` from the verdict enum;
semantic non-applicability is now `pass` with a "rule does not apply"
note. Closes #124.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…owup)

Addresses the blocker from codex review on PR #130: the previous design
relied on `jobs.check.if: ${{ matrix.applies }}` to render non-applicable
rules as gray "skipped" rows, but GitHub Actions evaluates job-level
`if:` *before* expanding `strategy.matrix`, so `matrix.applies` is not
visible at that scope and the workflow failed to expand any jobs at all
(0-second workflow-validation failure, run 25307264491).

Pivot to option 1B: filter non-applicable rules out of the matrix
entirely at emission time. They simply don't appear in the PR Checks
tab. This isn't the gray-skipped icon the issue originally asked for,
but it does fix the original "indistinguishable from green pass"
complaint and ships in a workflow that actually runs.

True gray-skipped via the GitHub Checks API is tracked separately as
#135 for future work.

Other codex follow-ups addressed in the same commit:

- `changed_files()` now fails closed (`sys.exit(1)`) on `git diff`
  errors instead of silently returning `[]`.
- New unit tests in `.github/scripts/tests/test_list_ai_checks.py`
  (24 tests, stdlib-only) lock in parser, glob, fail-closed, and
  end-to-end matrix-emission behavior. The codex-flagged unsupported
  cases (quoted globs, YAML list form for `paths:`) are explicitly
  locked in by tests so a future "fix" doesn't silently change them.
- Supported `paths:` syntax documented in docstrings on
  `parse_frontmatter` / `applies_for` and in the feature doc.

Plan, feature doc, and implementation session transcript all carry a
"Revision after codex review" section.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@rafacm rafacm force-pushed the rafacm/ai-checks-skipped branch from 15a4017 to 16cae99 Compare May 4, 2026 13:03
@rafacm rafacm changed the title Render non-applicable AI checks as gray 'skipped' Path-scope AI checks at matrix emission (closes #124) May 4, 2026
@rafacm rafacm merged commit 69a0081 into main May 4, 2026
15 of 16 checks passed
@rafacm rafacm deleted the rafacm/ai-checks-skipped branch May 4, 2026 13:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Render non-applicable AI checks as 'skipped' (gray icon), not 'success' (green)

1 participant