
feat: Closed PR Review Auto-Improver (automated feedback loop) #1755

Draft
nick-inkeep wants to merge 17 commits into main from
feature/closed-pr-review-auto-improver

@nick-inkeep
Collaborator

Summary

Introduces an automated system that learns from human reviewers to continuously improve our AI code review agents.

The Problem

When human reviewers catch issues that our pr-review-* agents miss, that knowledge currently dies with the PR. We manually noticed patterns like "Type Definition Discipline" (PR #1737) but there's no systematic way to:

  1. Detect when humans catch something bots missed
  2. Determine if it's a generalizable pattern (vs repo-specific)
  3. Propose improvements to the reviewer agents

The Solution

A GitHub Actions workflow that triggers after every PR merge:

PR Merged → Extract Human Comments → Analyze Gaps → Apply Generalizability Test → Create Draft PR (if HIGH)

Key innovation: Git time-travel — The agent reconstructs what the human reviewer saw at comment time (not the final merged state), since issues are often fixed before merge.

How It Works

  1. Trigger: pull_request_target: [closed] + merged == true
  2. Extract: GraphQL query fetches all comment types (discussion, inline, reviews)
  3. Filter: Removes bot comments and trivial human comments ("LGTM", "thanks")
  4. Analyze: Agent investigates each promising comment:
    • Uses git rev-list --before + git show to see code at comment time
    • Progressive context gathering (diffHunk → full file → PR diff → other files)
    • Explicit stop conditions: EXIT A (not generalizable) or EXIT B (pattern found)
  5. Test: 4-criteria generalizability test (cross-codebase, universal principle, expressible, industry-recognized)
  6. Output: If HIGH generalizability → creates draft PR with improvements to pr-review-*.md
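
Step 3 above (filtering) could look roughly like the following. This is a minimal sketch, not the workflow's actual code: the `ReviewComment` shape, the bot login list, and the trivial-phrase list are assumptions chosen for illustration. The PR itself only names "LGTM" and "thanks" as trivial comments, and claude-code and dependabot as bots.

```typescript
// Hypothetical comment shape; the real GraphQL response carries more fields.
interface ReviewComment {
  author: string;
  body: string;
  createdAt: string; // ISO 8601 timestamp
}

// Assumed lists, for illustration only.
const BOT_LOGINS = new Set(["claude-code", "dependabot", "vercel", "changeset-bot"]);
const TRIVIAL_PHRASES = new Set(["lgtm", "thanks", "thank you"]);

function isBot(author: string): boolean {
  const login = author.toLowerCase();
  return BOT_LOGINS.has(login) || login.endsWith("[bot]");
}

function isTrivial(body: string): boolean {
  // Normalize: lowercase, drop punctuation and emoji, collapse whitespace.
  const normalized = body
    .toLowerCase()
    .replace(/[^a-z\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
  return normalized.length === 0 || TRIVIAL_PHRASES.has(normalized);
}

// Keep only comments written by humans that carry some substance.
function filterComments(comments: ReviewComment[]): ReviewComment[] {
  return comments.filter((c) => !isBot(c.author) && !isTrivial(c.body));
}
```

An exact-match phrase list errs on the side of keeping comments; anything longer than a bare acknowledgement is passed on to the analysis step.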

Generalizability Test (all must pass)

| Criterion | Question |
| --- | --- |
| Cross-codebase | Would this pattern appear in other TS/React/Node codebases? |
| Universal principle | Is it DRY, SOLID, separation of concerns, etc.? |
| Expressible | Can it be a checklist item, detection pattern, or failure mode? |
| Industry recognition | Would senior engineers elsewhere recognize this? |

Conservative by default: Better to miss a good pattern than pollute reviewers with repo-specific noise.
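
To make the "all must pass" rule concrete, here is a small sketch of the gate as a pure function. The type and function names are hypothetical, and how the agent actually maps answers to HIGH or MEDIUM is not specified in this PR; the sketch only shows the all-criteria requirement and the conservative treatment of uncertainty.

```typescript
// One answer per criterion listed above. `undefined` means "not sure".
interface GeneralizabilityAnswers {
  crossCodebase?: boolean;
  universalPrinciple?: boolean;
  expressible?: boolean;
  industryRecognition?: boolean;
}

// Returns true only when every criterion is an explicit `true`.
// A missing or uncertain answer counts as a failure (conservative default).
function passesGeneralizabilityTest(a: GeneralizabilityAnswers): boolean {
  return (
    a.crossCodebase === true &&
    a.universalPrinciple === true &&
    a.expressible === true &&
    a.industryRecognition === true
  );
}
```

Comparing against `true` explicitly, rather than checking for truthiness or "not false", means an unanswered criterion can never promote a pattern by accident.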

Files

| File | Purpose |
| --- | --- |
| .github/workflows/closed-pr-review-auto-improver.yml | Workflow: trigger, comment extraction, context passing |
| .claude/agents/closed-pr-review-auto-improver.md | Agent: analysis framework, stop conditions, output contract |

Example Output

When the agent finds a HIGH-generalizability pattern, it creates a draft PR like:

pr-review: Learnings from PR #1737

Patterns extracted from human reviewer feedback:
- Type Definition Discipline: Check if new types should derive from existing schemas

Design Decisions

  • Draft PRs (not auto-merge): Human review of proposed improvements
  • HIGH only: MEDIUM patterns are noted but don't create PRs
  • Opus model: Pattern recognition requires strong reasoning
  • No nested subagents: Runs as workflow-triggered agent

Test Plan

  • Verify workflow triggers on merged PRs (not closed-without-merge)
  • Test comment extraction against a known PR with human comments
  • Verify bot filtering excludes claude-code, dependabot, etc.
  • Test git time-travel reconstructs correct code state
  • Verify agent correctly classifies repo-specific vs generalizable patterns
  • Confirm draft PR creation with proper formatting

This PR implements the feedback loop: human reviewers catch patterns → this agent extracts generalizable improvements → pr-review-* agents get better → fewer gaps for humans to catch.

nick-inkeep and others added 6 commits February 5, 2026 16:20
Automated system that analyzes human reviewer feedback after PRs are merged
to identify generalizable improvements for the pr-review-* subagent system.

- Workflow triggers on merged PRs, extracts human/bot comments
- Agent applies 4-criteria generalizability test
- Creates draft PRs with improvements to pr-review-*.md files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Include diffHunk in GraphQL query (shows code each comment is on)
- Add Phase 2 "Deep-Dive on Promising Comments" with explicit guidance:
  - Read the full file to understand broader context
  - Grep for schemas/types/patterns mentioned in comments
  - Understand the anti-pattern before judging generalizability
- Update Tool Policy to emphasize context gathering
- Renumber phases (now 6 phases total)

The agent now actively investigates each comment rather than
judging based on comment text alone.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Based on write-agent skill guidance:

1. Add near-miss example (questions/discussions ≠ reviewer feedback)
2. Strengthen Role & Mission - describe what "excellence looks like"
3. Failure modes now use contrastive examples (❌ vs ✅)
4. Phase 2 now checklist format with stop condition
5. Example shows completed checklist, not just steps

Key insight: "Stop here if you can't articulate a clear principle"
prevents vague improvements from polluting reviewers.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Phase 2 now uses git rev-list + git show to see code at comment time
- Progressive gathering: diffHunk → full file → PR diff → other files
- GraphQL query now includes createdAt for all comment types
- Added git rev-list and git show to allowedTools

This ensures the agent sees what the human reviewer saw, not the
final merged state which may have fixes applied.

Co-Authored-By: Claude <noreply@anthropic.com>
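
The time-travel step in the commit above can be sketched as two git invocations. The helpers below only build the argument lists; their names are hypothetical and this is not the agent's actual prompt or tooling. One caveat worth stating as an assumption: `git rev-list --before` filters on commit dates, so after a rebase or force-push the result may only approximate what the reviewer saw.

```typescript
// Arguments for: git rev-list -1 --before=<timestamp> <ref>
// Asks git for the newest commit reachable from <ref> dated before the timestamp.
function revListArgs(commentCreatedAt: string, ref: string): string[] {
  return ["rev-list", "-1", `--before=${commentCreatedAt}`, ref];
}

// Arguments for: git show <sha>:<path>
// Prints the file's contents as of that commit.
function showFileArgs(sha: string, filePath: string): string[] {
  return ["show", `${sha}:${filePath}`];
}
```

In a Node script these arrays could be passed to `execFileSync("git", args)` from `node:child_process`, feeding the SHA printed by the first command into the second.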
Two exit paths at each level:
- EXIT A: Not generalizable (repo-specific, one-off bug, style preference)
- EXIT B: Pattern found (can articulate anti-pattern + universal principle)

Includes decision flow diagram and two contrasting examples showing
early exit (repo-specific DateUtils) vs pattern discovery (type/schema DRY).

Co-Authored-By: Claude <noreply@anthropic.com>
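
The two exit paths combine with the progressive context levels roughly as follows. This is a sketch under assumptions: `assess` is a hypothetical callback standing in for the agent's judgment at each level, and falling back to EXIT A when all levels are exhausted is my reading of the "conservative by default" rule rather than something the commit states.

```typescript
// Context levels, in the order the PR description lists them.
const CONTEXT_LEVELS = ["diffHunk", "fullFile", "prDiff", "otherFiles"] as const;
type ContextLevel = (typeof CONTEXT_LEVELS)[number];

interface ExitA { kind: "exitA"; reason: string } // not generalizable
interface ExitB { kind: "exitB"; antiPattern: string; principle: string } // pattern found
interface Continue { kind: "continue" } // need more context

type Verdict = ExitA | ExitB | Continue;

interface Outcome {
  level: ContextLevel; // level at which the investigation stopped
  verdict: ExitA | ExitB;
}

function investigate(assess: (level: ContextLevel) => Verdict): Outcome {
  for (const level of CONTEXT_LEVELS) {
    const verdict = assess(level);
    if (verdict.kind !== "continue") {
      return { level, verdict }; // stop at the first decisive level
    }
  }
  // Assumed fallback: no clear principle emerged, so treat it as EXIT A.
  return {
    level: "otherFiles",
    verdict: { kind: "exitA", reason: "no clear principle after all context levels" },
  };
}
```

Stopping at the first decisive level keeps the cheap cases cheap: a clearly repo-specific comment exits on the diffHunk alone without reading any further context.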
- Role & Mission: Add "what the best human analyst would do" section
- Failure modes: Add "Asserting when uncertain" with contrastive example
- Generalizability: Add confidence calibration guidance
- Add explicit conservative default: "when torn, choose lower confidence"

Per write-agent skill review: personality should describe best human
behavior, failure modes should include asserting when uncertain
(relevant for classification tasks).

Co-Authored-By: Claude <noreply@anthropic.com>
@vercel

vercel bot commented Feb 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| agents-api | Ready | Preview, Comment | Feb 6, 2026 4:54am |
| agents-docs | Ready | Preview, Comment | Feb 6, 2026 4:54am |
| agents-manage-ui | Ready | Preview, Comment | Feb 6, 2026 4:54am |

@changeset-bot

changeset-bot bot commented Feb 6, 2026

⚠️ No Changeset found

Latest commit: c635ab5

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Pattern extracted from PR #1737 human reviewer feedback (amikofalvy):
- Types should derive from Zod schemas using z.infer<typeof schema>
- Use Pick/Omit/Partial instead of manually redefining type subsets
- Extract shared enum/union schemas instead of inline string literals

Changes:
- pr-review-types.md: New anti-pattern + analysis step 6 with detection patterns
- pr-review-consistency.md: Extended "Reuse" section to cover types

This demonstrates the closed-pr-review-auto-improver output — these are
the exact changes the agent proposed when run against PR #1737.

Co-Authored-By: Claude <noreply@anthropic.com>
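
The Pick/Omit/Partial guidance in the commit above can be illustrated without any libraries. The `Agent` type and the helper functions below are hypothetical examples, not code from the agents repo; with Zod, the base type would come from `z.infer<typeof schema>` instead of a hand-written interface.

```typescript
// Hypothetical domain type acting as the single source of truth.
interface Agent {
  id: string;
  name: string;
  description: string;
  createdAt: Date;
}

// Derived subsets: these follow `Agent` automatically when a field changes.
type AgentSummary = Pick<Agent, "id" | "name">;
type AgentInsert = Omit<Agent, "id" | "createdAt">;
type AgentUpdate = Partial<AgentInsert>;

// Anti-pattern, for contrast: a hand-written copy that can silently drift.
// interface AgentSummaryManual { id: string; name: string }

function summarize(agent: Agent): AgentSummary {
  return { id: agent.id, name: agent.name };
}

// Merge a partial update over the current insertable fields.
function applyUpdate(current: AgentInsert, patch: AgentUpdate): AgentInsert {
  return { ...current, ...patch };
}
```
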
Extended "Schema-Type Derivation Discipline" to cover full spectrum:
- Zod/validation schemas (z.infer)
- Database schemas (Prisma, Drizzle generated types)
- Internal packages (@inkeep/*, shared types)
- External packages/SDKs (OpenAI, Vercel AI SDK)
- Function signatures (Parameters<>, ReturnType<>)
- Existing domain types (Pick, Omit, Partial)

Added table format for clarity and comprehensive detection patterns.

Co-Authored-By: Claude <noreply@anthropic.com>
nick-inkeep and others added 2 commits February 5, 2026 17:10
Expanded type derivation guidance based on actual patterns found in agents repo:
- Awaited<ReturnType<>> for async function returns
- keyof typeof for constants-derived types
- interface extends and intersection (&) for composition
- Discriminated unions with type guards
- satisfies operator for type-safe constants
- Re-exports for API surface boundaries
- Type duplication detection signals

Patterns sourced from agents-api codebase analysis including:
- env.ts, middleware/*, types/app.ts, domains/run/*

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
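
Several of the derivation patterns named in the commit above fit into one short sketch. All constants and functions here are made up for illustration and are not taken from the agents-api files the commit mentions; the `satisfies` operator requires TypeScript 4.9 or later.

```typescript
// Hypothetical constant used as the source of truth for log levels.
const LOG_LEVELS = { debug: 0, info: 1, warn: 2, error: 3 } as const;

// keyof typeof: derives "debug" | "info" | "warn" | "error" from the constant.
type LogLevel = keyof typeof LOG_LEVELS;

// satisfies: checks the shape against a type without changing the inferred type.
const DEFAULT_CONFIG = {
  level: "info",
  retries: 3,
} satisfies { level: LogLevel; retries: number };

async function loadSettings(level: LogLevel): Promise<{ level: LogLevel; rank: number }> {
  return { level, rank: LOG_LEVELS[level] };
}

// Awaited<ReturnType<...>>: reuses the resolved return type of an async function.
type Settings = Awaited<ReturnType<typeof loadSettings>>;

// Parameters<...>: reuses the argument tuple, here [level: LogLevel].
type LoadArgs = Parameters<typeof loadSettings>;

function formatSettings(s: Settings): string {
  return `${s.level}:${s.rank}`;
}
```

If `loadSettings` later returns an extra field, `Settings` and every function typed against it pick up the change without a second definition to update.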
Added guidance for Zod schema extension/derivation patterns based on
codebase research (packages/agents-core/src/validation/schemas.ts):

- .extend() for adding/overriding fields
- .pick()/.omit() for field subsetting
- .partial() for Insert → Update schema derivation
- .extend().refine() for cross-field validation
- Anti-patterns: parallel schemas, duplicated fields

Examples from codebase:
- SubAgentInsertSchema.extend({ id: ResourceIdSchema })
- SubAgentUpdateSchema = SubAgentInsertSchema.partial()
- StopWhenSchema.pick({ transferCountIs: true })

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Clear separation of concerns:
- pr-review-types: Illegal states, invariants, unsafe narrowing
- pr-review-consistency: DRY, schema reuse, convention conformance

Moved to consistency:
- Zod schema composition patterns (.extend, .pick, .partial)
- Type derivation detection signals
- satisfies operator, re-exports conventions

Kept in types (type safety focus):
- Discriminated unions vs optional fields (prevents illegal states)
- Type guards vs unsafe `as` assertions
- Detection of union types without discriminants

Added cross-reference note in types agent pointing to consistency
for derivation/DRY concerns.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
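
The type-safety items kept in pr-review-types (discriminated unions instead of optional fields, type guards instead of unsafe `as` assertions) could be illustrated like this. The `Result` type is a hypothetical example, not code from the reviewer agents.

```typescript
// Optional fields permit illegal states, e.g. { status: "success" } with no data,
// or data and error both set at once.
interface LooseResult {
  status: "success" | "error";
  data?: string;
  error?: string;
}

// Discriminated union: each variant carries exactly the fields valid for it.
type Result =
  | { status: "success"; data: string }
  | { status: "error"; error: string };

// Type guard: checks an unknown value at runtime before treating it as a Result.
function isResult(value: unknown): value is Result {
  if (typeof value !== "object" || value === null) return false;
  // Narrow cast to a generic record only after confirming it is a non-null object.
  const v = value as Record<string, unknown>;
  if (v.status === "success") return typeof v.data === "string";
  if (v.status === "error") return typeof v.error === "string";
  return false;
}

// The switch narrows on the discriminant, so each branch sees only its own fields.
function render(result: Result): string {
  switch (result.status) {
    case "success":
      return `ok: ${result.data}`;
    case "error":
      return `failed: ${result.error}`;
  }
}
```
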
…and Phase 5.5

- Add skills: pr-review-subagents-available, pr-review-subagents-guidelines, find-similar-patterns
- Add proper exit states at Phase 1, 2, and 4 (embedded in workflow, not separate section)
- Add Phase 5 step 2: "Find examples of the pattern" with judgment guidance
- Add Phase 5.5: Full file review & integration planning (scope fit, duplication check)
- Update output contract with detailed JSON structure and exit examples
- Add reviewer tagging to close the feedback loop

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Agents should be self-contained without cross-references to other agents.
This prevents coupling and ensures agents work correctly when read in isolation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
These skills were created in the previous session but never committed.
Recovered from conversation history.

- find-similar-patterns: Methodology for finding similar code patterns
- pr-review-subagents-available: Catalog of pr-review-* agents with scope boundaries
- pr-review-subagents-guidelines: Best practices for writing/improving reviewers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The pr-review-consistency.md and pr-review-types.md improvements belong
in PR #1759, not this auto-improver feature branch.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Move agent and skills to inkeep/internal-cc-plugins for CI/CD-only loading:
- Removed: .claude/agents/closed-pr-review-auto-improver.md
- Removed: .agents/skills/{find-similar-patterns,pr-review-subagents-available,pr-review-subagents-guidelines}/

Updated workflow:
- Added step to clone inkeep/internal-cc-plugins
- Added --plugin-dir flag to load agent from plugin

Prerequisites before merging:
1. Create private repo: inkeep/internal-cc-plugins
2. Push plugin content to new repo
3. Add GH_PAT_PLUGINS secret to inkeep/agents

Co-Authored-By: Claude <noreply@anthropic.com>
GitHub Apps provide better security and maintainability:
- 8-hour token lifetime (vs days/infinite for PATs)
- No user account dependency (survives personnel changes)
- Zero manual rotation (tokens generated fresh each run)
- Scales to N plugins without additional credentials

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>