feat(skill): add skill_runner to replace per-step manual orchestration by wwind123 · Pull Request #288 · wwind123/coding-review-agent-loop

wwind123 · 2026-06-11T18:55:47Z

Summary

Adds helpers/skill_runner.py with run-plan-round and run-pr-round subcommands that encapsulate the full review-round pipeline in a single CLI call (build-resume, plan validation, AGENT_LOOP_META attachment, GitHub posting, reviewer invocation, response validation/rendering, session-state update)
Adds helpers/prompt_builders.py wrapping build_plan_review_prompt / build_review_prompt for use without manual AgentLoopConfig construction
Extends state_manager build-resume descriptor with completed_reviewer_data, completed_round_number, and current_plan_subject fields needed by the ledger-transition algorithm
Adds --flow {plan,pr} to run_external so PR dry-run stubs use AGENT_STATE/pr_review markers
Exposes apply_item_dispositions as a public wrapper in unresolved_items.py for the round-to-round ledger transition
Collapses SKILL.md from 8 manual steps to 3 (write plan → run skill_runner → read JSON result)
Adds TestSkillRunner tests with fake-gh isolation (no real GitHub calls)

Closes #287.

Test plan

pytest tests/test_skill_helpers.py — 16 passed
Dry-run run-plan-round exits 0 and returns valid JSON result
Invalid plan file causes non-zero exit
Dry-run does not invoke gh for posting (fake gh fails loudly on comment calls)

-- Anthropic Claude

closes #287) Introduces helpers/skill_runner.py (run-plan-round / run-pr-round subcommands) that encapsulate the full review-round pipeline — build-resume, plan validation, AGENT_LOOP_META attachment, GitHub posting, reviewer invocation, response validation/rendering, and session-state update — in a single CLI call. Supporting changes: - helpers/prompt_builders.py: new module wrapping build_plan_review_prompt and build_review_prompt for use without constructing AgentLoopConfig manually - helpers/state_manager.py: extend build-resume descriptor with completed_reviewer_data, completed_round_number, and current_plan_subject - helpers/run_external.py: add --flow {plan,pr} flag; pr dry-run stub uses AGENT_STATE / pr_review instead of plan-review markers - src/coding_review_agent_loop/unresolved_items.py: expose apply_item_dispositions as public wrapper for skill_runner ledger transition - SKILL.md: collapse 8-step manual orchestration into 3 steps backed by skill_runner - tests/test_skill_helpers.py: add TestSkillRunner with fake-gh isolation and TestRunExternal.test_dry_run_flow_pr_writes_pr_review_stub Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

wwind123 · 2026-06-11T19:13:34Z

Review verdict: Blocking

Introduces skill_runner.py for automated orchestration, which significantly simplifies the review process. However, there are blocking issues in review item ID generation and the logic for avoiding duplicate comment posts during resume.

Blocking issues

Item ID collisions in _run_reviewer: item_counter starts at len(next_prior_items_raw) + 1. Since next_prior_items_raw only includes unresolved items, its length decreases when items are resolved in previous rounds, leading to ID collisions in subsequent rounds. It should instead determine the next ID by finding the maximum numeric suffix among all existing items (prior and current) or using a persistent counter.
Duplicate comment check in _reconcile_pending_comment is flawed: already_posted = body_text in all_bodies compares a tagged body (containing AGENT_LOOP_META) against what are likely cleaned bodies (without metadata) returned by _extract_round_metadata_records. This will lead to duplicate posts of the same comment during session resume if the 'clear' step previously failed.

Same-PR follow-ups

Add a test case for run-pr-round in TestSkillRunner (currently only run-plan-round is tested).
Improve error handling in _fetch_pr_diff: it currently returns an empty string on failure instead of logging an error or raising, which could lead to empty-diff reviews without explanation.
Consider renaming plan_subject in _run_reviewer to something like round_subject to avoid confusion when it stores a PR head SHA.

-- Google Gemini

wwind123 · 2026-06-11T19:16:02Z

Review verdict: Blocking

The new skill_runner has resume/aggregation bugs that can duplicate posted plans and falsely report approval while same-round followups remain unresolved.

Blocking issues

helpers/skill_runner.py:497: The plan-posting gate uses if not local_completed, so resuming a round after the plan comment was posted but before any reviewer completed reposts the same plan every time. In the resume path current_plan_subject == plan_subject and completed_reviewer_names can legitimately be empty; this should skip coder-plan posting and only run pending reviewers.
helpers/skill_runner.py:367: Same-round followups are serialized as unresolved same-plan/same-pr items, but _run_reviewer returns only blocking_texts, and the aggregate result only extends round_blocking_items with that field. A reviewer response with state: blocking and only same_plan_followups/same_pr_followups therefore produces state: approved with empty blocking_items, losing required before-merge work.
helpers/skill_runner.py:142: Pending-comment reconciliation fetches comment bodies with --jq .[].body and then splits stdout on every newline, which breaks multiline comments containing AGENT_LOOP_META. _extract_round_metadata_records will not see the original comment bodies, so a crash after posting but before clearing the pending marker can repost an already-posted comment instead of clearing it.

-- OpenAI Codex

- Fix item ID collisions: use _max_item_number() to compute item_id_offset per reviewer from all seen items (prior + current-round), eliminating duplicate IDs when multiple reviewers run in the same round or items are resolved between rounds - Fix plan re-post on resume: gate plan posting on is_new_round (subject mismatch), not on local_completed being empty; resuming a round where no reviewer has completed no longer re-posts the plan - Fix same-round followups lost when state is blocking: if a reviewer blocks via same_plan_followups/same_pr_followups only (no explicit blocking_items), surface those as blocking_items for the caller; derive overall_state from any_reviewer_blocked rather than round_blocking_items length - Fix pending-comment duplicate detection: compare body_text directly against raw GitHub comment bodies (stripped) instead of via _extract_round_metadata_records - Fix comment fetch for multiline bodies: replace gh api --jq .[].body (splits on newlines) with gh issue view --json comments (parses JSON correctly); update fake gh in tests accordingly - Fix Gemini prose/duplicate-marker normalization: _normalize_raw_response now also strips leading prose before the JSON block Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

wwind123 · 2026-06-11T21:56:11Z

Review verdict: Blocking

The PR successfully implements skill_runner.py to automate the review round pipeline, significantly simplifying the orchestration steps in SKILL.md. It also addresses previous blocking items regarding item ID collisions and duplicate comment detection. However, several follow-up items from round 1 remain unaddressed, including test coverage for the PR flow, improved error handling for diff fetching, and renaming of ambiguous variables.

Same-PR follow-ups

Add a test case for run-pr-round in tests/test_skill_helpers.py to ensure the PR review flow is verified in isolation (item-3).
Improve error handling in _fetch_pr_diff: instead of returning an empty string on failure, log an error or raise an exception to prevent reviewers from receiving empty prompts (item-4).
Rename plan_subject in _run_reviewer (and associated call sites) to round_subject or similar, to clarify that it may contain a PR head SHA (item-5).

Prior unresolved item dispositions

[item-1] Blocking issue from Google Gemini, round 1: Item ID collisions in _run_reviewer: item_counter starts at len(next_prior_items_raw) + 1.... -> resolved
[item-2] Blocking issue from Google Gemini, round 1: Duplicate comment check in _reconcile_pending_comment is flawed: `already_posted = body_text in... -> resolved
[item-3] Same-PR follow-up from Google Gemini, round 1: Add a test case for run-pr-round in TestSkillRunner (currently only run-plan-round is tested). -> same-pr
[item-4] Same-PR follow-up from Google Gemini, round 1: Improve error handling in _fetch_pr_diff: it currently returns an empty string on failure inste... -> same-pr
[item-5] Same-PR follow-up from Google Gemini, round 1: Consider renaming plan_subject in _run_reviewer to something like round_subject to avoid co... -> same-pr

-- Google Gemini

wwind123 · 2026-06-11T21:57:49Z

Review verdict: Blocking

The runner addresses the earlier ID-collision and pending-comment issues, but new-round result reconstruction still mixes prior-round reviewer records into the active round.

Blocking issues

helpers/skill_runner.py: In both cmd_run_plan_round and cmd_run_pr_round, the Step 5 reconstruction loop always consumes resume.completed_reviewer_data even when current_plan_subject/head_sha differs and the runner has already classified the invocation as a new round. Those records belong to the completed previous round and are only meant to feed _compute_next_prior_items. As written, any previous blocking reviewer sets any_reviewer_blocked=True and contributes old blocking_items to the new round result, while previous approvals are also reported as approved_reviewers for the new round. A revised plan or updated PR can therefore still return state=blocking after all current reviewers approve, which breaks the core round-to-round workflow.

Same-PR follow-ups

Add a direct TestSkillRunner coverage path for helpers.skill_runner run-pr-round; the new PR-flow test only exercises helpers.run_external --flow pr, so the orchestration path remains untested.
Improve _fetch_pr_diff error handling so gh pr diff failures are surfaced instead of silently returning an empty diff.
Rename the _run_reviewer plan_subject parameter or split it by flow, since it still carries a PR head SHA in PR mode.

Prior unresolved item dispositions

[item-1] Blocking issue from Google Gemini, round 1: Item ID collisions in _run_reviewer: item_counter starts at len(next_prior_items_raw) + 1.... -> resolved
[item-2] Blocking issue from Google Gemini, round 1: Duplicate comment check in _reconcile_pending_comment is flawed: `already_posted = body_text in... -> resolved
[item-3] Same-PR follow-up from Google Gemini, round 1: Add a test case for run-pr-round in TestSkillRunner (currently only run-plan-round is tested). -> same-pr
[item-4] Same-PR follow-up from Google Gemini, round 1: Improve error handling in _fetch_pr_diff: it currently returns an empty string on failure inste... -> same-pr
[item-5] Same-PR follow-up from Google Gemini, round 1: Consider renaming plan_subject in _run_reviewer to something like round_subject to avoid co... -> same-pr

-- OpenAI Codex

- Fix Step 5 new-round contamination: gate completed_reviewer_data reconstruction on `not is_new_round` in both plan and PR commands; previously previous-round blocking records polluted the new round's result, preventing approval even after all current-round reviewers approved - Fix _fetch_pr_diff silent failure: exit with error instead of returning empty string, preventing silent empty-diff review rounds - Rename plan_subject → round_subject in _run_reviewer to reflect dual use (plan sha256 or PR head SHA) - Add _normalize_disposition_values: fix natural-language variants like 'still blocking' → 'blocking' before validation to handle Gemini quirks - Add test_run_pr_round_dry_run_exits_zero to TestSkillRunner Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

wwind123 · 2026-06-11T22:08:55Z

Review verdict: Approved

PR #288 successfully introduces skill_runner.py to automate review rounds, addressing all prior feedback. Tests for PR rounds are included, error handling is hardened, and the round reconstruction logic is corrected to prevent contamination across rounds.

Prior unresolved item dispositions

[item-3] Same-PR follow-up from Google Gemini, round 1: Add a test case for run-pr-round in TestSkillRunner (currently only run-plan-round is tested). -> resolved
[item-4] Same-PR follow-up from Google Gemini, round 1: Improve error handling in _fetch_pr_diff: it currently returns an empty string on failure inste... -> resolved
[item-5] Same-PR follow-up from Google Gemini, round 1: Consider renaming plan_subject in _run_reviewer to something like round_subject to avoid co... -> resolved
[item-6] Same-PR follow-up from Google Gemini, round 1: Add a test case for run-pr-round in tests/test_skill_helpers.py to ensure the PR review flow... -> resolved
[item-7] Same-PR follow-up from Google Gemini, round 1: Improve error handling in _fetch_pr_diff: instead of returning an empty string on failure, log... -> resolved
[item-8] Same-PR follow-up from Google Gemini, round 1: Rename plan_subject in _run_reviewer (and associated call sites) to round_subject or simila... -> resolved
[item-9] Blocking issue from OpenAI Codex, round 1: helpers/skill_runner.py: In both cmd_run_plan_round and cmd_run_pr_round, the Step 5 reconstructi... -> resolved
[item-10] Same-PR follow-up from OpenAI Codex, round 1: Add a direct TestSkillRunner coverage path for helpers.skill_runner run-pr-round; the new PR-flow... -> resolved
[item-11] Same-PR follow-up from OpenAI Codex, round 1: Improve _fetch_pr_diff error handling so gh pr diff failures are surfaced instead of silently ret... -> resolved
[item-12] Same-PR follow-up from OpenAI Codex, round 1: Rename the _run_reviewer plan_subject parameter or split it by flow, since it still carries a PR... -> resolved

-- Google Gemini

wwind123 · 2026-06-11T22:11:54Z

Review verdict: Approved

Current head addresses the prior skill_runner orchestration issues and I found no remaining merge-blocking defects in the reviewed diff.

Prior unresolved item dispositions

[item-3] Same-PR follow-up from Google Gemini, round 1: Add a test case for run-pr-round in TestSkillRunner (currently only run-plan-round is tested). -> resolved
[item-4] Same-PR follow-up from Google Gemini, round 1: Improve error handling in _fetch_pr_diff: it currently returns an empty string on failure inste... -> resolved
[item-5] Same-PR follow-up from Google Gemini, round 1: Consider renaming plan_subject in _run_reviewer to something like round_subject to avoid co... -> resolved
[item-6] Same-PR follow-up from Google Gemini, round 1: Add a test case for run-pr-round in tests/test_skill_helpers.py to ensure the PR review flow... -> resolved
[item-7] Same-PR follow-up from Google Gemini, round 1: Improve error handling in _fetch_pr_diff: instead of returning an empty string on failure, log... -> resolved
[item-8] Same-PR follow-up from Google Gemini, round 1: Rename plan_subject in _run_reviewer (and associated call sites) to round_subject or simila... -> resolved
[item-9] Blocking issue from OpenAI Codex, round 1: helpers/skill_runner.py: In both cmd_run_plan_round and cmd_run_pr_round, the Step 5 reconstructi... -> resolved
[item-10] Same-PR follow-up from OpenAI Codex, round 1: Add a direct TestSkillRunner coverage path for helpers.skill_runner run-pr-round; the new PR-flow... -> resolved
[item-11] Same-PR follow-up from OpenAI Codex, round 1: Improve _fetch_pr_diff error handling so gh pr diff failures are surfaced instead of silently ret... -> resolved
[item-12] Same-PR follow-up from OpenAI Codex, round 1: Rename the _run_reviewer plan_subject parameter or split it by flow, since it still carries a PR... -> resolved

-- OpenAI Codex

wwind123 merged commit c81f84c into main Jun 11, 2026
1 check passed

wwind123 deleted the feat/287-skill-runner branch June 11, 2026 23:08

wwind123 mentioned this pull request Jun 11, 2026

skill: post coder change-summary comment after addressing review feedback #289

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(skill): add skill_runner to replace per-step manual orchestration#288

feat(skill): add skill_runner to replace per-step manual orchestration#288
wwind123 merged 3 commits into
mainfrom
feat/287-skill-runner

wwind123 commented Jun 11, 2026

Uh oh!

wwind123 commented Jun 11, 2026

Uh oh!

wwind123 commented Jun 11, 2026

Uh oh!

wwind123 commented Jun 11, 2026

Uh oh!

wwind123 commented Jun 11, 2026

Uh oh!

wwind123 commented Jun 11, 2026

Uh oh!

wwind123 commented Jun 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

wwind123 commented Jun 11, 2026

Summary

Test plan

Uh oh!

wwind123 commented Jun 11, 2026

Blocking issues

Same-PR follow-ups

Uh oh!

wwind123 commented Jun 11, 2026

Blocking issues

Uh oh!

wwind123 commented Jun 11, 2026

Same-PR follow-ups

Prior unresolved item dispositions

Uh oh!

wwind123 commented Jun 11, 2026

Blocking issues

Same-PR follow-ups

Prior unresolved item dispositions

Uh oh!

wwind123 commented Jun 11, 2026

Prior unresolved item dispositions

Uh oh!

wwind123 commented Jun 11, 2026

Prior unresolved item dispositions

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant