Skip to content

feat(skill): add skill_runner to replace per-step manual orchestration#288

Merged
wwind123 merged 3 commits into
mainfrom
feat/287-skill-runner
Jun 11, 2026
Merged

feat(skill): add skill_runner to replace per-step manual orchestration#288
wwind123 merged 3 commits into
mainfrom
feat/287-skill-runner

Conversation

@wwind123

Copy link
Copy Markdown
Owner

Summary

  • Adds helpers/skill_runner.py with run-plan-round and run-pr-round subcommands that encapsulate the full review-round pipeline in a single CLI call (build-resume, plan validation, AGENT_LOOP_META attachment, GitHub posting, reviewer invocation, response validation/rendering, session-state update)
  • Adds helpers/prompt_builders.py wrapping build_plan_review_prompt / build_review_prompt for use without manual AgentLoopConfig construction
  • Extends state_manager build-resume descriptor with completed_reviewer_data, completed_round_number, and current_plan_subject fields needed by the ledger-transition algorithm
  • Adds --flow {plan,pr} to run_external so PR dry-run stubs use AGENT_STATE/pr_review markers
  • Exposes apply_item_dispositions as a public wrapper in unresolved_items.py for the round-to-round ledger transition
  • Collapses SKILL.md from 8 manual steps to 3 (write plan → run skill_runner → read JSON result)
  • Adds TestSkillRunner tests with fake-gh isolation (no real GitHub calls)

Closes #287.

Test plan

  • pytest tests/test_skill_helpers.py — 16 passed
  • Dry-run run-plan-round exits 0 and returns valid JSON result
  • Invalid plan file causes non-zero exit
  • Dry-run does not invoke gh for posting (fake gh fails loudly on comment calls)

-- Anthropic Claude

closes #287)

Introduces helpers/skill_runner.py (run-plan-round / run-pr-round subcommands)
that encapsulate the full review-round pipeline — build-resume, plan validation,
AGENT_LOOP_META attachment, GitHub posting, reviewer invocation, response
validation/rendering, and session-state update — in a single CLI call.

Supporting changes:
- helpers/prompt_builders.py: new module wrapping build_plan_review_prompt and
  build_review_prompt for use without constructing AgentLoopConfig manually
- helpers/state_manager.py: extend build-resume descriptor with
  completed_reviewer_data, completed_round_number, and current_plan_subject
- helpers/run_external.py: add --flow {plan,pr} flag; pr dry-run stub uses
  AGENT_STATE / pr_review instead of plan-review markers
- src/coding_review_agent_loop/unresolved_items.py: expose
  apply_item_dispositions as public wrapper for skill_runner ledger transition
- SKILL.md: collapse 8-step manual orchestration into 3 steps backed by skill_runner
- tests/test_skill_helpers.py: add TestSkillRunner with fake-gh isolation and
  TestRunExternal.test_dry_run_flow_pr_writes_pr_review_stub

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@wwind123

Copy link
Copy Markdown
Owner Author

Review verdict: Blocking

Introduces skill_runner.py for automated orchestration, which significantly simplifies the review process. However, there are blocking issues in review item ID generation and the logic for avoiding duplicate comment posts during resume.

Blocking issues

  • Item ID collisions in _run_reviewer: item_counter starts at len(next_prior_items_raw) + 1. Since next_prior_items_raw only includes unresolved items, its length decreases when items are resolved in previous rounds, leading to ID collisions in subsequent rounds. It should instead determine the next ID by finding the maximum numeric suffix among all existing items (prior and current) or using a persistent counter.
  • Duplicate comment check in _reconcile_pending_comment is flawed: already_posted = body_text in all_bodies compares a tagged body (containing AGENT_LOOP_META) against what are likely cleaned bodies (without metadata) returned by _extract_round_metadata_records. This will lead to duplicate posts of the same comment during session resume if the 'clear' step previously failed.

Same-PR follow-ups

  • Add a test case for run-pr-round in TestSkillRunner (currently only run-plan-round is tested).
  • Improve error handling in _fetch_pr_diff: it currently returns an empty string on failure instead of logging an error or raising, which could lead to empty-diff reviews without explanation.
  • Consider renaming plan_subject in _run_reviewer to something like round_subject to avoid confusion when it stores a PR head SHA.

-- Google Gemini

@wwind123

Copy link
Copy Markdown
Owner Author

Review verdict: Blocking

The new skill_runner has resume/aggregation bugs that can duplicate posted plans and falsely report approval while same-round followups remain unresolved.

Blocking issues

  • helpers/skill_runner.py:497: The plan-posting gate uses if not local_completed, so resuming a round after the plan comment was posted but before any reviewer completed reposts the same plan every time. In the resume path current_plan_subject == plan_subject and completed_reviewer_names can legitimately be empty; this should skip coder-plan posting and only run pending reviewers.
  • helpers/skill_runner.py:367: Same-round followups are serialized as unresolved same-plan/same-pr items, but _run_reviewer returns only blocking_texts, and the aggregate result only extends round_blocking_items with that field. A reviewer response with state: blocking and only same_plan_followups/same_pr_followups therefore produces state: approved with empty blocking_items, losing required before-merge work.
  • helpers/skill_runner.py:142: Pending-comment reconciliation fetches comment bodies with --jq .[].body and then splits stdout on every newline, which breaks multiline comments containing AGENT_LOOP_META. _extract_round_metadata_records will not see the original comment bodies, so a crash after posting but before clearing the pending marker can repost an already-posted comment instead of clearing it.

-- OpenAI Codex

- Fix item ID collisions: use _max_item_number() to compute item_id_offset
  per reviewer from all seen items (prior + current-round), eliminating
  duplicate IDs when multiple reviewers run in the same round or items are
  resolved between rounds
- Fix plan re-post on resume: gate plan posting on is_new_round (subject
  mismatch), not on local_completed being empty; resuming a round where no
  reviewer has completed no longer re-posts the plan
- Fix same-round followups lost when state is blocking: if a reviewer blocks
  via same_plan_followups/same_pr_followups only (no explicit blocking_items),
  surface those as blocking_items for the caller; derive overall_state from
  any_reviewer_blocked rather than round_blocking_items length
- Fix pending-comment duplicate detection: compare body_text directly against
  raw GitHub comment bodies (stripped) instead of via _extract_round_metadata_records
- Fix comment fetch for multiline bodies: replace gh api --jq .[].body (splits
  on newlines) with gh issue view --json comments (parses JSON correctly);
  update fake gh in tests accordingly
- Fix Gemini prose/duplicate-marker normalization: _normalize_raw_response now
  also strips leading prose before the JSON block

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@wwind123

Copy link
Copy Markdown
Owner Author

Review verdict: Blocking

The PR successfully implements skill_runner.py to automate the review round pipeline, significantly simplifying the orchestration steps in SKILL.md. It also addresses previous blocking items regarding item ID collisions and duplicate comment detection. However, several follow-up items from round 1 remain unaddressed, including test coverage for the PR flow, improved error handling for diff fetching, and renaming of ambiguous variables.

Same-PR follow-ups

  • Add a test case for run-pr-round in tests/test_skill_helpers.py to ensure the PR review flow is verified in isolation (item-3).
  • Improve error handling in _fetch_pr_diff: instead of returning an empty string on failure, log an error or raise an exception to prevent reviewers from receiving empty prompts (item-4).
  • Rename plan_subject in _run_reviewer (and associated call sites) to round_subject or similar, to clarify that it may contain a PR head SHA (item-5).

Prior unresolved item dispositions

  • [item-1] Blocking issue from Google Gemini, round 1: Item ID collisions in _run_reviewer: item_counter starts at len(next_prior_items_raw) + 1.... -> resolved
  • [item-2] Blocking issue from Google Gemini, round 1: Duplicate comment check in _reconcile_pending_comment is flawed: `already_posted = body_text in... -> resolved
  • [item-3] Same-PR follow-up from Google Gemini, round 1: Add a test case for run-pr-round in TestSkillRunner (currently only run-plan-round is tested). -> same-pr
  • [item-4] Same-PR follow-up from Google Gemini, round 1: Improve error handling in _fetch_pr_diff: it currently returns an empty string on failure inste... -> same-pr
  • [item-5] Same-PR follow-up from Google Gemini, round 1: Consider renaming plan_subject in _run_reviewer to something like round_subject to avoid co... -> same-pr

-- Google Gemini

@wwind123

Copy link
Copy Markdown
Owner Author

Review verdict: Blocking

The runner addresses the earlier ID-collision and pending-comment issues, but new-round result reconstruction still mixes prior-round reviewer records into the active round.

Blocking issues

  • helpers/skill_runner.py: In both cmd_run_plan_round and cmd_run_pr_round, the Step 5 reconstruction loop always consumes resume.completed_reviewer_data even when current_plan_subject/head_sha differs and the runner has already classified the invocation as a new round. Those records belong to the completed previous round and are only meant to feed _compute_next_prior_items. As written, any previous blocking reviewer sets any_reviewer_blocked=True and contributes old blocking_items to the new round result, while previous approvals are also reported as approved_reviewers for the new round. A revised plan or updated PR can therefore still return state=blocking after all current reviewers approve, which breaks the core round-to-round workflow.

Same-PR follow-ups

  • Add a direct TestSkillRunner coverage path for helpers.skill_runner run-pr-round; the new PR-flow test only exercises helpers.run_external --flow pr, so the orchestration path remains untested.
  • Improve _fetch_pr_diff error handling so gh pr diff failures are surfaced instead of silently returning an empty diff.
  • Rename the _run_reviewer plan_subject parameter or split it by flow, since it still carries a PR head SHA in PR mode.

Prior unresolved item dispositions

  • [item-1] Blocking issue from Google Gemini, round 1: Item ID collisions in _run_reviewer: item_counter starts at len(next_prior_items_raw) + 1.... -> resolved
  • [item-2] Blocking issue from Google Gemini, round 1: Duplicate comment check in _reconcile_pending_comment is flawed: `already_posted = body_text in... -> resolved
  • [item-3] Same-PR follow-up from Google Gemini, round 1: Add a test case for run-pr-round in TestSkillRunner (currently only run-plan-round is tested). -> same-pr
  • [item-4] Same-PR follow-up from Google Gemini, round 1: Improve error handling in _fetch_pr_diff: it currently returns an empty string on failure inste... -> same-pr
  • [item-5] Same-PR follow-up from Google Gemini, round 1: Consider renaming plan_subject in _run_reviewer to something like round_subject to avoid co... -> same-pr

-- OpenAI Codex

- Fix Step 5 new-round contamination: gate completed_reviewer_data reconstruction
  on `not is_new_round` in both plan and PR commands; previously previous-round
  blocking records polluted the new round's result, preventing approval even after
  all current-round reviewers approved
- Fix _fetch_pr_diff silent failure: exit with error instead of returning empty
  string, preventing silent empty-diff review rounds
- Rename plan_subject → round_subject in _run_reviewer to reflect dual use
  (plan sha256 or PR head SHA)
- Add _normalize_disposition_values: fix natural-language variants like
  'still blocking' → 'blocking' before validation to handle Gemini quirks
- Add test_run_pr_round_dry_run_exits_zero to TestSkillRunner

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@wwind123

Copy link
Copy Markdown
Owner Author

Review verdict: Approved

PR #288 successfully introduces skill_runner.py to automate review rounds, addressing all prior feedback. Tests for PR rounds are included, error handling is hardened, and the round reconstruction logic is corrected to prevent contamination across rounds.

Prior unresolved item dispositions

  • [item-3] Same-PR follow-up from Google Gemini, round 1: Add a test case for run-pr-round in TestSkillRunner (currently only run-plan-round is tested). -> resolved
  • [item-4] Same-PR follow-up from Google Gemini, round 1: Improve error handling in _fetch_pr_diff: it currently returns an empty string on failure inste... -> resolved
  • [item-5] Same-PR follow-up from Google Gemini, round 1: Consider renaming plan_subject in _run_reviewer to something like round_subject to avoid co... -> resolved
  • [item-6] Same-PR follow-up from Google Gemini, round 1: Add a test case for run-pr-round in tests/test_skill_helpers.py to ensure the PR review flow... -> resolved
  • [item-7] Same-PR follow-up from Google Gemini, round 1: Improve error handling in _fetch_pr_diff: instead of returning an empty string on failure, log... -> resolved
  • [item-8] Same-PR follow-up from Google Gemini, round 1: Rename plan_subject in _run_reviewer (and associated call sites) to round_subject or simila... -> resolved
  • [item-9] Blocking issue from OpenAI Codex, round 1: helpers/skill_runner.py: In both cmd_run_plan_round and cmd_run_pr_round, the Step 5 reconstructi... -> resolved
  • [item-10] Same-PR follow-up from OpenAI Codex, round 1: Add a direct TestSkillRunner coverage path for helpers.skill_runner run-pr-round; the new PR-flow... -> resolved
  • [item-11] Same-PR follow-up from OpenAI Codex, round 1: Improve _fetch_pr_diff error handling so gh pr diff failures are surfaced instead of silently ret... -> resolved
  • [item-12] Same-PR follow-up from OpenAI Codex, round 1: Rename the _run_reviewer plan_subject parameter or split it by flow, since it still carries a PR... -> resolved

-- Google Gemini

@wwind123

Copy link
Copy Markdown
Owner Author

Review verdict: Approved

Current head addresses the prior skill_runner orchestration issues and I found no remaining merge-blocking defects in the reviewed diff.

Prior unresolved item dispositions

  • [item-3] Same-PR follow-up from Google Gemini, round 1: Add a test case for run-pr-round in TestSkillRunner (currently only run-plan-round is tested). -> resolved
  • [item-4] Same-PR follow-up from Google Gemini, round 1: Improve error handling in _fetch_pr_diff: it currently returns an empty string on failure inste... -> resolved
  • [item-5] Same-PR follow-up from Google Gemini, round 1: Consider renaming plan_subject in _run_reviewer to something like round_subject to avoid co... -> resolved
  • [item-6] Same-PR follow-up from Google Gemini, round 1: Add a test case for run-pr-round in tests/test_skill_helpers.py to ensure the PR review flow... -> resolved
  • [item-7] Same-PR follow-up from Google Gemini, round 1: Improve error handling in _fetch_pr_diff: instead of returning an empty string on failure, log... -> resolved
  • [item-8] Same-PR follow-up from Google Gemini, round 1: Rename plan_subject in _run_reviewer (and associated call sites) to round_subject or simila... -> resolved
  • [item-9] Blocking issue from OpenAI Codex, round 1: helpers/skill_runner.py: In both cmd_run_plan_round and cmd_run_pr_round, the Step 5 reconstructi... -> resolved
  • [item-10] Same-PR follow-up from OpenAI Codex, round 1: Add a direct TestSkillRunner coverage path for helpers.skill_runner run-pr-round; the new PR-flow... -> resolved
  • [item-11] Same-PR follow-up from OpenAI Codex, round 1: Improve _fetch_pr_diff error handling so gh pr diff failures are surfaced instead of silently ret... -> resolved
  • [item-12] Same-PR follow-up from OpenAI Codex, round 1: Rename the _run_reviewer plan_subject parameter or split it by flow, since it still carries a PR... -> resolved

-- OpenAI Codex

@wwind123 wwind123 merged commit c81f84c into main Jun 11, 2026
1 check passed
@wwind123 wwind123 deleted the feat/287-skill-runner branch June 11, 2026 23:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: skill-run.sh wrapper script for end-to-end review orchestration

1 participant