feat: Code Tour — guided PR walkthrough as a third agent provider by backnotprop · Pull Request #569 · backnotprop/plannotator

backnotprop · 2026-04-15T19:23:28Z

Summary

Started as Code Tour, a third agent provider that generates a guided walkthrough of a PR — greeting, intent, before/after, key takeaways, ordered stops with inline diff anchors, and a QA checklist, all rendered in an animated three-page overlay. Grew over the course of review into a broader pass on the review-agent layer: shared session factory, per-provider per-model controls, cookie-backed settings persistence, a docs reference page, and a string of P2/P3 fixes from self-review and PR review.

What shipped

Code Tour (the headline feature)

New tour-review.ts with prompt + JSON schema + per-engine command builders (buildTourClaudeCommand, buildTourCodexCommand) and parsers. Codex emits to a file via --output-schema; Claude streams JSONL via stdin.
A createTourSession() factory owns the tour lifecycle (buildCommand, onJobComplete, getTour, saveChecklist) so both the Bun hook server and the Pi Node server wire it with ~25 lines of route glue instead of ~100 lines of duplicated provider-branch logic.
Route parity: GET /api/tour/:jobId and PUT /api/tour/:jobId/checklist work on both servers; tests enforce it.
Three-page animated dialog (Overview / Walkthrough / Checklist), spring-driven accordions, intro composition cascade, pinned Start Tour button with a bottom fade, single-open accordion with dim-others, dev-only demo tour (Cmd/Ctrl+Shift+T).
Auto-opens when a tour job reaches a terminal state. If the model returns empty or malformed output, the job flips to failed with a clear error instead of silently 404ing the dialog.
Under prefers-reduced-motion, page navigation swaps directly instead of waiting on an onAnimationEnd that never fires (would otherwise soft-lock the walkthrough behind the intro).

Per-provider, per-model review controls

Claude review: --model + --effort.
Codex review: -m + -c model_reasoning_effort=... + -c service_tier=fast.
Code Tour: both engines, each with their own model + effort/reasoning settings.
Settings dropdowns removed the hidden "Default" option in favor of explicit sensible defaults (Opus 4.7 / High for Claude, GPT-5.3 Codex / High for Codex, Sonnet / Medium for Tour Claude, GPT-5.3 Codex / Medium for Tour Codex).
Dropped the invalid none reasoning option (codex-rs rejects it); included a one-shot cookie migration so existing users don't keep launching a broken flag.

Cookie-backed settings persistence

Single plannotator.agents cookie holds the whole settings tree, keyed per (agent × model) so switching models reveals the effort/reasoning/fast-mode you last used with that specific model.
React state is the authority; the cookie is a mirror. All mutations funnel through a single owner via functional setState, so rapid successive changes can't stale-read or lose writes.

UI polish

Job card badge now carries the full model + settings story: Claude · Opus 4.7 · High, Codex · GPT-5.3 Codex · Medium · Fast, Tour · Claude · Sonnet · Medium. Titles collapsed to plain Code Review / Code Tour since the provider identity now lives in the badge.
Main dropdown options read action-first: Code Review · Claude, Code Review · Codex, Code Tour.
Pending checklist saves are flushed with keepalive: true on unmount so closing the dialog within the 500 ms debounce window no longer drops the save.
Claude-engine tour logs now render through formatClaudeLogEvent (previously they streamed raw JSONL because the formatter only keyed on provider === "claude").
Tour Claude allowlist permits reading linked GitHub/GitLab issues (gh issue view, gh api repos/*/*/issues/*, glab issue view) so the prompt's Fixes #123 follow-through actually works.

Documentation

New /docs/reference/prompts/ page explains the three-layer message shape (system prompt owned by the CLI, user message = review prompt + user prompt joined with \n\n---\n\n, JSON schema as a terminal constraint). Calls out that the Claude and Codex review prompts are upstream-derived, not ours — only Tour's prompt is a Plannotator original.
Code Review page links to it under "How review agents prompt the CLI."

Test plan

Notes

Adds motion@12.38.0 (~30 KB gzipped) for the tour dialog's spring physics.
Claude and Codex review prompts are upstream-derived; see the new prompts reference for exact sourcing.
Two components drafted and superseded during the tour work (TourHeader.tsx, DiffAnchorChip.tsx) were removed in the final commit of the initial tour set.

For provenance purposes, this work was AI assisted.

Adds a "tour" provider alongside "claude" and "codex" that generates a guided walkthrough of a changeset from a product-minded colleague's perspective. Reuses the entire agent-jobs infrastructure (process lifecycle, SSE broadcasting, live logs, kill support, capability detection) so the only new server plumbing is the prompt/schema module, the GET endpoint for the result, and checklist persistence. - tour-review.ts: prompt, JSON schema, Claude (stream-json) and Codex (file output) command builders, and parsers for both. Prompt frames the agent as a colleague giving a casual tour, orders stops by reading flow (not impact), chunks by logical change (not per-file), and writes QA questions a human can answer by reading code or using the product. - review.ts: tour buildCommand with per-engine model defaults (fixes a bug where Codex was defaulting to "sonnet", a Claude model), an onJobComplete handler that parses and stores the tour, plus GET /api/tour/:jobId and PUT /api/tour/:jobId/checklist endpoints. - agent-jobs.ts: tour capability detection and config threading from the POST body through to the provider's buildCommand. - shared/agent-jobs.ts: engine/model fields on AgentJobInfo so the UI can render tour jobs with their chosen provider + model. For provenance purposes, this commit was AI assisted.

Adds the tour overlay surface: a full-screen dialog with three animated pages (Overview, Walkthrough, Checklist) that renders the CodeTourOutput from the server-side agent. Opens automatically when a tour job completes and can be dismissed with Escape or backdrop click. - components/tour/TourDialog.tsx: three-page animated dialog. Intro page uses a composition cascade (greeting, Intent/Before/After cards with one-shot color-dot pulse, key takeaways table) with a pinned Start Tour button and a bottom fade so content scrolls cleanly under it. Page slides between Overview/Walkthrough/Checklist are ease-out springs with direction auto-derived from tab index. - components/tour/TourStopCard.tsx: per-stop accordion using motion's AnimatePresence for height + opacity springs with staggered children for detail text and anchor blocks. Anchors lazy-mount DiffHunkPreview on first open to keep mount costs low. Callout labels (Important, Warning, Note) are deterministic typography, no emoji. - components/tour/QAChecklist.tsx: always-open list of verification questions with per-item spring entrance and persistent checkbox state that PUTs to the server. - hooks/useTourData.ts: fetch + checklist persistence with a dev-mode short-circuit when jobId === DEMO_TOUR_ID so UI iteration doesn't require a live agent run. - demoTour.ts: realistic demo data (multi-stop, multiple takeaways, real diff hunks) for dev iteration. - App.tsx: tour dialog state, auto-open on job completion, and a dev-only "Demo tour" floating button + Cmd/Ctrl+Shift+T shortcut guarded by import.meta.env.DEV. - index.css: all tour animations (dialog enter/exit, page slides, stop reveal cascade, intro exit) with reduced-motion fallbacks. Dark-mode contrast tuning across structural surfaces (borders, surface tints, callout backgrounds) so the design works in both themes. - package.json: adds motion@12.38.0 for spring-driven accordion physics and the intro composition cascade. For provenance purposes, this commit was AI assisted.

…unkPreview Connects the tour provider to the existing agent jobs sidebar and detail panel so the user can launch a tour the same way they launch Claude or Codex reviews. Also tightens DiffHunkPreview (used inside tour anchors) to render correctly on first mount and handle every diff hunk shape the agent might produce. - ui/components/AgentsTab.tsx: tour engine (Claude/Codex) + model select when launching a tour job. Resets model to blank when falling back to Codex-only so Codex uses its own default instead of inheriting "sonnet" (a Claude model). - ui/hooks/useAgentJobs.ts: forward engine/model config from the UI through to the POST /api/agents/jobs body so the server's tour provider can pick the right command shape. - dock/panels/ReviewAgentJobDetailPanel.tsx: tour-aware detail panel. Replaces the "Findings" tab with a "Status" card for tour jobs and surfaces an "Open Tour" button that opens the dialog overlay when the tab is already in the dock. - components/DiffHunkPreview.tsx: synchronous pierre theme init via a useState lazy initializer so the first render (inside a tooltip) is already themed; robust hunk parser that handles bare @@ hunks, file- level --- hunks, and full git diffs; "Diff not available" fallback for broken hunks. - components/ReviewSidebar.tsx, dock/ReviewStateContext.tsx: small wiring changes to expose openTourPanel from the review state context. Also removes two dead components (TourHeader, DiffAnchorChip) that were superseded by the inline title bar in TourDialog and the inline AnchorBlock in TourStopCard. For provenance purposes, this commit was AI assisted.

…ession factory + Pi parity The route-parity test suite requires the Pi server (apps/pi-extension/) to expose the same routes as the Bun server (packages/server/). After Code Tour shipped in the prior 3 commits, Pi was missing /api/tour/:jobId (GET) and /api/tour/:jobId/checklist (PUT). A naive mirror would duplicate ~100 lines of provider-branch logic (buildCommand, onJobComplete, in-memory maps) into Pi's serverReview.ts, perpetuating the existing claude/codex duplication problem. Instead, extract the pure runtime-agnostic tour lifecycle into a createTourSession() factory that both servers consume. Route handlers stay per-server (different HTTP primitives) but are ~5 lines each. Net effect: Pi port is ~25 lines instead of ~100. Future providers that adopt the same pattern cost ~15 lines per server. - tour-review.ts: new createTourSession() at the bottom of the module. Encapsulates tourResults + tourChecklists maps, buildCommand (with the Claude-vs-Codex model-default fix baked in), onJobComplete (parse/store/summarize), plus getTour/saveChecklist lookup helpers for route handlers. - review.ts (Bun): tour branch in buildCommand, tour branch in onJobComplete, and both route handlers collapse to one-line calls into the factory. Drops ~70 lines. - vendor.sh: add tour-review to the review-agent loop so Pi regenerates generated/tour-review.ts on every build:pi. - serverReview.ts (Pi): import createTourSession from ../generated/tour-review.js; add tour branch to buildCommand (one line), tour branch to onJobComplete (three lines), and GET/PUT route handlers using Pi's json() helper. ~25 lines added. - agent-jobs.ts (Pi): extend buildCommand interface to accept config and return engine/model; thread config from POST body; extend spawnJob to persist engine/model on AgentJobInfo; add tour to capability list. Claude and Codex branches are intentionally left in the old pattern; they can migrate to the factory approach when next touched to keep this change's blast radius contained. Tests: 518/518 passing (previous 3 route-parity failures resolved, plus 2 extra assertions passing since tour is now in both servers' route tables). For provenance purposes, this commit was AI assisted.

…setOpen shim Two small cleanups surfaced by a self-review pass: - Pi's GET /api/tour/:jobId route used `endsWith("/checklist")` to block the checklist sub-route, while Bun uses `includes("/", ...)`. The two are not equivalent: a URL like /api/tour/abc/extra would be accepted by Pi (jobId becomes "abc/extra") but correctly rejected by Bun. Align Pi to Bun's pattern. - TourStopCard had a leftover `setOpen` shim from when open state was local to the card. State is now lifted to TourDialog, so the shim just aliases onToggle and ignores its argument. Replace with a direct `onClick={onToggle}` on the trigger button. 520/0 tests still pass. For provenance purposes, this commit was AI assisted.

…agents Exposes model, effort/reasoning, and fast-mode (Codex) as per-job knobs for Claude, Codex, and Code Tour via the Agents tab. Threads the config from the UI through the POST body into each provider's buildCommand, which emits the corresponding CLI flags (--model/--effort for Claude; -m / -c model_reasoning_effort / -c service_tier for Codex). Bun and Pi servers stay in parity. UI: option catalogs (CLAUDE_MODELS, CLAUDE_EFFORT, CODEX_MODELS, CODEX_REASONING, TOUR_CLAUDE_MODELS) are defined once and mapped into every dropdown so there's a single source of truth for the choices. Tour's Claude/Codex settings are kept separate from the standalone provider's state so toggling the tour engine no longer overwrites the provider's last choice. Job badge now surfaces the model/reasoning/fast selection for both Tour and the standalone Codex provider. For provenance purposes, this commit was AI assisted.

Documents the three-layer structure every review call travels through (CLI system prompt, our user message of review-prompt + user-prompt, output schema flag), names each constant + its file, and calls out that the Claude/Codex review prompts are the upstream ones from those projects (only Code Tour's prompt is original to Plannotator). Linked from the Code Review command page. For provenance purposes, this commit was AI assisted.

Drops the hidden "Default" options from the agent dropdowns, locks in explicit sensible defaults (Opus 4.7/High for Claude review, gpt-5.3-codex/High for Codex, Sonnet/Medium for Tour Claude, gpt-5.3-codex/Medium for Tour Codex), and remembers the last-used effort/reasoning/fast-mode per (agent job × model) so switching models reveals the choices you made last time. Backed by a single `plannotator.agents` cookie holding the whole settings tree — one read on mount, one mirror write per change, all mutations funnel through a single React state owner to avoid stale-read or lost-write races across rapid successive updates. For provenance purposes, this commit was AI assisted.

…on close Two P2 issues surfaced in review: - Under prefers-reduced-motion, tour page animations are suppressed, so onAnimationEnd never fires and exitingPage stuck on 'intro' kept the walkthrough/checklist gated out. navigate() now swaps pages directly when reduced motion is on, mirroring the pattern already used in the wrapper. - Checklist toggles are debounced 500ms before the PUT, but unmount only cleared the timer — checking an item and closing within the window dropped the save. Cleanup now flushes the pending payload with keepalive: true. For provenance purposes, this commit was AI assisted.

…ant labels Claude effort was never persisted on AgentJobInfo, so the job badge had nothing to render beyond "Claude". Plumbs `effort` through the shared type, both server build-command pipelines (Bun + Pi), and spawnJob, then teaches the ProviderBadge to display Claude and Tour Claude model + effort with the same shape Codex already had. Labels resolve via the dropdown catalogs so the badge shows "Opus 4.7" / "High" instead of raw ids. Server labels for both Claude and Codex reviews collapse to plain "Code Review" (matching tour's "Code Tour"), since the badge now carries the provider + model + settings — the title only needs to name the action. For provenance purposes, this commit was AI assisted.

The Run launcher listed "Claude Code", "Codex CLI", and "Code Tour" side by side — the CLI's detection name was doing double duty as the action label. Prefixes the review entries with "Code Review · " so scanning three options surfaces two reviews + one tour instead of three raw provider names. For provenance purposes, this commit was AI assisted.

… output Two P2 issues from PR review: - Tour jobs exited 0 but returned null output would still be marked "done", and the auto-open watcher would greet the user with a 404 "Tour not found" dialog. onJobComplete now flips the job to failed with an error message when nothing was stored, so the card reflects the real state. - The codex reasoning dropdown offered "None", but codex-rs only accepts minimal/low/medium/high/xhigh. Picking None sent `-c model_reasoning_effort=none` and launched a broken job. Removed the option; added a one-shot cookie migration so users who already saved "none" don't keep shipping it. For provenance purposes, this commit was AI assisted.

Two P2/P3 issues from PR review: - The stdout formatter keyed only on provider === "claude", so Code Tour jobs running on the Claude engine streamed raw JSONL to the log panel while Claude review jobs got formatted text. Widened the check to also catch spawnOptions.engine === "claude" and mirrored the fix into Pi. - The tour Claude allowlist permitted PR/MR commands but not issue reads, yet the prompt explicitly asks the agent to open `Fixes #123` / `Closes owner/repo#456` targets for deeper context. Claude was being denied those commands mid-tour. Added gh issue view + gh api issues + glab issue view to the allowlist. For provenance purposes, this commit was AI assisted.

- Pi now logs Claude parse failures with the same diagnostic as Bun - Tour route match uses a regex instead of an includes-offset trick - Drop `any` cast in Pi's Codex blocking-findings predicate - Extract `patchClaude` twin of `patchCodex` in useAgentSettings - Factor `resolveAgentCwd` helper in Bun review to match Pi - AgentsTab launch payload becomes a per-provider dispatch table - Wrap TourDialog in MotionConfig reducedMotion="user" so motion.* children honor prefers-reduced-motion alongside the CSS keyframes - Tests for parseTourStreamOutput/parseTourFileOutput and the Codex perModel sanitizer For provenance purposes, this commit was AI assisted.

The parsers don't validate stop fields so the original made-up shape passed fine, but future readers would be misled. Use the real CodeTourOutput / TourStop / TourDiffAnchor shapes from tour-review.ts. For provenance purposes, this commit was AI assisted.

backnotprop added 15 commits April 15, 2026 12:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Code Tour — guided PR walkthrough as a third agent provider#569

feat: Code Tour — guided PR walkthrough as a third agent provider#569
backnotprop wants to merge 15 commits intomainfrom
feat/review-tour

backnotprop commented Apr 15, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

backnotprop commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What shipped

Code Tour (the headline feature)

Per-provider, per-model review controls

Cookie-backed settings persistence

UI polish

Documentation

Test plan

Notes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

backnotprop commented Apr 15, 2026 •

edited

Loading