Restructure planner prompt for clarity and stricter retry rules by droid-ash · Pull Request #84 · final-run/finalrun-agent

droid-ash · 2026-04-12T01:08:41Z

Summary

Reorganize the goal-executor planner prompt into clearly scoped sections: turn loop, output schema, verification logic, stagnation decision tree, test phases, and screen/popup protocols.
Tighten retry and stagnation accounting with bounded per-case budgets (A–E) and explicit same-target identity rules so reworded descriptions can't reset counters.
Codify positional assertions, popup handling, and the remember protocol as standalone sections instead of inline core principles.

Test plan

Run an existing goal-executor test case end-to-end and confirm planner output still conforms to the JSON schema.
Verify the stagnation decision tree triggers Case E failure after 3 bounded attempts on an unresponsive element.
Spot-check a positional assertion scenario (e.g. element expected in toolbar but rendered elsewhere) fails as expected.

🤖 Generated with Claude Code

Summary by CodeRabbit

Improvements
- More reliable action execution with structured per-turn planning and clearer retry/resolution flows.
- Better detection of unchanged screen states and automated retry/stagnation handling to reduce failed attempts.
- Stronger terminal-state handling for clearer completion and status reporting.
Documentation
- Prompt and behavior guidelines clarified to improve consistency of executor decisions.

coderabbitai · 2026-04-12T01:08:58Z

📝 Walkthrough

Walkthrough

The planner prompt was restructured from principle-based sections to a schema-driven turn protocol with explicit output schemas, sequential turn loops, screenshot-delta stagnation detection, and a bounded retry decision tree; Phase 3 is now terminal observation-only with immediate status emission.

Changes

Cohort / File(s)	Summary
Planner prompt (documentation) `packages/goal-executor/src/prompts/planner.md`	Rewrote execution guidance to a schema-driven per-turn protocol: added `<turn_loop>` and `<output_schema>`, revised action catalog/output formats, introduced pre/post screenshot comparison and `<stagnation_and_retries>` flow, tightened terminal Phase 3 behavior, and consolidated rules for positioning, popups/errors, screen stabilization, secrets/platform/app-knowledge, quoted-string exactness, ghost elements, and Same-Target Identity retry accounting.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Update planner to verify screenshot first. #54: Modifies the same planner.md to enforce screenshot-first validation and aligns with the screenshot-delta verification changes here.
Skill: avoid verifying ephemeral UI in generated tests #83: Adjusts UI outcome verification guidance (ephemeral vs persistent signals) and relates to the updated post-action verification and popup/error handling here.

Suggested labels

enhancement

Suggested reviewers

arnoldlaishram

Poem

🐰 I hop through prompts both new and neat,
Schemas guide each careful beat,
Screenshots whisper change or pause,
Retries bounded by clear laws,
A planner tuned—my carrot treat! 🥕

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Restructure planner prompt for clarity and stricter retry rules' directly and accurately describes the main changes: reorganization of the planner prompt structure and tightened retry/stagnation rules.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch rewrite-planner-prompt

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

packages/goal-executor/src/prompts/planner.md (1)

191-193: Reword repeated sentence openings for readability.

Lines 191–193 start three consecutive bullets with “Do not”. This is minor, but a small rephrase improves scanability of the prohibition list.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/goal-executor/src/prompts/planner.md` around lines 191 - 193, Three
consecutive bullets starting with "Do **not**" (the lines beginning "Do **not**
substitute `long_press` for `tap`...", "Do **not** open an overflow menu...",
and "Do **not** re-describe the target...") should be reworded for readability
so they don't all start the same way; update those three bullet items to use
varied openings like "Avoid substituting `long_press` for `tap`...", "Do not
open an overflow menu..." (or "Avoid opening an overflow menu..."), and "Avoid
re-describing the target..." while preserving the existing prohibitions and
examples so the semantics and references to `{testCase}`, overflow/menu
behavior, and Same-Target Identity remain unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/goal-executor/src/prompts/planner.md`:
- Around line 274-280: The documentation conflicts between the global "Budget: 2
dismissal actions per blocker type across the entire test" rule and the local
"Handle per Stagnation Case B: 1 dismissal + 1 retry, then FAIL" rule; update
planner.md to resolve precedence: state that Stagnation Case B is a stricter,
local policy for obstructions occurring during a specific action and therefore
still counts against the global per-test dismissal budget (or explicitly mark
Case B as an exception), and add a sentence clarifying how recurring
popup-on-target scenarios increment the global counter for the blocker type and
how the two-rule interaction leads to final FAIL (e.g., Case B triggers
immediate FAIL if its local retries are exhausted even if the global budget
remains, or vice versa — choose and document one consistent precedence).
- Around line 163-167: The verification text incorrectly references an
unsupported action name "delete" (seen alongside "input_text" and "tap") which
is not defined in the <actions> schema; either remove/rename the "delete"
mention to one of the supported action tokens (e.g., use the established verb
for item removal in your schema such as "remove" or "item_removed") or
explicitly add "delete" to the <actions> definitions so the planner stays
on-schema—update the sentence at the list entry (the line that currently starts
with "* delete →") to use the correct action token and matching wording for
verification.

---

Nitpick comments:
In `@packages/goal-executor/src/prompts/planner.md`:
- Around line 191-193: Three consecutive bullets starting with "Do **not**" (the
lines beginning "Do **not** substitute `long_press` for `tap`...", "Do **not**
open an overflow menu...", and "Do **not** re-describe the target...") should be
reworded for readability so they don't all start the same way; update those
three bullet items to use varied openings like "Avoid substituting `long_press`
for `tap`...", "Do not open an overflow menu..." (or "Avoid opening an overflow
menu..."), and "Avoid re-describing the target..." while preserving the existing
prohibitions and examples so the semantics and references to `{testCase}`,
overflow/menu behavior, and Same-Target Identity remain unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bd21fdbb-9e0f-430c-8fe7-976374e6c815

📥 Commits

Reviewing files that changed from the base of the PR and between 33779e3 and 9c4d4c9.

📒 Files selected for processing (1)

packages/goal-executor/src/prompts/planner.md

coderabbitai · 2026-04-12T01:10:53Z

packages/goal-executor/src/prompts/planner.md

+**2. Partial change (low delta).** Same layout, but a specific element changed — text appeared in a field, a toggle flipped, an item disappeared, a keyboard opened. → Verify the specific target:
+* `input_text` → does the field contain exactly what you typed (or masked dots for passwords)? If empty or wrong, correct it before proceeding.
+* `tap` → did the expected state change occur (button pressed, checkbox toggled, item selected)?
+* delete → is the item gone?
+* business logic → did the rule apply correctly (not just "did something change")? If you changed the language, did the menu text actually change to the new language?


⚠️ Potential issue | 🟠 Major

Unsupported delete action is referenced in verification logic.

Line 166 introduces delete as if it were an action type, but <actions> does not define it. This can push the planner off-schema.

Suggested wording fix

-* delete → is the item gone? +* delete-step outcome → if the prior step intended deletion (typically via `tap`/`long_press` + confirm), verify the item is gone.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/goal-executor/src/prompts/planner.md` around lines 163 - 167, The verification text incorrectly references an unsupported action name "delete" (seen alongside "input_text" and "tap") which is not defined in the <actions> schema; either remove/rename the "delete" mention to one of the supported action tokens (e.g., use the established verb for item removal in your schema such as "remove" or "item_removed") or explicitly add "delete" to the <actions> definitions so the planner stays on-schema—update the sentence at the list entry (the line that currently starts with "* delete →") to use the correct action token and matching wording for verification.

packages/goal-executor/src/prompts/planner.md

Reorganizes the planner prompt into clearly scoped sections (turn loop, output schema, verification logic, stagnation decision tree, phases, protocols) and tightens retry/stagnation accounting with bounded budgets per failure case. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

♻️ Duplicate comments (1)

packages/goal-executor/src/prompts/planner.md (1)

163-167: ⚠️ Potential issue | 🟠 Major

Unsupported delete token is still referenced in verification logic.

delete is not in the <actions> catalog, so this wording can push the planner off-schema and fail normalization/parsing.

Suggested wording fix

-* delete → is the item gone?
+* delete-step outcome → if the prior step intended deletion (typically via `tap`/`long_press` + confirm), verify the item is gone.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/goal-executor/src/prompts/planner.md` around lines 163 - 167, The
verification text references an unsupported action token `delete`, which can
break planner normalization; update the sentence in planner.md that lists
`delete` (the bullet under "2. Partial change (low delta)" with the
`input_text`/`tap`/`delete` bullets) to use a supported action name from the
<actions> catalog (e.g., replace `delete` with the canonical token such as
`remove` or `remove_item`) or rephrase it as plain language ("is the item
gone?") so it does not inject an unknown token into the planner schema; ensure
the changed wording matches the existing <actions> tokens used by the planner.

🧹 Nitpick comments (1)

packages/goal-executor/src/prompts/planner.md (1)
204-208: Identity rule references unavailable resource id metadata.

<inputs> says hierarchy includes only index, class, contentDesc, and bounds. Please align Same-Target Identity with fields the planner actually receives to avoid ambiguous retry accounting.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/goal-executor/src/prompts/planner.md` around lines 204 - 208, The
planner's "Same-Target Identity" rule references unavailable `resource id`;
update it to use only fields the planner receives: replace the `resource id`
bullet with identity via matching `contentDesc`, or a matching hierarchical node
path built from the `index` and `class` sequence (i.e., same index/class
ancestry), and keep the bounding-box-centers-within-20px rule; specifically edit
the "Same-Target Identity" section in planner.md to list: matching
`contentDesc`, OR identical hierarchical path using `index`+`class`, OR
bounding-box centers within 20 px.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@packages/goal-executor/src/prompts/planner.md`:
- Around line 163-167: The verification text references an unsupported action
token `delete`, which can break planner normalization; update the sentence in
planner.md that lists `delete` (the bullet under "2. Partial change (low delta)"
with the `input_text`/`tap`/`delete` bullets) to use a supported action name
from the <actions> catalog (e.g., replace `delete` with the canonical token such
as `remove` or `remove_item`) or rephrase it as plain language ("is the item
gone?") so it does not inject an unknown token into the planner schema; ensure
the changed wording matches the existing <actions> tokens used by the planner.

---

Nitpick comments:
In `@packages/goal-executor/src/prompts/planner.md`:
- Around line 204-208: The planner's "Same-Target Identity" rule references
unavailable `resource id`; update it to use only fields the planner receives:
replace the `resource id` bullet with identity via matching `contentDesc`, or a
matching hierarchical node path built from the `index` and `class` sequence
(i.e., same index/class ancestry), and keep the bounding-box-centers-within-20px
rule; specifically edit the "Same-Target Identity" section in planner.md to
list: matching `contentDesc`, OR identical hierarchical path using
`index`+`class`, OR bounding-box centers within 20 px.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2299bba9-3294-4b8a-9156-c363be660804

📥 Commits

Reviewing files that changed from the base of the PR and between 9c4d4c9 and 2f97089.

📒 Files selected for processing (1)

packages/goal-executor/src/prompts/planner.md

droid-ash requested a review from arnoldlaishram April 12, 2026 01:09

droid-ash self-assigned this Apr 12, 2026

droid-ash added the bug Something isn't working label Apr 12, 2026

coderabbitai bot reviewed Apr 12, 2026

View reviewed changes

droid-ash force-pushed the rewrite-planner-prompt branch from 9c4d4c9 to 2f97089 Compare April 12, 2026 01:35

coderabbitai bot reviewed Apr 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Restructure planner prompt for clarity and stricter retry rules#84

Restructure planner prompt for clarity and stricter retry rules#84
droid-ash wants to merge 1 commit intomainfrom
rewrite-planner-prompt

droid-ash commented Apr 12, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Apr 12, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Apr 12, 2026

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

droid-ash commented Apr 12, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Apr 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

droid-ash commented Apr 12, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 12, 2026 •

edited

Loading