Restructure planner prompt for clarity and stricter retry rules#84
Restructure planner prompt for clarity and stricter retry rules#84
Conversation
📝 WalkthroughWalkthroughThe planner prompt was restructured from principle-based sections to a schema-driven turn protocol with explicit output schemas, sequential turn loops, screenshot-delta stagnation detection, and a bounded retry decision tree; Phase 3 is now terminal observation-only with immediate status emission. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
packages/goal-executor/src/prompts/planner.md (1)
191-193: Reword repeated sentence openings for readability.Lines 191–193 start three consecutive bullets with “Do not”. This is minor, but a small rephrase improves scanability of the prohibition list.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/goal-executor/src/prompts/planner.md` around lines 191 - 193, Three consecutive bullets starting with "Do **not**" (the lines beginning "Do **not** substitute `long_press` for `tap`...", "Do **not** open an overflow menu...", and "Do **not** re-describe the target...") should be reworded for readability so they don't all start the same way; update those three bullet items to use varied openings like "Avoid substituting `long_press` for `tap`...", "Do not open an overflow menu..." (or "Avoid opening an overflow menu..."), and "Avoid re-describing the target..." while preserving the existing prohibitions and examples so the semantics and references to `{testCase}`, overflow/menu behavior, and Same-Target Identity remain unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/goal-executor/src/prompts/planner.md`:
- Around line 274-280: The documentation conflicts between the global "Budget: 2
dismissal actions per blocker type across the entire test" rule and the local
"Handle per Stagnation Case B: 1 dismissal + 1 retry, then FAIL" rule; update
planner.md to resolve precedence: state that Stagnation Case B is a stricter,
local policy for obstructions occurring during a specific action and therefore
still counts against the global per-test dismissal budget (or explicitly mark
Case B as an exception), and add a sentence clarifying how recurring
popup-on-target scenarios increment the global counter for the blocker type and
how the two-rule interaction leads to final FAIL (e.g., Case B triggers
immediate FAIL if its local retries are exhausted even if the global budget
remains, or vice versa — choose and document one consistent precedence).
- Around line 163-167: The verification text incorrectly references an
unsupported action name "delete" (seen alongside "input_text" and "tap") which
is not defined in the <actions> schema; either remove/rename the "delete"
mention to one of the supported action tokens (e.g., use the established verb
for item removal in your schema such as "remove" or "item_removed") or
explicitly add "delete" to the <actions> definitions so the planner stays
on-schema—update the sentence at the list entry (the line that currently starts
with "* delete →") to use the correct action token and matching wording for
verification.
---
Nitpick comments:
In `@packages/goal-executor/src/prompts/planner.md`:
- Around line 191-193: Three consecutive bullets starting with "Do **not**" (the
lines beginning "Do **not** substitute `long_press` for `tap`...", "Do **not**
open an overflow menu...", and "Do **not** re-describe the target...") should be
reworded for readability so they don't all start the same way; update those
three bullet items to use varied openings like "Avoid substituting `long_press`
for `tap`...", "Do not open an overflow menu..." (or "Avoid opening an overflow
menu..."), and "Avoid re-describing the target..." while preserving the existing
prohibitions and examples so the semantics and references to `{testCase}`,
overflow/menu behavior, and Same-Target Identity remain unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bd21fdbb-9e0f-430c-8fe7-976374e6c815
📒 Files selected for processing (1)
packages/goal-executor/src/prompts/planner.md
| **2. Partial change (low delta).** Same layout, but a specific element changed — text appeared in a field, a toggle flipped, an item disappeared, a keyboard opened. → Verify the specific target: | ||
| * `input_text` → does the field contain exactly what you typed (or masked dots for passwords)? If empty or wrong, correct it before proceeding. | ||
| * `tap` → did the expected state change occur (button pressed, checkbox toggled, item selected)? | ||
| * delete → is the item gone? | ||
| * business logic → did the rule apply correctly (not just "did something change")? If you changed the language, did the menu text actually change to the new language? |
There was a problem hiding this comment.
Unsupported delete action is referenced in verification logic.
Line 166 introduces delete as if it were an action type, but <actions> does not define it. This can push the planner off-schema.
Suggested wording fix
-* delete → is the item gone?
+* delete-step outcome → if the prior step intended deletion (typically via `tap`/`long_press` + confirm), verify the item is gone.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/goal-executor/src/prompts/planner.md` around lines 163 - 167, The
verification text incorrectly references an unsupported action name "delete"
(seen alongside "input_text" and "tap") which is not defined in the <actions>
schema; either remove/rename the "delete" mention to one of the supported action
tokens (e.g., use the established verb for item removal in your schema such as
"remove" or "item_removed") or explicitly add "delete" to the <actions>
definitions so the planner stays on-schema—update the sentence at the list entry
(the line that currently starts with "* delete →") to use the correct action
token and matching wording for verification.
Reorganizes the planner prompt into clearly scoped sections (turn loop, output schema, verification logic, stagnation decision tree, phases, protocols) and tightens retry/stagnation accounting with bounded budgets per failure case. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
9c4d4c9 to
2f97089
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
packages/goal-executor/src/prompts/planner.md (1)
163-167:⚠️ Potential issue | 🟠 MajorUnsupported
deletetoken is still referenced in verification logic.
deleteis not in the<actions>catalog, so this wording can push the planner off-schema and fail normalization/parsing.Suggested wording fix
-* delete → is the item gone? +* delete-step outcome → if the prior step intended deletion (typically via `tap`/`long_press` + confirm), verify the item is gone.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/goal-executor/src/prompts/planner.md` around lines 163 - 167, The verification text references an unsupported action token `delete`, which can break planner normalization; update the sentence in planner.md that lists `delete` (the bullet under "2. Partial change (low delta)" with the `input_text`/`tap`/`delete` bullets) to use a supported action name from the <actions> catalog (e.g., replace `delete` with the canonical token such as `remove` or `remove_item`) or rephrase it as plain language ("is the item gone?") so it does not inject an unknown token into the planner schema; ensure the changed wording matches the existing <actions> tokens used by the planner.
🧹 Nitpick comments (1)
packages/goal-executor/src/prompts/planner.md (1)
204-208: Identity rule references unavailableresource idmetadata.
<inputs>says hierarchy includes onlyindex,class,contentDesc, andbounds. Please align Same-Target Identity with fields the planner actually receives to avoid ambiguous retry accounting.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/goal-executor/src/prompts/planner.md` around lines 204 - 208, The planner's "Same-Target Identity" rule references unavailable `resource id`; update it to use only fields the planner receives: replace the `resource id` bullet with identity via matching `contentDesc`, or a matching hierarchical node path built from the `index` and `class` sequence (i.e., same index/class ancestry), and keep the bounding-box-centers-within-20px rule; specifically edit the "Same-Target Identity" section in planner.md to list: matching `contentDesc`, OR identical hierarchical path using `index`+`class`, OR bounding-box centers within 20 px.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@packages/goal-executor/src/prompts/planner.md`:
- Around line 163-167: The verification text references an unsupported action
token `delete`, which can break planner normalization; update the sentence in
planner.md that lists `delete` (the bullet under "2. Partial change (low delta)"
with the `input_text`/`tap`/`delete` bullets) to use a supported action name
from the <actions> catalog (e.g., replace `delete` with the canonical token such
as `remove` or `remove_item`) or rephrase it as plain language ("is the item
gone?") so it does not inject an unknown token into the planner schema; ensure
the changed wording matches the existing <actions> tokens used by the planner.
---
Nitpick comments:
In `@packages/goal-executor/src/prompts/planner.md`:
- Around line 204-208: The planner's "Same-Target Identity" rule references
unavailable `resource id`; update it to use only fields the planner receives:
replace the `resource id` bullet with identity via matching `contentDesc`, or a
matching hierarchical node path built from the `index` and `class` sequence
(i.e., same index/class ancestry), and keep the bounding-box-centers-within-20px
rule; specifically edit the "Same-Target Identity" section in planner.md to
list: matching `contentDesc`, OR identical hierarchical path using
`index`+`class`, OR bounding-box centers within 20 px.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 2299bba9-3294-4b8a-9156-c363be660804
📒 Files selected for processing (1)
packages/goal-executor/src/prompts/planner.md
Summary
Test plan
🤖 Generated with Claude Code
Summary by CodeRabbit
Improvements
Documentation