fix(core): detect repeated actions across changing element refs (#430)#477
Open
lmorchard wants to merge 1 commit into
Open
fix(core): detect repeated actions across changing element refs (#430)#477lmorchard wants to merge 1 commit into
lmorchard wants to merge 1 commit into
Conversation
lmorchard
added a commit
that referenced
this pull request
May 26, 2026
Pass 1 dropped ref from the signature to fix cross-snapshot loop detection, but introduced a new false-positive class: ref-only actions (click, hover, check, etc.) all carry no value, so the bare `action:` signature collapsed every call together. A multi-step wizard with three "Next" clicks in a row would trip the detector falsely. Capture element identity (role + accessible name) at snapshot-build time into a globalThis.__piloIdentityMap alongside the existing __piloRefMap. performActionWithValidation looks it up via the new getRefIdentity() browser method (implemented in PlaywrightBrowser and ExtensionBrowser) BEFORE the action runs, while the snapshot's identity for the ref is still valid. The result lands on ActionResult as an optional targetIdentity, which the signature folds in when present. Now click:button:Next: and click:button:Submit: hash distinctly, while click:button:Submit: remains constant across ref churn — both detection directions correct. Addresses Copilot review feedback on PR #477. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The repetition detector hashed action signatures as action:ref:value, but refs are regenerated on every ARIA snapshot. A logical "click Submit" got a different ref each turn, so the detector treated each attempt as a distinct action and the warn/abort thresholds never tripped on the bug they were designed to catch. Drop ref from the signature, normalize value (lowercase + trim), and exempt scroll and wait from detection entirely — their same-value repetition is legitimate workflow (traversing an infinite feed, polling a slow page), not a stuck-loop signal. To keep ref-only actions (click, hover, check, etc.) from collapsing every call onto the bare `action:` signature, also thread element identity into the detector. ariaSnapshot.ts populates a new globalThis.__piloIdentityMap alongside __piloRefMap, and a new AriaBrowser.getRefIdentity() method (implemented in PlaywrightBrowser and ExtensionBrowser) looks it up. performActionWithValidation captures identity BEFORE the action runs (while the ref is still valid) and attaches it to ActionResult as targetIdentity, which the signature folds in when present. Now click:button:Next: and click:button:Submit: hash distinctly, while click:button:Submit: remains constant across ref churn. Defers the larger pieces from #430 (config rename, stagnant-page detector, decoupling warn from forced snapshot) to follow-up PRs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
40db079 to
58410db
Compare
Collaborator
Author
|
Giving this an eval smoke test, then opening for review if nothing disastrous happens. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
action:ref:value, but refs are regenerated on every ARIA snapshot — a logical "click Submit" got a different ref each turn, so warn/abort thresholds never tripped on the bug they were designed to catch.reffrom the signature (action:normalized-value) and exemptsscrollandwaitfrom detection entirely. Their same-value repetition is legitimate workflow (traversing an infinite feed, polling a slow page), not a stuck-loop signal.value(lowercase + trim) so cosmetic input variation doesn't reset the counter on the same logical action.Closes #430 partially — the four other items in that issue (config rename, stagnant-page detector, decoupling warning from forced snapshot, element-identity threading into
ActionResult) are deferred to follow-up PRs. Bundling them would turn a tight ~10-line production change into a multi-thread review with little upside.Why not stable refs (PR #447) instead?
Stable refs would in theory let the existing
action:ref:valuedetector work as-named. But PR #447 (closed 2026-05-14, not merged) ran that experiment: A/B eval was 47/59 → 45/59 success rate across two rounds — consistently in the wrong direction. The signature change here doesn't depend on ref-stability strategy and avoids that regression risk.Design decisions
click-scroll-click-scroll-clickcounts as 3 repeated clicks — is more aggressive but punishes a common legitimate pattern.scrollandwaitin the exempt set from day 1. Wait is conceptually identical to scroll for our purposes: a slow page legitimately gets several identicalwait(2s)calls in a row.fill("hello")vsfill("HELLO ")as the same logical action. Lowercasing folds "Yes"/"yes"/"YES" together — if the agent is deliberately A/B-testing case variations we'd over-flag, but that's a rare pattern and the cost is one retry on the warning before abort.Test plan
New tests in
packages/core/test/webAgent.test.ts(in theenhanced error handlingblock):should detect repeated clicks across changing element refs— the regression test for the bug. 5 clicks withrefin[E1, E5, E12, E3, E8]— pre-fix the detector treated each as a distinct action; post-fix the abort fires at count 4.should not flag legitimate repeated scroll actions— 5scroll: downcalls thendone; no warning user-message injected.should not flag legitimate repeated wait actions— same shape withwait(2s).should treat cosmetically different fill values as repetition—["hello", "HELLO ", " Hello", "hello", "HELLO"]trips the detector via value normalization.The existing
should detect and handle repeated actionstest still passes unchanged — its mock uses the sameref: "btn1"every turn, which under the new signature still collides asclick:.Full validation:
pnpm run checkpasses (1266 tests across core/cli/server/extension, format + typecheck clean).🤖 Generated with Claude Code