Skip to content

fix(core): detect repeated actions across changing element refs (#430)#477

Open
lmorchard wants to merge 1 commit into
mainfrom
fix/repetition-signature-ref-churn
Open

fix(core): detect repeated actions across changing element refs (#430)#477
lmorchard wants to merge 1 commit into
mainfrom
fix/repetition-signature-ref-churn

Conversation

@lmorchard
Copy link
Copy Markdown
Collaborator

Summary

  • The repetition detector hashed action signatures as action:ref:value, but refs are regenerated on every ARIA snapshot — a logical "click Submit" got a different ref each turn, so warn/abort thresholds never tripped on the bug they were designed to catch.
  • Drops ref from the signature (action:normalized-value) and exempts scroll and wait from detection entirely. Their same-value repetition is legitimate workflow (traversing an infinite feed, polling a slow page), not a stuck-loop signal.
  • Normalizes value (lowercase + trim) so cosmetic input variation doesn't reset the counter on the same logical action.

Closes #430 partially — the four other items in that issue (config rename, stagnant-page detector, decoupling warning from forced snapshot, element-identity threading into ActionResult) are deferred to follow-up PRs. Bundling them would turn a tight ~10-line production change into a multi-thread review with little upside.

Why not stable refs (PR #447) instead?

Stable refs would in theory let the existing action:ref:value detector work as-named. But PR #447 (closed 2026-05-14, not merged) ran that experiment: A/B eval was 47/59 → 45/59 success rate across two rounds — consistently in the wrong direction. The signature change here doesn't depend on ref-stability strategy and avoids that regression risk.

Design decisions

  • Reset state on exempt actions. A scroll between two clicks resets the counter to 0 on the second click. Treats the scroll as progress (it likely revealed new state). The alternative — making exempt actions "invisible" so click-scroll-click-scroll-click counts as 3 repeated clicks — is more aggressive but punishes a common legitimate pattern.
  • Both scroll and wait in the exempt set from day 1. Wait is conceptually identical to scroll for our purposes: a slow page legitimately gets several identical wait(2s) calls in a row.
  • Lowercase + trim normalization. Catches fill("hello") vs fill("HELLO ") as the same logical action. Lowercasing folds "Yes"/"yes"/"YES" together — if the agent is deliberately A/B-testing case variations we'd over-flag, but that's a rare pattern and the cost is one retry on the warning before abort.

Test plan

New tests in packages/core/test/webAgent.test.ts (in the enhanced error handling block):

  • should detect repeated clicks across changing element refs — the regression test for the bug. 5 clicks with ref in [E1, E5, E12, E3, E8] — pre-fix the detector treated each as a distinct action; post-fix the abort fires at count 4.
  • should not flag legitimate repeated scroll actions — 5 scroll: down calls then done; no warning user-message injected.
  • should not flag legitimate repeated wait actions — same shape with wait(2s).
  • should treat cosmetically different fill values as repetition["hello", "HELLO ", " Hello", "hello", "HELLO"] trips the detector via value normalization.

The existing should detect and handle repeated actions test still passes unchanged — its mock uses the same ref: "btn1" every turn, which under the new signature still collides as click:.

Full validation: pnpm run check passes (1266 tests across core/cli/server/extension, format + typecheck clean).

🤖 Generated with Claude Code

This comment was marked as resolved.

lmorchard added a commit that referenced this pull request May 26, 2026
Pass 1 dropped ref from the signature to fix cross-snapshot loop
detection, but introduced a new false-positive class: ref-only actions
(click, hover, check, etc.) all carry no value, so the bare `action:`
signature collapsed every call together. A multi-step wizard with three
"Next" clicks in a row would trip the detector falsely.

Capture element identity (role + accessible name) at snapshot-build time
into a globalThis.__piloIdentityMap alongside the existing __piloRefMap.
performActionWithValidation looks it up via the new getRefIdentity()
browser method (implemented in PlaywrightBrowser and ExtensionBrowser)
BEFORE the action runs, while the snapshot's identity for the ref is
still valid. The result lands on ActionResult as an optional
targetIdentity, which the signature folds in when present.

Now click:button:Next:  and click:button:Submit:  hash distinctly, while
click:button:Submit:  remains constant across ref churn — both detection
directions correct.

Addresses Copilot review feedback on PR #477.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The repetition detector hashed action signatures as action:ref:value,
but refs are regenerated on every ARIA snapshot. A logical "click
Submit" got a different ref each turn, so the detector treated each
attempt as a distinct action and the warn/abort thresholds never tripped
on the bug they were designed to catch.

Drop ref from the signature, normalize value (lowercase + trim), and
exempt scroll and wait from detection entirely — their same-value
repetition is legitimate workflow (traversing an infinite feed, polling
a slow page), not a stuck-loop signal.

To keep ref-only actions (click, hover, check, etc.) from collapsing
every call onto the bare `action:` signature, also thread element
identity into the detector. ariaSnapshot.ts populates a new
globalThis.__piloIdentityMap alongside __piloRefMap, and a new
AriaBrowser.getRefIdentity() method (implemented in PlaywrightBrowser
and ExtensionBrowser) looks it up. performActionWithValidation captures
identity BEFORE the action runs (while the ref is still valid) and
attaches it to ActionResult as targetIdentity, which the signature
folds in when present. Now click:button:Next: and click:button:Submit:
hash distinctly, while click:button:Submit: remains constant across
ref churn.

Defers the larger pieces from #430 (config rename, stagnant-page
detector, decoupling warn from forced snapshot) to follow-up PRs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lmorchard lmorchard force-pushed the fix/repetition-signature-ref-churn branch from 40db079 to 58410db Compare May 26, 2026 22:16
@lmorchard
Copy link
Copy Markdown
Collaborator Author

Giving this an eval smoke test, then opening for review if nothing disastrous happens.

@lmorchard lmorchard marked this pull request as ready for review May 27, 2026 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix repetition detection signature to survive ref churn between snapshots

2 participants