Skip to content

feat(browser): human-in-the-loop UI feedback#49

Merged
stephane-segning merged 9 commits into
mainfrom
claude/pedantic-curran-4c514d
Jun 14, 2026
Merged

feat(browser): human-in-the-loop UI feedback#49
stephane-segning merged 9 commits into
mainfrom
claude/pedantic-curran-4c514d

Conversation

@stephane-segning

@stephane-segning stephane-segning commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

What

Adds a human-in-the-loop feedback channel to the browser bridge: the agent can ask the user to mark up the live page (click an element, box a region, drop a comment, confirm) and get back structured annotations resolved to element refs. On pages where the overlay can't be injected, it falls back to a docked side panel over a screenshot.

Design doc: plans/ui-feedback.md.

Locked decisions

  • In-page overlay primary; docked side panel fallback (Chromium chrome.sidePanel / Firefox sidebar_action) for overlay-blocked pages
  • One browser_request_feedback tool with a mode enum
  • New opt-in interactive tool group (off by default, like debug)
  • maxCommandMs ceiling = 10 min

Phases

  • Phase 0 — infra: CancelFrame (teardown on abort / per-command timeout / agent disconnect) + per-command timeout via CommandFrame.timeoutMs, clamped to maxCommandMs; guest backstop at requested + grace.
  • Phase 1 — minimal HITL: interactive group + browser_request_feedback (confirm/choose/point); branded in-page overlay; data-ocb-ref resolution; badge + tab-focus attention.
  • Phase 2 — rich annotation: element / region / comment modes.
  • Side-panel fallback: screenshot-capture + cross-browser docked panel when the overlay can't inject; toggles openPanelOnActionClick only while a request is pending (preserves the popup); shares the overlay's result/cancel path.

Deferred (deliberately — see plan doc)

  • Marker-annotated screenshot returnNeutralResult is single-kind (text|json|image); resolved refs are more actionable than burned-in pixels.

Notes

  • Side-panel point/region return screenshot-pixel coords (no live DOM on restricted pages → no refs); confirm/choose work fully there.
  • Neither browser lets an extension force-open the panel, so the user opens it from the badge-flagged toolbar icon.

Tests

Broker cancel/timeout/clamp, protocol cancel round-trip, agent-client timeoutMs forwarding + backstop, catalog gating + all mode summaries + error-vs-timeout. The side panel UI / chrome-API glue isn't unit-tested (DOM + extension APIs); the catalog-level contract is.

Pre-push gate green: pnpm -r build && typecheck && test && lint && format:check.

🤖 Generated with Claude Code

Plan for a browser_request_feedback tool that paints an annotation
overlay and blocks on the user, returning structured annotations
(resolved to element refs). Phased: 0) CancelFrame + per-command
timeout infra, 1) confirm/choose/point overlay, 2) region/element/
comment + screenshot + side-panel fallback.

Also index multi-client-routing.md in the plans list (was only
linked from the browser table row).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds a design document (plans/ui-feedback.md) detailing a human-in-the-loop UI feedback mechanism for the browser automation plugin, as well as updating CLAUDE.md. The reviewer feedback raises important points regarding the design: expanding the command-cancellation frame to support non-interactive commands, addressing Manifest V3 service worker ephemerality with keep-alive and state persistence strategies, and utilizing the side-panel fallback as a secure alternative to in-page overlays to prevent spoofing on hostile pages.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread plans/ui-feedback.md
Comment on lines +69 to +70
keeps a small `Map<id, () => void>` of teardown handlers for in-flight interactive commands and runs
the matching one. Non-interactive commands ignore `cancel` (idempotent no-op). This is generally

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Limiting cancel to only interactive commands misses a valuable opportunity. Non-interactive commands like browser_wait (which can have long delays) or slow navigations/page actions could also benefit from cancellation.

Consider designing the CancelFrame handler in the executor to be general-purpose. Any command that returns a promise that can be aborted (e.g., using an AbortController or clearing a setTimeout for wait) should register its teardown/abort function in the Map<id, () => void>. This would allow the agent to cleanly abort hung or slow non-interactive commands as well.

Comment thread plans/ui-feedback.md
Comment on lines +143 to +144
- The open port + the bridge WebSocket keep the MV3 worker alive for the wait. Document this as the
one place the worker must not be allowed to idle out mid-command.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In Manifest V3 (MV3), service workers are ephemeral and can be terminated by the browser even if there is an open WebSocket or chrome.runtime port, especially over longer durations (up to the proposed 10-minute maxCommandMs limit). Relying solely on these connections to keep the worker alive is fragile.

To ensure robustness, the design should explicitly address:

  1. Keep-alive mechanisms: Implementing a proactive keep-alive strategy (e.g., periodic pings/messages over the port/WebSocket, or using chrome.alarms to wake up or reset the idle timer).
  2. State persistence / Reconnection: How the extension recovers if the service worker is terminated and restarted while a human is still interacting with the overlay (e.g., persisting active command IDs and port states in chrome.storage.local so the resurrected worker can re-associate with the active overlay).

Comment thread plans/ui-feedback.md
Comment on lines +193 to +194
- The overlay must be **unmistakably branded** as opencode-browser and **un-spoofable** by the page
(so a malicious page can't fake the prompt to harvest a click), and always dismissible.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

While branding the in-page overlay is important, an in-page overlay injected into the page's DOM (even within a closed Shadow DOM) can never be fully "un-spoofable" or secure against a truly hostile page. A malicious page can still overlay its own elements on top of the extension's overlay, manipulate the z-index, intercept pointer events, or attempt clickjacking.

For high-security contexts or hostile pages, the design should emphasize that the side-panel fallback (which runs in the extension's privileged context and is completely isolated from the page's DOM/JS) is the only way to guarantee an un-spoofable and secure interaction. Consider recommending the side-panel fallback as the default or automatic choice when navigating pages with restrictive Content Security Policies (CSP) or suspected hostile behavior.

stephane-segning and others added 3 commits June 14, 2026 17:52
Infra for human-in-the-loop tools that block on a user. Two gaps a
long-running command exposes, fixed bridge-wide:

1. CancelFrame (broker→executor): on agent abort, per-command timeout,
   or the requesting agent disconnecting, the broker now tells the
   still-connected executor to tear down via a `cancel` frame before
   rejecting — so a blocking command (e.g. an open overlay) is never
   orphaned in the page. New `abandon()` helper; wired into the abort
   handler, timeout handler, and agent-disconnect path.

2. Per-command timeout: ToolSpec.timeoutMs threads through
   send() → CommandFrame.timeoutMs → broker, overriding the global
   default and clamped to maxCommandMs (default 10 min). Guest agents
   forward it on the wire and keep a local backstop timer at
   requested + grace so the broker (which can cancel the executor)
   always fires first.

Extension: bridge-client handles inbound `cancel` → CommandRouter.cancel(id),
which runs a teardown registered by cancellable commands (empty until
Phase 1). Protocol mirror updated.

Tests: broker cancel-on-abort / on-agent-disconnect / no-cancel-on-success,
per-command override + maxCommandMs clamp; protocol cancel round-trip;
agent-client timeoutMs forwarding + backstop grace.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds the opt-in `interactive` tool group and `browser_request_feedback`:
the agent asks the human at the browser to respond on the page and blocks
until they do (or it times out).

Plugin/catalog:
- new ToolGroup "interactive" (opt-in, like debug); request_feedback action.
- browser_request_feedback spec with mode confirm|choose|point, prompt,
  options, user timeoutMs (default 120s, clamped 290s). timeoutMs: 300s so
  the broker deadline backstops the overlay's own countdown.
- Annotation / FeedbackResult neutral types; result summarizes the response
  (confirm → "confirmed", choose → chosen value, point → resolved element ref).

Extension:
- feedback-overlay.ts: self-contained injected overlay (branded, dismissible,
  Esc-to-skip). point mode resolves the clicked element to its data-ocb-ref
  (+ CSS selector + coords). Teardown is a second tiny injection.
- feedback.ts: background orchestrator — owns the timeout, correlates the
  page→bg result message, raises the toolbar badge + focuses the tab, and
  exposes cancel() for the broker's cancel frame (wired via CommandRouter).
- command-router request_feedback case registers the canceller before awaiting.

Docs: interactive group + tool reference in browser.md; group/count updates in
CLAUDE.md. Tests: gating, action+timeout forwarding, user-timeout clamp, result
summaries (confirm/timeout/point-ref).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ed fallback

Extends browser_request_feedback beyond confirm/choose/point:

- element — hover highlights the element under the cursor; click picks it →
  { kind:"element", ref, selector, text }.
- region — drag a box → { kind:"region", rect, refs } where refs are every
  data-ocb-ref element inside the box.
- comment — click a spot, then type a free-text note attached to the point.

Overlay (feedback-overlay.ts): adds hover-highlight, rubber-band region select,
and a note panel; shared ref/selector resolution against data-ocb-ref so every
spatial mark stays actionable as a ref.

Overlay-blocked pages (restricted / CSP) now return { responded:false, error }
distinct from a timeout, and the tool result tells the agent to fall back to a
screenshot/snapshot instead of re-asking.

Deferred, deliberately (documented in plans/ui-feedback.md): marker-annotated
screenshot return (NeutralResult can't carry json+image; refs > pixels) and the
side-panel-over-screenshot fallback (large separate UI; the error path covers
the hostile-page case for now).

Docs: new modes + fallback in browser.md; plan doc marked Phases 0–2 done with
the deferred-decisions rationale. Tests: element/region/comment summaries +
error-vs-timeout shaping.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@stephane-segning stephane-segning marked this pull request as ready for review June 14, 2026 16:08

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5c21c98e1a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +708 to +710
if (!d || d.responded === false || d.timedOut) {
return json(
{ responded: false, timedOut: true, annotations: [] } satisfies FeedbackResult,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve explicit feedback dismissals

When the user actively skips or presses Esc, the overlay returns responded: false without timedOut; this branch treats that the same as a timeout and rewrites the metadata to { timedOut: true }, so the agent is told “No response” even though the human explicitly dismissed the prompt. This loses an important distinction for choose/point/region/comment prompts and can cause the agent to retry or fall back as if the user were absent.

Useful? React with 👍 / 👎.

stephane-segning and others added 5 commits June 14, 2026 18:39
When browser_request_feedback can't inject its in-page overlay (chrome://,
Web Store, CSP-locked origins), the extension now captures a screenshot and
routes the request to its own side panel instead of returning an error.

- feedback-side-panel.ts: captures the tab via chrome.tabs.captureVisibleTab,
  stores the pending session, and enables the panel cross-browser (Chromium
  chrome.sidePanel / Firefox sidebar_action). Neither browser allows force-open,
  so it raises the badge and — only while a request is pending — flips
  openPanelOnActionClick so the toolbar click opens the panel, restoring the
  normal popup behavior on resolve.
- new sidepanel entrypoint (React + daisyUI): renders the screenshot with the
  mode's controls; point/region map clicks to screenshot-pixel coords (no live
  DOM → no refs on these pages); confirm/choose work fully. Posts the same
  ocb-feedback-result message as the overlay, so the background's correlation /
  timeout / cancel path is unchanged.
- feedback.ts overlay-failure path now tries the side panel first; only if a
  screenshot can't be taken does it settle with { error }.
- background routes feedback:get-pending; wxt adds the sidePanel permission
  (Chromium) — WXT maps the entrypoint to side_panel / sidebar_action per target.

Docs: side-panel fallback documented in browser.md; plan doc marks it shipped
(fallback-only, docked); CLAUDE.md entrypoint/layout note updated. Marker-
annotated screenshot remains deliberately deferred (NeutralResult is single-kind;
refs > pixels).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
createBunTransport was removed in the move to the ws-based transport (#46),
leaving scripts/bridge-harness.mjs importing a missing export. Switch it to
createNodeTransport + createNodeAgentSocket (both run under Bun), thread an
optional timeoutMs through /cmd so a human-paced request_feedback prompt waits,
and document the feedback + side-panel-fallback commands.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cover the browser-API-free logic surfaced by the coverage audit:

opencode-browser (61% → 76% statements):
- schema.test: toJsonSchema across all field types, optional/required, enums,
  nested objects, recursive array items.
- logging.test: level filtering, console-channel routing, secret redaction,
  fromOpenCodeLogLevel mapping.
- catalog.test: every tool's params()/result() exercised (catalog 50% → 97%) —
  valid NeutralResult + no throw on empty executor data; action/group integrity.

opencode-browser-mcp: selectTools gates the interactive feedback tool.

browser-extension:
- extract the side-panel coordinate math into a pure lib/annotate.ts
  (toNatural / rectFromCorners / naturalRect) and refactor ScreenshotAnnotator
  to use it; annotate.test covers the mapping.
- utils.test: cn class-merge + timeAgo buckets.

Per the repo convention, chrome.*/DOM/React glue stays manually verified — that's
the jsdom-harness work tracked as the post-merge follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Push the unit-testable surface hard using the existing dependency-injection
seams (no browser/DOM needed) — the bulk of the codebase, not just leaf helpers:

opencode-browser (62% → 91% statements):
- agent-client (50→84%): connect/ready handshake, result correlation, error-code
  propagation, unknown-id, release/stop/onClose, abort while queued, not-connected.
- broker (79→91%): ping→pong, group-event routing vs broadcast, cookies global
  action, ambiguous/unknown target, executor reconnect-replace, aggregate tabs,
  stop() rejects pending.
- opencode (0→84%): createBrowserPlugin via injected fake transport + client —
  default/opt-in group registration, disabled path, token advertise (generated
  vs explicit), config hook. token-file vi.mock'd so it never touches the real
  shared bridge.json.
- token-file (56→91%): resolveSharedToken explicit/file/generated/port-mismatch.
- protocol (89→96%): hello/result/error frame builders, pong, malformed frames.

opencode-browser-mcp (server + render now 100%): createMcpServer driven over an
in-memory transport with a real MCP Client — group-filtered listing, tool-call
routing through send(), feedback per-command timeout, send-rejection → isError,
unknown tool. (The remaining package % is the stdio bin entry, not unit-testable.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add @vitest/coverage-v8, a `coverage` script per package + root `pnpm coverage`
(pnpm -r --if-present coverage), and a v8 coverage block with thresholds in each
vitest.config.ts. CI's Test step becomes "Test + coverage" (runs tests AND fails
below threshold); the pre-push gate uses `pnpm coverage`.

Thresholds sit a few points below current so a regression fails without
exact-match churn:
- opencode-browser  88/74/82/88  (S/B/F/L)
- opencode-browser-mcp  95/65/95/95 — excludes the stdio bin (mcp.ts) + index
  re-export, so the bar applies to the real logic (server + render).
- opencode-oauth2  73/64/70/73   models-info  70/68/60/70   ratelimit  68/72/55/68
- browser-extension floor is intentionally low (3/3/2/3) — chrome/DOM/React glue
  is verified manually; raised once the fake-browser harness lands (phase 2).

Docs: CLAUDE.md commands + pre-push gate updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@stephane-segning stephane-segning merged commit 5bd2bc1 into main Jun 14, 2026
3 checks passed
stephane-segning added a commit that referenced this pull request Jun 14, 2026
…(phase 2) (#50)

* test(browser-extension): fake-chrome harness + background unit tests (4% → 14%)

Phase 2 (post-merge follow-up to #49): a dependency-free fake of the chrome.*
surface the background uses, wired as a vitest setupFile, so the extension's
background logic is unit-testable under the node environment.

- test/helpers/fake-chrome.ts: stable singleton fake (runtime message bus,
  action/tabs/windows/scripting/sidePanel/cookies) with inspectable state +
  configurable failure modes; identity never changes so module-level captures
  (const sidePanelApi = chrome.sidePanel) stay valid across resets.
- feedback.test: startFeedback orchestration — result correlation, id filtering,
  attention badge/clear, cancel, overlay→side-panel fallback, capture-fail error,
  and timeout (fake timers).
- feedback-side-panel.test: session store — capture+enable, screenshot-fail,
  clear restores popup behavior + retires the panel, non-matching-id no-op.
- command-router.test: registry/executor-backed actions, page-action bridges
  (snapshot/get_text/get_html via the fake scripting), cookies, unknown-action,
  and the cancel registry (db mocked). request_feedback drives end to end.

Extension coverage floor raised 3% → 13%. Remaining (still climbing): bridge-client,
group-registry, page-actions (jsdom), and the React panels.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(browser-extension): bridge-client + group-registry unit tests (14% → 25%)

Continue phase 2 on the fake-chrome harness (no new deps):

- fake-chrome: enrich tabs (create/remove/goBack/goForward/reload; get now
  returns a `complete` tab so group-registry's waitForComplete doesn't spin).
- bridge-client.test: fake WebSocket + mocked setStatus — hello handshake with
  role/identity, ready→connected, command→correlated result, handler-throw→error
  result, cancel→onCancel, ping→pong, disconnect, and the no-token guard.
- group-registry.test: mocked Dexie table + fake chrome.tabs — resolveTab
  isolation guard, open/navigate/list/close/forget lifecycle, foreign-tab
  rejection, and hydrate from the db.

Extension coverage floor raised 13% → 23%. Remaining: page-actions (jsdom) and
the React panels (jsdom + testing-library).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant