feat(browser): human-in-the-loop UI feedback#49
Conversation
Plan for a browser_request_feedback tool that paints an annotation overlay and blocks on the user, returning structured annotations (resolved to element refs). Phased: 0) CancelFrame + per-command timeout infra, 1) confirm/choose/point overlay, 2) region/element/ comment + screenshot + side-panel fallback. Also index multi-client-routing.md in the plans list (was only linked from the browser table row). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request adds a design document (plans/ui-feedback.md) detailing a human-in-the-loop UI feedback mechanism for the browser automation plugin, as well as updating CLAUDE.md. The reviewer feedback raises important points regarding the design: expanding the command-cancellation frame to support non-interactive commands, addressing Manifest V3 service worker ephemerality with keep-alive and state persistence strategies, and utilizing the side-panel fallback as a secure alternative to in-page overlays to prevent spoofing on hostile pages.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| keeps a small `Map<id, () => void>` of teardown handlers for in-flight interactive commands and runs | ||
| the matching one. Non-interactive commands ignore `cancel` (idempotent no-op). This is generally |
There was a problem hiding this comment.
Limiting cancel to only interactive commands misses a valuable opportunity. Non-interactive commands like browser_wait (which can have long delays) or slow navigations/page actions could also benefit from cancellation.
Consider designing the CancelFrame handler in the executor to be general-purpose. Any command that returns a promise that can be aborted (e.g., using an AbortController or clearing a setTimeout for wait) should register its teardown/abort function in the Map<id, () => void>. This would allow the agent to cleanly abort hung or slow non-interactive commands as well.
| - The open port + the bridge WebSocket keep the MV3 worker alive for the wait. Document this as the | ||
| one place the worker must not be allowed to idle out mid-command. |
There was a problem hiding this comment.
In Manifest V3 (MV3), service workers are ephemeral and can be terminated by the browser even if there is an open WebSocket or chrome.runtime port, especially over longer durations (up to the proposed 10-minute maxCommandMs limit). Relying solely on these connections to keep the worker alive is fragile.
To ensure robustness, the design should explicitly address:
- Keep-alive mechanisms: Implementing a proactive keep-alive strategy (e.g., periodic pings/messages over the port/WebSocket, or using
chrome.alarmsto wake up or reset the idle timer). - State persistence / Reconnection: How the extension recovers if the service worker is terminated and restarted while a human is still interacting with the overlay (e.g., persisting active command IDs and port states in
chrome.storage.localso the resurrected worker can re-associate with the active overlay).
| - The overlay must be **unmistakably branded** as opencode-browser and **un-spoofable** by the page | ||
| (so a malicious page can't fake the prompt to harvest a click), and always dismissible. |
There was a problem hiding this comment.
While branding the in-page overlay is important, an in-page overlay injected into the page's DOM (even within a closed Shadow DOM) can never be fully "un-spoofable" or secure against a truly hostile page. A malicious page can still overlay its own elements on top of the extension's overlay, manipulate the z-index, intercept pointer events, or attempt clickjacking.
For high-security contexts or hostile pages, the design should emphasize that the side-panel fallback (which runs in the extension's privileged context and is completely isolated from the page's DOM/JS) is the only way to guarantee an un-spoofable and secure interaction. Consider recommending the side-panel fallback as the default or automatic choice when navigating pages with restrictive Content Security Policies (CSP) or suspected hostile behavior.
Infra for human-in-the-loop tools that block on a user. Two gaps a long-running command exposes, fixed bridge-wide: 1. CancelFrame (broker→executor): on agent abort, per-command timeout, or the requesting agent disconnecting, the broker now tells the still-connected executor to tear down via a `cancel` frame before rejecting — so a blocking command (e.g. an open overlay) is never orphaned in the page. New `abandon()` helper; wired into the abort handler, timeout handler, and agent-disconnect path. 2. Per-command timeout: ToolSpec.timeoutMs threads through send() → CommandFrame.timeoutMs → broker, overriding the global default and clamped to maxCommandMs (default 10 min). Guest agents forward it on the wire and keep a local backstop timer at requested + grace so the broker (which can cancel the executor) always fires first. Extension: bridge-client handles inbound `cancel` → CommandRouter.cancel(id), which runs a teardown registered by cancellable commands (empty until Phase 1). Protocol mirror updated. Tests: broker cancel-on-abort / on-agent-disconnect / no-cancel-on-success, per-command override + maxCommandMs clamp; protocol cancel round-trip; agent-client timeoutMs forwarding + backstop grace. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds the opt-in `interactive` tool group and `browser_request_feedback`: the agent asks the human at the browser to respond on the page and blocks until they do (or it times out). Plugin/catalog: - new ToolGroup "interactive" (opt-in, like debug); request_feedback action. - browser_request_feedback spec with mode confirm|choose|point, prompt, options, user timeoutMs (default 120s, clamped 290s). timeoutMs: 300s so the broker deadline backstops the overlay's own countdown. - Annotation / FeedbackResult neutral types; result summarizes the response (confirm → "confirmed", choose → chosen value, point → resolved element ref). Extension: - feedback-overlay.ts: self-contained injected overlay (branded, dismissible, Esc-to-skip). point mode resolves the clicked element to its data-ocb-ref (+ CSS selector + coords). Teardown is a second tiny injection. - feedback.ts: background orchestrator — owns the timeout, correlates the page→bg result message, raises the toolbar badge + focuses the tab, and exposes cancel() for the broker's cancel frame (wired via CommandRouter). - command-router request_feedback case registers the canceller before awaiting. Docs: interactive group + tool reference in browser.md; group/count updates in CLAUDE.md. Tests: gating, action+timeout forwarding, user-timeout clamp, result summaries (confirm/timeout/point-ref). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ed fallback
Extends browser_request_feedback beyond confirm/choose/point:
- element — hover highlights the element under the cursor; click picks it →
{ kind:"element", ref, selector, text }.
- region — drag a box → { kind:"region", rect, refs } where refs are every
data-ocb-ref element inside the box.
- comment — click a spot, then type a free-text note attached to the point.
Overlay (feedback-overlay.ts): adds hover-highlight, rubber-band region select,
and a note panel; shared ref/selector resolution against data-ocb-ref so every
spatial mark stays actionable as a ref.
Overlay-blocked pages (restricted / CSP) now return { responded:false, error }
distinct from a timeout, and the tool result tells the agent to fall back to a
screenshot/snapshot instead of re-asking.
Deferred, deliberately (documented in plans/ui-feedback.md): marker-annotated
screenshot return (NeutralResult can't carry json+image; refs > pixels) and the
side-panel-over-screenshot fallback (large separate UI; the error path covers
the hostile-page case for now).
Docs: new modes + fallback in browser.md; plan doc marked Phases 0–2 done with
the deferred-decisions rationale. Tests: element/region/comment summaries +
error-vs-timeout shaping.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5c21c98e1a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (!d || d.responded === false || d.timedOut) { | ||
| return json( | ||
| { responded: false, timedOut: true, annotations: [] } satisfies FeedbackResult, |
There was a problem hiding this comment.
Preserve explicit feedback dismissals
When the user actively skips or presses Esc, the overlay returns responded: false without timedOut; this branch treats that the same as a timeout and rewrites the metadata to { timedOut: true }, so the agent is told “No response” even though the human explicitly dismissed the prompt. This loses an important distinction for choose/point/region/comment prompts and can cause the agent to retry or fall back as if the user were absent.
Useful? React with 👍 / 👎.
When browser_request_feedback can't inject its in-page overlay (chrome://,
Web Store, CSP-locked origins), the extension now captures a screenshot and
routes the request to its own side panel instead of returning an error.
- feedback-side-panel.ts: captures the tab via chrome.tabs.captureVisibleTab,
stores the pending session, and enables the panel cross-browser (Chromium
chrome.sidePanel / Firefox sidebar_action). Neither browser allows force-open,
so it raises the badge and — only while a request is pending — flips
openPanelOnActionClick so the toolbar click opens the panel, restoring the
normal popup behavior on resolve.
- new sidepanel entrypoint (React + daisyUI): renders the screenshot with the
mode's controls; point/region map clicks to screenshot-pixel coords (no live
DOM → no refs on these pages); confirm/choose work fully. Posts the same
ocb-feedback-result message as the overlay, so the background's correlation /
timeout / cancel path is unchanged.
- feedback.ts overlay-failure path now tries the side panel first; only if a
screenshot can't be taken does it settle with { error }.
- background routes feedback:get-pending; wxt adds the sidePanel permission
(Chromium) — WXT maps the entrypoint to side_panel / sidebar_action per target.
Docs: side-panel fallback documented in browser.md; plan doc marks it shipped
(fallback-only, docked); CLAUDE.md entrypoint/layout note updated. Marker-
annotated screenshot remains deliberately deferred (NeutralResult is single-kind;
refs > pixels).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
createBunTransport was removed in the move to the ws-based transport (#46), leaving scripts/bridge-harness.mjs importing a missing export. Switch it to createNodeTransport + createNodeAgentSocket (both run under Bun), thread an optional timeoutMs through /cmd so a human-paced request_feedback prompt waits, and document the feedback + side-panel-fallback commands. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cover the browser-API-free logic surfaced by the coverage audit: opencode-browser (61% → 76% statements): - schema.test: toJsonSchema across all field types, optional/required, enums, nested objects, recursive array items. - logging.test: level filtering, console-channel routing, secret redaction, fromOpenCodeLogLevel mapping. - catalog.test: every tool's params()/result() exercised (catalog 50% → 97%) — valid NeutralResult + no throw on empty executor data; action/group integrity. opencode-browser-mcp: selectTools gates the interactive feedback tool. browser-extension: - extract the side-panel coordinate math into a pure lib/annotate.ts (toNatural / rectFromCorners / naturalRect) and refactor ScreenshotAnnotator to use it; annotate.test covers the mapping. - utils.test: cn class-merge + timeAgo buckets. Per the repo convention, chrome.*/DOM/React glue stays manually verified — that's the jsdom-harness work tracked as the post-merge follow-up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Push the unit-testable surface hard using the existing dependency-injection seams (no browser/DOM needed) — the bulk of the codebase, not just leaf helpers: opencode-browser (62% → 91% statements): - agent-client (50→84%): connect/ready handshake, result correlation, error-code propagation, unknown-id, release/stop/onClose, abort while queued, not-connected. - broker (79→91%): ping→pong, group-event routing vs broadcast, cookies global action, ambiguous/unknown target, executor reconnect-replace, aggregate tabs, stop() rejects pending. - opencode (0→84%): createBrowserPlugin via injected fake transport + client — default/opt-in group registration, disabled path, token advertise (generated vs explicit), config hook. token-file vi.mock'd so it never touches the real shared bridge.json. - token-file (56→91%): resolveSharedToken explicit/file/generated/port-mismatch. - protocol (89→96%): hello/result/error frame builders, pong, malformed frames. opencode-browser-mcp (server + render now 100%): createMcpServer driven over an in-memory transport with a real MCP Client — group-filtered listing, tool-call routing through send(), feedback per-command timeout, send-rejection → isError, unknown tool. (The remaining package % is the stdio bin entry, not unit-testable.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add @vitest/coverage-v8, a `coverage` script per package + root `pnpm coverage` (pnpm -r --if-present coverage), and a v8 coverage block with thresholds in each vitest.config.ts. CI's Test step becomes "Test + coverage" (runs tests AND fails below threshold); the pre-push gate uses `pnpm coverage`. Thresholds sit a few points below current so a regression fails without exact-match churn: - opencode-browser 88/74/82/88 (S/B/F/L) - opencode-browser-mcp 95/65/95/95 — excludes the stdio bin (mcp.ts) + index re-export, so the bar applies to the real logic (server + render). - opencode-oauth2 73/64/70/73 models-info 70/68/60/70 ratelimit 68/72/55/68 - browser-extension floor is intentionally low (3/3/2/3) — chrome/DOM/React glue is verified manually; raised once the fake-browser harness lands (phase 2). Docs: CLAUDE.md commands + pre-push gate updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…(phase 2) (#50) * test(browser-extension): fake-chrome harness + background unit tests (4% → 14%) Phase 2 (post-merge follow-up to #49): a dependency-free fake of the chrome.* surface the background uses, wired as a vitest setupFile, so the extension's background logic is unit-testable under the node environment. - test/helpers/fake-chrome.ts: stable singleton fake (runtime message bus, action/tabs/windows/scripting/sidePanel/cookies) with inspectable state + configurable failure modes; identity never changes so module-level captures (const sidePanelApi = chrome.sidePanel) stay valid across resets. - feedback.test: startFeedback orchestration — result correlation, id filtering, attention badge/clear, cancel, overlay→side-panel fallback, capture-fail error, and timeout (fake timers). - feedback-side-panel.test: session store — capture+enable, screenshot-fail, clear restores popup behavior + retires the panel, non-matching-id no-op. - command-router.test: registry/executor-backed actions, page-action bridges (snapshot/get_text/get_html via the fake scripting), cookies, unknown-action, and the cancel registry (db mocked). request_feedback drives end to end. Extension coverage floor raised 3% → 13%. Remaining (still climbing): bridge-client, group-registry, page-actions (jsdom), and the React panels. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(browser-extension): bridge-client + group-registry unit tests (14% → 25%) Continue phase 2 on the fake-chrome harness (no new deps): - fake-chrome: enrich tabs (create/remove/goBack/goForward/reload; get now returns a `complete` tab so group-registry's waitForComplete doesn't spin). - bridge-client.test: fake WebSocket + mocked setStatus — hello handshake with role/identity, ready→connected, command→correlated result, handler-throw→error result, cancel→onCancel, ping→pong, disconnect, and the no-token guard. - group-registry.test: mocked Dexie table + fake chrome.tabs — resolveTab isolation guard, open/navigate/list/close/forget lifecycle, foreign-tab rejection, and hydrate from the db. Extension coverage floor raised 13% → 23%. Remaining: page-actions (jsdom) and the React panels (jsdom + testing-library). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Adds a human-in-the-loop feedback channel to the browser bridge: the agent can ask the user to mark up the live page (click an element, box a region, drop a comment, confirm) and get back structured annotations resolved to element refs. On pages where the overlay can't be injected, it falls back to a docked side panel over a screenshot.
Design doc:
plans/ui-feedback.md.Locked decisions
chrome.sidePanel/ Firefoxsidebar_action) for overlay-blocked pagesbrowser_request_feedbacktool with amodeenuminteractivetool group (off by default, likedebug)maxCommandMsceiling = 10 minPhases
CancelFrame(teardown on abort / per-command timeout / agent disconnect) + per-command timeout viaCommandFrame.timeoutMs, clamped tomaxCommandMs; guest backstop atrequested + grace.interactivegroup +browser_request_feedback(confirm/choose/point); branded in-page overlay;data-ocb-refresolution; badge + tab-focus attention.element/region/commentmodes.openPanelOnActionClickonly while a request is pending (preserves the popup); shares the overlay's result/cancel path.Deferred (deliberately — see plan doc)
NeutralResultis single-kind (text|json|image); resolved refs are more actionable than burned-in pixels.Notes
confirm/choosework fully there.Tests
Broker cancel/timeout/clamp, protocol cancel round-trip, agent-client
timeoutMsforwarding + backstop, catalog gating + all mode summaries + error-vs-timeout. The side panel UI / chrome-API glue isn't unit-tested (DOM + extension APIs); the catalog-level contract is.Pre-push gate green:
pnpm -r build && typecheck && test && lint && format:check.🤖 Generated with Claude Code