feat(browser): human-in-the-loop UI feedback by stephane-segning · Pull Request #49 · vymalo/opencode-oauth2

stephane-segning · 2026-06-14T15:40:43Z

What

Adds a human-in-the-loop feedback channel to the browser bridge: the agent can ask the user to mark up the live page (click an element, box a region, drop a comment, confirm) and get back structured annotations resolved to element refs. On pages where the overlay can't be injected, it falls back to a docked side panel over a screenshot.

Design doc: plans/ui-feedback.md.

Locked decisions

In-page overlay primary; docked side panel fallback (Chromium chrome.sidePanel / Firefox sidebar_action) for overlay-blocked pages
One browser_request_feedback tool with a mode enum
New opt-in interactive tool group (off by default, like debug)
maxCommandMs ceiling = 10 min

Phases

Phase 0 — infra: CancelFrame (teardown on abort / per-command timeout / agent disconnect) + per-command timeout via CommandFrame.timeoutMs, clamped to maxCommandMs; guest backstop at requested + grace.
Phase 1 — minimal HITL: interactive group + browser_request_feedback (confirm/choose/point); branded in-page overlay; data-ocb-ref resolution; badge + tab-focus attention.
Phase 2 — rich annotation: element / region / comment modes.
Side-panel fallback: screenshot-capture + cross-browser docked panel when the overlay can't inject; toggles openPanelOnActionClick only while a request is pending (preserves the popup); shares the overlay's result/cancel path.

Deferred (deliberately — see plan doc)

Marker-annotated screenshot return — NeutralResult is single-kind (text|json|image); resolved refs are more actionable than burned-in pixels.

Notes

Side-panel point/region return screenshot-pixel coords (no live DOM on restricted pages → no refs); confirm/choose work fully there.
Neither browser lets an extension force-open the panel, so the user opens it from the badge-flagged toolbar icon.

Tests

Broker cancel/timeout/clamp, protocol cancel round-trip, agent-client timeoutMs forwarding + backstop, catalog gating + all mode summaries + error-vs-timeout. The side panel UI / chrome-API glue isn't unit-tested (DOM + extension APIs); the catalog-level contract is.

Pre-push gate green: pnpm -r build && typecheck && test && lint && format:check.

🤖 Generated with Claude Code

Plan for a browser_request_feedback tool that paints an annotation overlay and blocks on the user, returning structured annotations (resolved to element refs). Phased: 0) CancelFrame + per-command timeout infra, 1) confirm/choose/point overlay, 2) region/element/ comment + screenshot + side-panel fallback. Also index multi-client-routing.md in the plans list (was only linked from the browser table row). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request adds a design document (plans/ui-feedback.md) detailing a human-in-the-loop UI feedback mechanism for the browser automation plugin, as well as updating CLAUDE.md. The reviewer feedback raises important points regarding the design: expanding the command-cancellation frame to support non-interactive commands, addressing Manifest V3 service worker ephemerality with keep-alive and state persistence strategies, and utilizing the side-panel fallback as a secure alternative to in-page overlays to prevent spoofing on hostile pages.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-14T15:41:59Z

+keeps a small `Map<id, () => void>` of teardown handlers for in-flight interactive commands and runs
+the matching one. Non-interactive commands ignore `cancel` (idempotent no-op). This is generally


Limiting cancel to only interactive commands misses a valuable opportunity. Non-interactive commands like browser_wait (which can have long delays) or slow navigations/page actions could also benefit from cancellation.

Consider designing the CancelFrame handler in the executor to be general-purpose. Any command that returns a promise that can be aborted (e.g., using an AbortController or clearing a setTimeout for wait) should register its teardown/abort function in the Map<id, () => void>. This would allow the agent to cleanly abort hung or slow non-interactive commands as well.

gemini-code-assist · 2026-06-14T15:41:59Z

+- The open port + the bridge WebSocket keep the MV3 worker alive for the wait. Document this as the
+  one place the worker must not be allowed to idle out mid-command.


In Manifest V3 (MV3), service workers are ephemeral and can be terminated by the browser even if there is an open WebSocket or chrome.runtime port, especially over longer durations (up to the proposed 10-minute maxCommandMs limit). Relying solely on these connections to keep the worker alive is fragile.

To ensure robustness, the design should explicitly address:

Keep-alive mechanisms: Implementing a proactive keep-alive strategy (e.g., periodic pings/messages over the port/WebSocket, or using chrome.alarms to wake up or reset the idle timer).

State persistence / Reconnection: How the extension recovers if the service worker is terminated and restarted while a human is still interacting with the overlay (e.g., persisting active command IDs and port states in chrome.storage.local so the resurrected worker can re-associate with the active overlay).

gemini-code-assist · 2026-06-14T15:41:59Z

+- The overlay must be **unmistakably branded** as opencode-browser and **un-spoofable** by the page
+  (so a malicious page can't fake the prompt to harvest a click), and always dismissible.


While branding the in-page overlay is important, an in-page overlay injected into the page's DOM (even within a closed Shadow DOM) can never be fully "un-spoofable" or secure against a truly hostile page. A malicious page can still overlay its own elements on top of the extension's overlay, manipulate the z-index, intercept pointer events, or attempt clickjacking.

For high-security contexts or hostile pages, the design should emphasize that the side-panel fallback (which runs in the extension's privileged context and is completely isolated from the page's DOM/JS) is the only way to guarantee an un-spoofable and secure interaction. Consider recommending the side-panel fallback as the default or automatic choice when navigating pages with restrictive Content Security Policies (CSP) or suspected hostile behavior.

Infra for human-in-the-loop tools that block on a user. Two gaps a long-running command exposes, fixed bridge-wide: 1. CancelFrame (broker→executor): on agent abort, per-command timeout, or the requesting agent disconnecting, the broker now tells the still-connected executor to tear down via a `cancel` frame before rejecting — so a blocking command (e.g. an open overlay) is never orphaned in the page. New `abandon()` helper; wired into the abort handler, timeout handler, and agent-disconnect path. 2. Per-command timeout: ToolSpec.timeoutMs threads through send() → CommandFrame.timeoutMs → broker, overriding the global default and clamped to maxCommandMs (default 10 min). Guest agents forward it on the wire and keep a local backstop timer at requested + grace so the broker (which can cancel the executor) always fires first. Extension: bridge-client handles inbound `cancel` → CommandRouter.cancel(id), which runs a teardown registered by cancellable commands (empty until Phase 1). Protocol mirror updated. Tests: broker cancel-on-abort / on-agent-disconnect / no-cancel-on-success, per-command override + maxCommandMs clamp; protocol cancel round-trip; agent-client timeoutMs forwarding + backstop grace. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Adds the opt-in `interactive` tool group and `browser_request_feedback`: the agent asks the human at the browser to respond on the page and blocks until they do (or it times out). Plugin/catalog: - new ToolGroup "interactive" (opt-in, like debug); request_feedback action. - browser_request_feedback spec with mode confirm|choose|point, prompt, options, user timeoutMs (default 120s, clamped 290s). timeoutMs: 300s so the broker deadline backstops the overlay's own countdown. - Annotation / FeedbackResult neutral types; result summarizes the response (confirm → "confirmed", choose → chosen value, point → resolved element ref). Extension: - feedback-overlay.ts: self-contained injected overlay (branded, dismissible, Esc-to-skip). point mode resolves the clicked element to its data-ocb-ref (+ CSS selector + coords). Teardown is a second tiny injection. - feedback.ts: background orchestrator — owns the timeout, correlates the page→bg result message, raises the toolbar badge + focuses the tab, and exposes cancel() for the broker's cancel frame (wired via CommandRouter). - command-router request_feedback case registers the canceller before awaiting. Docs: interactive group + tool reference in browser.md; group/count updates in CLAUDE.md. Tests: gating, action+timeout forwarding, user-timeout clamp, result summaries (confirm/timeout/point-ref). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ed fallback Extends browser_request_feedback beyond confirm/choose/point: - element — hover highlights the element under the cursor; click picks it → { kind:"element", ref, selector, text }. - region — drag a box → { kind:"region", rect, refs } where refs are every data-ocb-ref element inside the box. - comment — click a spot, then type a free-text note attached to the point. Overlay (feedback-overlay.ts): adds hover-highlight, rubber-band region select, and a note panel; shared ref/selector resolution against data-ocb-ref so every spatial mark stays actionable as a ref. Overlay-blocked pages (restricted / CSP) now return { responded:false, error } distinct from a timeout, and the tool result tells the agent to fall back to a screenshot/snapshot instead of re-asking. Deferred, deliberately (documented in plans/ui-feedback.md): marker-annotated screenshot return (NeutralResult can't carry json+image; refs > pixels) and the side-panel-over-screenshot fallback (large separate UI; the error path covers the hostile-page case for now). Docs: new modes + fallback in browser.md; plan doc marked Phases 0–2 done with the deferred-decisions rationale. Tests: element/region/comment summaries + error-vs-timeout shaping. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5c21c98e1a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-14T16:14:49Z

+      if (!d || d.responded === false || d.timedOut) {
+        return json(
+          { responded: false, timedOut: true, annotations: [] } satisfies FeedbackResult,


Preserve explicit feedback dismissals

When the user actively skips or presses Esc, the overlay returns responded: false without timedOut; this branch treats that the same as a timeout and rewrites the metadata to { timedOut: true }, so the agent is told “No response” even though the human explicitly dismissed the prompt. This loses an important distinction for choose/point/region/comment prompts and can cause the agent to retry or fall back as if the user were absent.

Useful? React with 👍 / 👎.

When browser_request_feedback can't inject its in-page overlay (chrome://, Web Store, CSP-locked origins), the extension now captures a screenshot and routes the request to its own side panel instead of returning an error. - feedback-side-panel.ts: captures the tab via chrome.tabs.captureVisibleTab, stores the pending session, and enables the panel cross-browser (Chromium chrome.sidePanel / Firefox sidebar_action). Neither browser allows force-open, so it raises the badge and — only while a request is pending — flips openPanelOnActionClick so the toolbar click opens the panel, restoring the normal popup behavior on resolve. - new sidepanel entrypoint (React + daisyUI): renders the screenshot with the mode's controls; point/region map clicks to screenshot-pixel coords (no live DOM → no refs on these pages); confirm/choose work fully. Posts the same ocb-feedback-result message as the overlay, so the background's correlation / timeout / cancel path is unchanged. - feedback.ts overlay-failure path now tries the side panel first; only if a screenshot can't be taken does it settle with { error }. - background routes feedback:get-pending; wxt adds the sidePanel permission (Chromium) — WXT maps the entrypoint to side_panel / sidebar_action per target. Docs: side-panel fallback documented in browser.md; plan doc marks it shipped (fallback-only, docked); CLAUDE.md entrypoint/layout note updated. Marker- annotated screenshot remains deliberately deferred (NeutralResult is single-kind; refs > pixels). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

createBunTransport was removed in the move to the ws-based transport (#46), leaving scripts/bridge-harness.mjs importing a missing export. Switch it to createNodeTransport + createNodeAgentSocket (both run under Bun), thread an optional timeoutMs through /cmd so a human-paced request_feedback prompt waits, and document the feedback + side-panel-fallback commands. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Cover the browser-API-free logic surfaced by the coverage audit: opencode-browser (61% → 76% statements): - schema.test: toJsonSchema across all field types, optional/required, enums, nested objects, recursive array items. - logging.test: level filtering, console-channel routing, secret redaction, fromOpenCodeLogLevel mapping. - catalog.test: every tool's params()/result() exercised (catalog 50% → 97%) — valid NeutralResult + no throw on empty executor data; action/group integrity. opencode-browser-mcp: selectTools gates the interactive feedback tool. browser-extension: - extract the side-panel coordinate math into a pure lib/annotate.ts (toNatural / rectFromCorners / naturalRect) and refactor ScreenshotAnnotator to use it; annotate.test covers the mapping. - utils.test: cn class-merge + timeAgo buckets. Per the repo convention, chrome.*/DOM/React glue stays manually verified — that's the jsdom-harness work tracked as the post-merge follow-up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Push the unit-testable surface hard using the existing dependency-injection seams (no browser/DOM needed) — the bulk of the codebase, not just leaf helpers: opencode-browser (62% → 91% statements): - agent-client (50→84%): connect/ready handshake, result correlation, error-code propagation, unknown-id, release/stop/onClose, abort while queued, not-connected. - broker (79→91%): ping→pong, group-event routing vs broadcast, cookies global action, ambiguous/unknown target, executor reconnect-replace, aggregate tabs, stop() rejects pending. - opencode (0→84%): createBrowserPlugin via injected fake transport + client — default/opt-in group registration, disabled path, token advertise (generated vs explicit), config hook. token-file vi.mock'd so it never touches the real shared bridge.json. - token-file (56→91%): resolveSharedToken explicit/file/generated/port-mismatch. - protocol (89→96%): hello/result/error frame builders, pong, malformed frames. opencode-browser-mcp (server + render now 100%): createMcpServer driven over an in-memory transport with a real MCP Client — group-filtered listing, tool-call routing through send(), feedback per-command timeout, send-rejection → isError, unknown tool. (The remaining package % is the stdio bin entry, not unit-testable.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add @vitest/coverage-v8, a `coverage` script per package + root `pnpm coverage` (pnpm -r --if-present coverage), and a v8 coverage block with thresholds in each vitest.config.ts. CI's Test step becomes "Test + coverage" (runs tests AND fails below threshold); the pre-push gate uses `pnpm coverage`. Thresholds sit a few points below current so a regression fails without exact-match churn: - opencode-browser 88/74/82/88 (S/B/F/L) - opencode-browser-mcp 95/65/95/95 — excludes the stdio bin (mcp.ts) + index re-export, so the bar applies to the real logic (server + render). - opencode-oauth2 73/64/70/73 models-info 70/68/60/70 ratelimit 68/72/55/68 - browser-extension floor is intentionally low (3/3/2/3) — chrome/DOM/React glue is verified manually; raised once the fake-browser harness lands (phase 2). Docs: CLAUDE.md commands + pre-push gate updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…(phase 2) (#50) * test(browser-extension): fake-chrome harness + background unit tests (4% → 14%) Phase 2 (post-merge follow-up to #49): a dependency-free fake of the chrome.* surface the background uses, wired as a vitest setupFile, so the extension's background logic is unit-testable under the node environment. - test/helpers/fake-chrome.ts: stable singleton fake (runtime message bus, action/tabs/windows/scripting/sidePanel/cookies) with inspectable state + configurable failure modes; identity never changes so module-level captures (const sidePanelApi = chrome.sidePanel) stay valid across resets. - feedback.test: startFeedback orchestration — result correlation, id filtering, attention badge/clear, cancel, overlay→side-panel fallback, capture-fail error, and timeout (fake timers). - feedback-side-panel.test: session store — capture+enable, screenshot-fail, clear restores popup behavior + retires the panel, non-matching-id no-op. - command-router.test: registry/executor-backed actions, page-action bridges (snapshot/get_text/get_html via the fake scripting), cookies, unknown-action, and the cancel registry (db mocked). request_feedback drives end to end. Extension coverage floor raised 3% → 13%. Remaining (still climbing): bridge-client, group-registry, page-actions (jsdom), and the React panels. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(browser-extension): bridge-client + group-registry unit tests (14% → 25%) Continue phase 2 on the fake-chrome harness (no new deps): - fake-chrome: enrich tabs (create/remove/goBack/goForward/reload; get now returns a `complete` tab so group-registry's waitForComplete doesn't spin). - bridge-client.test: fake WebSocket + mocked setStatus — hello handshake with role/identity, ready→connected, command→correlated result, handler-throw→error result, cancel→onCancel, ping→pong, disconnect, and the no-token guard. - group-registry.test: mocked Dexie table + fake chrome.tabs — resolveTab isolation guard, open/navigate/list/close/forget lifecycle, foreign-tab rejection, and hydrate from the db. Extension coverage floor raised 13% → 23%. Remaining: page-actions (jsdom) and the React panels (jsdom + testing-library). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gemini-code-assist Bot reviewed Jun 14, 2026

View reviewed changes

stephane-segning and others added 3 commits June 14, 2026 17:52

stephane-segning marked this pull request as ready for review June 14, 2026 16:08

chatgpt-codex-connector Bot reviewed Jun 14, 2026

View reviewed changes

stephane-segning and others added 5 commits June 14, 2026 18:39

stephane-segning merged commit 5bd2bc1 into main Jun 14, 2026
3 checks passed

stephane-segning mentioned this pull request Jun 14, 2026

test(browser-extension): fake-chrome harness + background unit tests (phase 2) #50

Merged

This was referenced Jun 14, 2026

chore(release): 0.8.0 + first CHANGELOG #52

Merged

feat(opencode): bump @vymalo toolbelt 0.7.3→0.8.0 + register interactive browser group (ADR-0050) ADORSYS-GIS/ai-helm#398

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(browser): human-in-the-loop UI feedback#49

feat(browser): human-in-the-loop UI feedback#49
stephane-segning merged 9 commits into
mainfrom
claude/pedantic-curran-4c514d

stephane-segning commented Jun 14, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 14, 2026

Uh oh!

gemini-code-assist Bot Jun 14, 2026

Uh oh!

gemini-code-assist Bot Jun 14, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		keeps a small `Map<id, () => void>` of teardown handlers for in-flight interactive commands and runs
		the matching one. Non-interactive commands ignore `cancel` (idempotent no-op). This is generally

		- The open port + the bridge WebSocket keep the MV3 worker alive for the wait. Document this as the
		one place the worker must not be allowed to idle out mid-command.

		- The overlay must be unmistakably branded as opencode-browser and un-spoofable by the page
		(so a malicious page can't fake the prompt to harvest a click), and always dismissible.

Conversation

stephane-segning commented Jun 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Locked decisions

Phases

Deferred (deliberately — see plan doc)

Notes

Tests

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 14, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 14, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 14, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 14, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

stephane-segning commented Jun 14, 2026 •

edited

Loading