feat(desktop): real-time voice dictation in composer by klopez4212 · Pull Request #1511 · block/buzz

klopez4212 · 2026-07-04T14:38:55Z

Summary

Adds real-time voice dictation to the message composer using OpenAI's Realtime API over WebRTC.

How it works

User clicks the mic button in the composer toolbar
Mic audio is captured immediately via an AudioWorklet (24kHz PCM)
Desktop requests an ephemeral client secret from the relay (POST /transcribe/session)
WebRTC peer connection streams audio directly to OpenAI
Transcript deltas stream back and merge into the composer in real-time
User clicks mic again to stop, or says "submit" to auto-send
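
The last two steps can be illustrated with two small pure helpers. One merges dictated text into what is already in the composer, and the other spots the spoken trigger. This is a minimal sketch, not the PR's `voiceInput.ts`. The function names, the single-space joining rule, and treating "submit" as a trigger only when it is the final word (case-insensitive, trailing punctuation ignored) are all assumptions.

```typescript
// Hypothetical helpers illustrating transcript merging and auto-submit
// detection; names and rules are assumptions, not the PR's voiceInput.ts.

/** Append dictated text to existing composer text, adding one space if needed. */
function mergeDictatedText(existing: string, dictated: string): string {
  const spoken = dictated.trim();
  if (spoken.length === 0) return existing;
  if (existing.length === 0 || /\s$/.test(existing)) return existing + spoken;
  return `${existing} ${spoken}`;
}

/**
 * Detect a trailing trigger word (case-insensitive, trailing punctuation
 * ignored). The trigger is assumed to be a plain word with no regex syntax.
 */
function detectAutoSubmit(
  text: string,
  trigger = "submit",
): { shouldSubmit: boolean; text: string } {
  const pattern = new RegExp(`(^|\\s)${trigger}[\\s.!?,]*$`, "i");
  const match = pattern.exec(text);
  if (match === null) return { shouldSubmit: false, text };
  // Strip the trigger word so it is not sent as part of the message.
  return { shouldSubmit: true, text: text.slice(0, match.index).trimEnd() };
}

const merged = mergeDictatedText("Ship the fix.", "Submit.");
console.log(detectAutoSubmit(merged)); // { shouldSubmit: true, text: 'Ship the fix.' }
```

Requiring whitespace (or the start of the text) before the trigger keeps words like "resubmit" from firing a send.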

Relay changes (`crates/buzz-relay`)

POST /transcribe/session — mints an ephemeral OpenAI Realtime client secret
GET /transcribe/status — returns whether transcription is configured
Gated by the BUZZ_OPENAI_API_KEY env var: with no key set, the mic button is hidden (graceful degradation)
Added reqwest as a direct dependency for the upstream HTTP call
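
A desktop client for these two endpoints might look like the sketch below. The status call decides whether to render the mic, and the session call runs only when recording starts. The response field names (`configured`, `clientSecret`, `expiresAt`) and the `authorize` callback (a stand-in for the per-request NIP-98 header the endpoints later require) are hypothetical. `fetch` is injected so the logic can run without a network.

```typescript
// Minimal subset of the fetch API this sketch relies on.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical callback returning an Authorization header for (url, method).
type Authorize = (url: string, method: string) => Promise<string>;

interface TranscribeSession {
  clientSecret: string; // assumed field name
  expiresAt: number; // assumed: Unix seconds
}

/** True only if the relay reports dictation as configured; any failure hides the mic. */
async function isDictationAvailable(
  relayUrl: string,
  fetchFn: FetchLike,
  authorize: Authorize,
): Promise<boolean> {
  const url = `${relayUrl}/transcribe/status`;
  try {
    const res = await fetchFn(url, { headers: { Authorization: await authorize(url, "GET") } });
    if (!res.ok) return false;
    const body = (await res.json()) as { configured?: unknown };
    return body.configured === true;
  } catch {
    return false; // network or parse failure: degrade by hiding the mic
  }
}

/** Mint a short-lived client secret; throws so the caller can surface the error. */
async function requestTranscribeSession(
  relayUrl: string,
  fetchFn: FetchLike,
  authorize: Authorize,
): Promise<TranscribeSession> {
  const url = `${relayUrl}/transcribe/session`;
  const res = await fetchFn(url, {
    method: "POST",
    headers: { Authorization: await authorize(url, "POST") },
  });
  if (!res.ok) throw new Error(`POST /transcribe/session failed: HTTP ${res.status}`);
  return (await res.json()) as TranscribeSession;
}
```

Treating every status failure as "not configured" mirrors the graceful-degradation goal: a misbehaving relay hides the button instead of showing a broken one.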

Desktop changes (`desktop/src/features/dictation/`)

| File | Purpose |
|---|---|
| `lib/realtimeBufferWorklet.ts` | AudioWorklet: resample mic → 24kHz 16-bit PCM |
| `lib/realtimeAudio.ts` | WebRTC peer connection, audio buffer flush, transcript merge |
| `lib/voiceInput.ts` | Text merging logic, auto-submit phrase detection |
| `api/transcribeSession.ts` | HTTP client for relay transcribe endpoints |
| `hooks/useRealtimeDictation.ts` | Core WebRTC dictation hook |
| `hooks/useDictation.ts` | Higher-level hook with auto-submit |
| `hooks/useComposerDictation.ts` | Thin wrapper pre-wired for MessageComposer state |
| `ui/DictationButton.tsx` | Mic button (rounded-full, red pulse when recording) |
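
The resampling step in `realtimeBufferWorklet.ts` boils down to converting each block of Float32 samples at the device rate into 24 kHz signed 16-bit PCM. The sketch below uses linear interpolation and clamps to [-1, 1] before scaling. It is a simplification, not the PR's worklet: there is no low-pass filter before downsampling (so some aliasing is possible), and no interpolation state is carried across blocks, which matters for non-integer ratios such as 44.1 kHz to 24 kHz. The function name is hypothetical.

```typescript
const TARGET_RATE = 24_000;

/**
 * Convert one block of mono Float32 samples at `inputRate` into 16-bit PCM
 * at `targetRate` using linear interpolation (no anti-aliasing filter).
 */
function resampleToPcm16(
  input: Float32Array,
  inputRate: number,
  targetRate = TARGET_RATE,
): Int16Array {
  if (input.length === 0) return new Int16Array(0);
  const ratio = inputRate / targetRate; // input samples per output sample
  const outLength = Math.floor(input.length / ratio);
  const out = new Int16Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    const sample = input[left] + (input[right] - input[left]) * frac;
    // Clamp, then scale asymmetrically so -1 maps to -32768 and 1 to 32767.
    const clamped = Math.max(-1, Math.min(1, sample));
    out[i] = clamped < 0 ? Math.round(clamped * 0x8000) : Math.round(clamped * 0x7fff);
  }
  return out;
}

const demo = resampleToPcm16(new Float32Array([0, 0.5, 1, -1]), 48_000); // values [0, 32767]
```

At the common 48 kHz device rate the ratio is exactly 2, so this reduces to keeping every other sample.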

Integrated into MessageComposer via the toolbar extraActions slot.

Configuration

```bash
# .env (relay)
BUZZ_OPENAI_API_KEY=sk-...          # required: enables dictation
BUZZ_TRANSCRIPTION_MODEL=whisper-1  # optional: defaults to whisper-1
```

Design decisions

Relay-proxied secrets — the relay holds the API key and mints short-lived client secrets. The frontend never sees the real key.
Audio buffering — PCM is buffered during the ~1-2s WebRTC setup so no audio is lost.
OSS-friendly — no Block-specific URLs. Self-hosters configure their own key; absent key = feature hidden.
No new crates: reqwest is already a workspace dependency, so buzz-relay only adds it as a direct dependency.
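
The audio-buffering decision amounts to a small queue. Chunks captured before the connection is ready are held in capture order and flushed the moment a sender is attached, after which audio passes straight through. In this sketch the class name, the sender abstraction, and the ~5 s cap on queued audio are assumptions; the PR's actual transport for buffered PCM is not shown here.

```typescript
// Hypothetical queue for PCM captured before the WebRTC connection is ready.
// How buffered audio is actually delivered is abstracted as `ChunkSender`.
type ChunkSender = (chunk: Int16Array) => void;

class PendingAudioBuffer {
  private queue: Int16Array[] = [];
  private queuedSamples = 0;
  private sender: ChunkSender | null = null;
  private readonly maxSamples: number;

  // Default cap of ~5 s at 24 kHz is an assumption, not from the PR.
  constructor(maxSamples = 24_000 * 5) {
    this.maxSamples = maxSamples;
  }

  /** Forward immediately once connected; otherwise queue in capture order. */
  push(chunk: Int16Array): void {
    if (this.sender !== null) {
      this.sender(chunk);
      return;
    }
    this.queue.push(chunk);
    this.queuedSamples += chunk.length;
    // If setup stalls, drop the oldest chunks rather than growing without bound.
    while (this.queuedSamples > this.maxSamples && this.queue.length > 1) {
      const dropped = this.queue.shift();
      if (dropped !== undefined) this.queuedSamples -= dropped.length;
    }
  }

  /** Flush everything captured so far, in order, then switch to pass-through. */
  attach(sender: ChunkSender): void {
    for (const chunk of this.queue) sender(chunk);
    this.queue = [];
    this.queuedSamples = 0;
    this.sender = sender;
  }
}
```

Flushing before switching to pass-through is what preserves ordering: no live chunk can overtake buffered audio.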

Adds dictation support using OpenAI's Realtime API over WebRTC:

Relay:
- New /transcribe/status and /transcribe/session endpoints
- BUZZ_OPENAI_API_KEY env var gates the feature (hidden when absent)
- Proxies ephemeral client-secret minting from OpenAI

Desktop:
- New features/dictation module with:
  - AudioWorklet for 24kHz PCM capture + buffering
  - WebRTC peer connection to OpenAI Realtime API
  - Real-time transcript merging into composer
  - Auto-submit on trigger phrase ('submit')
  - Mic button in composer toolbar (red pulse when recording)
- Integrated into MessageComposer via useComposerDictation hook

New public API needs doc comments — clippy runs with -D missing-docs, so TranscribeStatus and TranscribeSession were failing the Rust Lint gate.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6c12132e30

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

klopez4212

(Review withdrawn — findings are being addressed directly on the branch.)

Both /transcribe/status and /transcribe/session now require NIP-98 authentication and relay membership (with NIP-OA fallback), matching the security posture of /events, /query, and /count. Promotes verify_bridge_auth, check_nip98_replay, and nip98_expected_url to pub(crate) so the transcribe module can reuse them without duplication.
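
On the client, a NIP-98 `Authorization` header is a signed kind-27235 event whose `u` and `method` tags name the exact request, JSON-serialized and base64-encoded after the `Nostr ` scheme; the relay then verifies the signature, URL, method, and timestamp. In the sketch below, signing is delegated to an injected `sign` function and the helper names are hypothetical. `btoa` and `TextEncoder` are used so it runs in both a webview and Node.

```typescript
interface UnsignedEvent {
  kind: number;
  created_at: number;
  tags: string[][];
  content: string;
}
interface SignedEvent extends UnsignedEvent {
  id: string;
  pubkey: string;
  sig: string;
}
// Hypothetical signer (e.g. backed by the app's key store).
type Signer = (event: UnsignedEvent) => Promise<SignedEvent>;

const NIP98_KIND = 27235; // NIP-98 HTTP Auth

/** Unsigned NIP-98 event bound to one absolute URL and HTTP method. */
function nip98Template(
  url: string,
  method: string,
  nowSeconds = Math.floor(Date.now() / 1000),
): UnsignedEvent {
  return {
    kind: NIP98_KIND,
    created_at: nowSeconds,
    tags: [["u", url], ["method", method.toUpperCase()]],
    content: "",
  };
}

/** Base64 of the UTF-8 bytes of `text`. */
function base64Utf8(text: string): string {
  const bytes = new TextEncoder().encode(text);
  return btoa(Array.from(bytes, (b) => String.fromCharCode(b)).join(""));
}

async function nip98AuthHeader(url: string, method: string, sign: Signer): Promise<string> {
  const signed = await sign(nip98Template(url, method));
  return `Nostr ${base64Utf8(JSON.stringify(signed))}`;
}
```

Because the `u` tag must match the exact endpoint, each request (status and session) needs its own signed event rather than a shared header.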

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 195d741e65

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e874a53dbf

- Add nonce tag to NIP-98 auth events to prevent replay rejection when multiple components call /transcribe/status in the same second.
- Wire dictation text into both the Tiptap editor and contentRef via setComposerContent + setEditorContentRef, so dictated text actually appears in the composer and is serialized on submit.
- Call submitMessageRef.current() synchronously in onSend instead of via queueMicrotask, ensuring the editor content is consumed before the subsequent setText('') clears it.
- Replace naive append-based transcript merging with segment-aware state tracking (TranscriptSegmentState). Delta events accumulate into pendingDelta; completed events replace accumulated deltas with the finalized text, preventing duplication.
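
Why the nonce helps: a NIP-01 event id is the SHA-256 of the serialized `[0, pubkey, created_at, kind, tags, content]`, and `created_at` has one-second resolution. Two NIP-98 requests for the same URL and method in the same second therefore get identical ids, which a replay check keyed on the event id would reject. The sketch below uses Node's `crypto` module; the `["nonce", <random hex>]` tag layout and the helper names are assumptions, since the PR's exact tag is not shown here.

```typescript
import { createHash, randomBytes } from "node:crypto";

/** NIP-01 event id: sha256 of the canonical JSON array (ASCII inputs assumed). */
function eventId(
  pubkey: string,
  createdAt: number,
  kind: number,
  tags: string[][],
  content: string,
): string {
  const serialized = JSON.stringify([0, pubkey, createdAt, kind, tags, content]);
  return createHash("sha256").update(serialized, "utf8").digest("hex");
}

/** NIP-98 tags with a hypothetical random nonce tag appended. */
function tagsWithNonce(url: string, method: string): string[][] {
  return [["u", url], ["method", method], ["nonce", randomBytes(16).toString("hex")]];
}

const pubkey = "a".repeat(64);
const now = 1_783_180_000;
const url = "https://relay.example/transcribe/status";
const plain: string[][] = [["u", url], ["method", "GET"]];

// Same second, same URL, no nonce: identical ids, so the second request looks like a replay.
console.log(eventId(pubkey, now, 27235, plain, "") === eventId(pubkey, now, 27235, plain, "")); // true
// A fresh nonce per request makes the ids differ.
console.log(
  eventId(pubkey, now, 27235, tagsWithNonce(url, "GET"), "") ===
    eventId(pubkey, now, 27235, tagsWithNonce(url, "GET"), ""),
); // false
```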

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ebcd42e0b5

- Switch relay from /v1/realtime/sessions to /v1/realtime/client_secrets with the wrapped { session: { ... } } request shape per OpenAI's current WebRTC guide. The old endpoint returns non-2xx, breaking dictation.
- Redesign TranscriptSegmentState to track per-item segments keyed by item_id. Completed events for different turns can arrive out of order; reconciling by item_id preserves utterance ordering and prevents text reordering or partial-turn drops during fast consecutive speech.
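
Per-item reconciliation can be sketched with a `Map` keyed by `item_id`. A `Map` keeps insertion order, so each utterance's position is fixed the first time its id is seen, and a late `completed` event rewrites text in place instead of appending it. The event type strings follow OpenAI's Realtime input-audio transcription events but should be treated as assumptions, as should the first-seen ordering rule; the PR's `TranscriptSegmentState` may order items differently (for example, from commit events).

```typescript
// Hypothetical reconciliation of transcription events keyed by item_id.
interface Segment {
  pendingDelta: string;
  completed: string | null;
}

type TranscriptionEvent =
  | { type: "conversation.item.input_audio_transcription.delta"; item_id: string; delta: string }
  | { type: "conversation.item.input_audio_transcription.completed"; item_id: string; transcript: string };

class TranscriptSegmentState {
  // Map iteration follows insertion order: the first time an item_id is seen.
  private readonly segments = new Map<string, Segment>();

  private segment(itemId: string): Segment {
    let seg = this.segments.get(itemId);
    if (seg === undefined) {
      seg = { pendingDelta: "", completed: null };
      this.segments.set(itemId, seg);
    }
    return seg;
  }

  apply(event: TranscriptionEvent): void {
    const seg = this.segment(event.item_id);
    if (event.type === "conversation.item.input_audio_transcription.delta") {
      // Ignore stray deltas that arrive after the turn was finalized.
      if (seg.completed === null) seg.pendingDelta += event.delta;
    } else {
      // Finalized text replaces the accumulated deltas, so nothing is duplicated.
      seg.completed = event.transcript;
      seg.pendingDelta = "";
    }
  }

  /** Full transcript: finalized text where available, live deltas otherwise. */
  text(): string {
    return Array.from(this.segments.values())
      .map((s) => (s.completed ?? s.pendingDelta).trim())
      .filter((t) => t.length > 0)
      .join(" ");
  }
}
```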

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b9cc335a8e

chatgpt-codex-connector · 2026-07-04T15:54:25Z

+            "session": {
+                "model": "gpt-4o-mini-realtime-preview",
+                "modalities": ["text"],
+                "input_audio_transcription": {
+                    "model": model,


Create a typed transcription session

Even after the endpoint fix, the payload here still uses the legacy realtime fields (model/modalities/input_audio_transcription). OpenAI's current client-secrets reference defines session as either a typed realtime or a typed transcription session, with transcription config under audio.input.transcription (https://developers.openai.com/api/reference/resources/realtime/subresources/client_secrets/methods/create), and the realtime transcription guide uses type: "transcription". With this shape, a configured relay will either have its client-secret request rejected or get a realtime session with transcription turned off; either way, starting the mic never produces transcript events.
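
The shape this comment asks for can be sketched as a request-body builder. The relay itself is Rust, so this TypeScript version only illustrates the JSON. It includes just the fields the comment names (`type: "transcription"` and the model under `audio.input.transcription`); audio format, turn detection, and similar settings are left out on the assumption that defaults apply, and whether a given model (such as the `whisper-1` default) is accepted in this session type is not confirmed here.

```typescript
// Request body for POST /v1/realtime/client_secrets in the shape the review
// describes. Only the fields named in the review are modeled.
interface TranscriptionSecretRequest {
  session: {
    type: "transcription";
    audio: { input: { transcription: { model: string } } };
  };
}

function transcriptionSecretRequest(model: string): TranscriptionSecretRequest {
  return { session: { type: "transcription", audio: { input: { transcription: { model } } } } };
}

// e.g. with BUZZ_TRANSCRIPTION_MODEL unset on the relay:
console.log(JSON.stringify(transcriptionSecretRequest("whisper-1")));
// {"session":{"type":"transcription","audio":{"input":{"transcription":{"model":"whisper-1"}}}}}
```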

chatgpt-codex-connector · 2026-07-04T15:54:25Z

+  submitMessageRef,
+}: UseComposerDictationOptions) {
+  return useDictation({
+    text: contentRef.current,


Sync editor content before merging dictation

In normal typing, MessageComposer only updates the empty/non-empty state from Tiptap onUpdate; contentRef.current is synced lazily for sends/drafts. Passing that stale ref as the dictation source means a user who types or pastes text and then starts dictation has the first transcript merged against the old ref value, and setEditorContentRef.current(text) replaces the editor, dropping the manually entered prefix.
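
One way to address this is to snapshot the live editor text when dictation starts and use that as the prefix, rather than the lazily synced ref. In the sketch below, `readEditorText` stands in for something like Tiptap's `editor.getText()`; it is a hypothetical parameter, not MessageComposer's actual API, and plain text may not capture mentions or formatting the real composer serializes.

```typescript
interface RefLike<T> {
  current: T;
}

/** Snapshot the live editor text as the prefix and bring the stale ref up to date. */
function beginDictation(readEditorText: () => string, contentRef: RefLike<string>): string {
  const live = readEditorText();
  contentRef.current = live;
  return live;
}

/** Keep everything typed before dictation started ahead of the transcript. */
function composeWithPrefix(prefix: string, transcript: string): string {
  const spoken = transcript.trim();
  if (spoken.length === 0) return prefix;
  return prefix.trim().length > 0 ? `${prefix.trimEnd()} ${spoken}` : spoken;
}
```

Every transcript update is then composed against the snapshot taken at start, so manually entered text cannot be overwritten by a stale value.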

chatgpt-codex-connector · 2026-07-04T15:54:25Z

+      // holds the dictated text.
+      submitMessageRef.current();
+    },
+    sendDisabled: disabled || isSending,


Respect blocked sends before auto-clearing

Even with the synchronous submit fix, sendDisabled only reflects disabled || isSending here. If the user says “submit” while an attachment is still uploading, MessageComposer.submitMessage returns early because isUploadingRef.current is true, but useDictation still clears the composer after onSend, discarding the dictated text. Either include the same send blockers (at least uploads and mention preparation) or only clear after the send actually proceeds.
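
The second option can be sketched by having the submit callback report whether it actually sent, so dictation clears the composer only on success. The blocker field names mirror the review (uploads, mention preparation) but are assumptions, as are the function names.

```typescript
interface SendState {
  disabled: boolean;
  isSending: boolean;
  isUploading: boolean;
  isPreparingMentions: boolean;
}

/** Same blockers the composer's own submit path checks (assumed field names). */
function canSend(s: SendState): boolean {
  return !(s.disabled || s.isSending || s.isUploading || s.isPreparingMentions);
}

/** Returns true if the send went through; the composer is cleared only then. */
function autoSubmit(submit: () => boolean, clearComposer: () => void): boolean {
  const sent = submit();
  if (sent) clearComposer();
  return sent;
}
```

Making the submit path the single source of truth avoids keeping two blocker lists in sync.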

klopez4212 and others added 3 commits July 4, 2026 15:02

style: make dictation button rounded-full to match send button

6c12132

fix(relay): add doc comments to transcribe response structs

a7935ea

chatgpt-codex-connector Bot reviewed Jul 4, 2026

Comment thread crates/buzz-relay/src/api/transcribe.rs Outdated

Comment thread desktop/src/features/dictation/hooks/useComposerDictation.ts Outdated

Comment thread desktop/src/features/dictation/hooks/useDictation.ts

klopez4212 commented Jul 4, 2026

chatgpt-codex-connector Bot reviewed Jul 4, 2026

Comment thread desktop/src/features/dictation/api/transcribeSession.ts Outdated

chatgpt-codex-connector Bot reviewed Jul 4, 2026

Comment thread desktop/src/features/dictation/api/transcribeSession.ts

Comment thread desktop/src/features/dictation/lib/realtimeAudio.ts Outdated

klopez4212 force-pushed the kennylopez-dictation branch from e874a53 to ebcd42e on July 4, 2026 15:35

chatgpt-codex-connector Bot reviewed Jul 4, 2026

Comment thread crates/buzz-relay/src/api/transcribe.rs Outdated

Comment thread desktop/src/features/dictation/lib/realtimeAudio.ts Outdated

chatgpt-codex-connector Bot reviewed Jul 4, 2026
