Push-to-talk dictation for macOS — 100% free, 100% offline.
Hold a key. Speak in Bangla or English. Release. Your words appear at the cursor.
HoldTalk is a Wispr Flow–style dictation tool with none of the strings attached: no API keys, no accounts, no subscriptions, no audio ever leaving your Mac. It runs OpenAI's Whisper model locally via whisper.cpp and pastes the result into whatever app you're typing in — your editor, browser, terminal, anywhere.
- 🎙️ Hold-to-talk — hold Right Option (⌥), speak, release. Done.
- 🙌 Hands-free mode — double-tap the hotkey, talk freely, tap once to stop.
- 🎛️ Dictation pill — floating HUD with a live mic-level waveform and ✕ / ✓ buttons that never steal focus.
- 🕘 History — every transcript saved locally; pin, copy, multi-select delete.
- 🌏 Bilingual — speak Bangla → Bangla text, speak English → English text, auto-detected. Includes a fix for Whisper's notorious Bangla→Hindi misdetection.
- 🔁 Translate mode — hold Right Option + Shift, speak Bangla, get polished English text.
- ⚡ Fast — the model preloads while you're speaking and stays warm between dictations: ~1.5–3.5s per dictation instead of ~17s cold.
- 🪶 Featherweight when idle — ~57 MB RAM, 0% CPU. The ~600 MB speech model auto-unloads after a configurable idle period. Built for 8 GB Macs running heavy dev workloads.
- ⌨️ Customizable hotkeys — rebind both modes from Settings.
- 🔒 Private by design — everything runs on-device. Nothing is uploaded, ever.
Hold hotkey ──► whisper-server preloads model (while you speak)
│
▼
AVAudioRecorder ──► 16 kHz mono WAV
│ release
▼
whisper.cpp (large-v3-turbo / medium) ──► transcript
│
▼
Clipboard swap ──► synthetic ⌘V ──► clipboard restored
The Whisper model is never resident while idle. On the first key-press a
local whisper-server spawns and loads the model during your speech; it stays
warm for fast follow-up dictations, then shuts down after the idle timeout
(default 3 min) to return its RAM to the system. If the server isn't
available, HoldTalk falls back to one-shot whisper-cli.
| Path | Latency after release | Idle RAM |
|---|---|---|
Cold whisper-cli spawn |
~17 s | 0 |
| Warm server (HoldTalk default) | ~1.5–3.5 s | 0 after idle timeout |
curl -fsSL https://raw.githubusercontent.com/DevWizardHQ/HoldTalk/main/scripts/install.sh | bashThe installer adds whisper-cpp via Homebrew, downloads the two models
(~1.1 GB total), and installs the latest prebuilt release (checksum-verified)
to /Applications/HoldTalk.app. No Xcode needed.
Build from source instead
git clone https://github.com/DevWizardHQ/HoldTalk.git
cd HoldTalk
./setup.shsetup.sh does the same but builds the app locally (requires Xcode command
line tools).
Then:
- Open HoldTalk from Launchpad or Spotlight
- Grant Microphone and Accessibility permissions (System Settings → Privacy & Security) — the hotkey activates the moment you grant, no relaunch needed
- Open any app, hold Right ⌥, speak, release
| Action | Result |
|---|---|
| Hold Right ⌥ + speak | Transcribe in spoken language (Bangla/English auto) |
| Hold Right ⌥ ⇧ + speak | Translate speech → English text |
| Double-tap Right ⌥, speak, tap once | Hands-free dictation |
| ✕ / ✓ on the floating pill | Cancel / finish the current dictation |
| Menu bar → History | Browse, pin, copy, delete past transcripts |
| Menu bar → Settings | Rebind hotkeys, keep-warm duration, launch at login |
| Mode | Model | Size | Notes |
|---|---|---|---|
| Transcribe | ggml-large-v3-turbo-q5_0 |
~574 MB | Best multilingual speed/quality balance |
| Translate | ggml-medium-q5_0 |
~539 MB | Whisper's built-in any-language → English |
Models live in ~/Library/Application Support/HoldTalk/models/ and can also be
downloaded from in-app Settings.
curl -fsSL https://raw.githubusercontent.com/DevWizardHQ/HoldTalk/main/scripts/uninstall.sh | bashRemoves the app, downloaded models, history, logs, and preferences.
Flags: --keep-data (keep models + history), --with-whisper (also remove the
Homebrew whisper-cpp package), -y (skip confirmation).
Why not the OpenAI/Groq/Deepgram API? HoldTalk's goal is zero cost and zero cloud. Whisper large-v3-turbo locally on Apple Silicon is fast enough that the API buys you little.
Does it work with languages other than Bangla/English? Yes — Whisper auto-detects ~100 languages. The Bangla→Hindi retry heuristic only kicks in when auto-detect produces Devanagari.
Why does Accessibility permission reset after I rebuild? macOS ties permissions to the code signature. Create the stable local signing certificate described in CONTRIBUTING.md once and the build script signs every build identically — permissions then persist.
My dictation is longer than 15 seconds — is that OK? Yes. The encoder window is tuned to 15 s for speed; longer audio is processed in sequential chunks automatically.
swift build # debug build
./scripts/build-app.sh # release .app bundle in dist/See CONTRIBUTING.md for project layout and guidelines.