HoldTalk

Push-to-talk dictation for macOS — 100% free, 100% offline.

Hold a key. Speak in Bangla or English. Release. Your words appear at the cursor.

HoldTalk is a Wispr Flow–style dictation tool with none of the strings attached: no API keys, no accounts, no subscriptions, no audio ever leaving your Mac. It runs OpenAI's Whisper model locally via whisper.cpp and pastes the result into whatever app you're typing in — your editor, browser, terminal, anywhere.

Features

🎙️ Hold-to-talk — hold Right Option (⌥), speak, release. Done.
🙌 Hands-free mode — double-tap the hotkey, talk freely, tap once to stop.
🎛️ Dictation pill — floating HUD with a live mic-level waveform and ✕ / ✓ buttons that never steal focus.
🕘 History — every transcript saved locally; pin, copy, multi-select delete.
🌏 Bilingual — speak Bangla → Bangla text, speak English → English text, auto-detected. Includes a fix for Whisper's notorious Bangla→Hindi misdetection.
🔁 Translate mode — hold Right Option + Shift, speak Bangla, get polished English text.
⚡ Fast — the model preloads while you're speaking and stays warm between dictations: ~1.5–3.5s per dictation instead of ~17s cold.
🪶 Featherweight when idle — ~57 MB RAM, 0% CPU. The ~600 MB speech model auto-unloads after a configurable idle period. Built for 8 GB Macs running heavy dev workloads.
⌨️ Customizable hotkeys — rebind both modes from Settings.
🔒 Private by design — everything runs on-device. Nothing is uploaded, ever.

How it works

Hold hotkey ──► whisper-server preloads model (while you speak)
     │
     ▼
AVAudioRecorder ──► 16 kHz mono WAV
     │  release
     ▼
whisper.cpp (large-v3-turbo / medium) ──► transcript
     │
     ▼
Clipboard swap ──► synthetic ⌘V ──► clipboard restored

The Whisper model is never resident while idle. On the first key-press a local whisper-server spawns and loads the model during your speech; it stays warm for fast follow-up dictations, then shuts down after the idle timeout (default 3 min) to return its RAM to the system. If the server isn't available, HoldTalk falls back to one-shot whisper-cli.

Performance (Apple M1, 8 GB)

Path	Latency after release	Idle RAM
Cold `whisper-cli` spawn	~17 s	0
Warm server (HoldTalk default)	~1.5–3.5 s	0 after idle timeout

Installation

curl -fsSL https://raw.githubusercontent.com/DevWizardHQ/HoldTalk/main/scripts/install.sh | bash

The installer adds whisper-cpp via Homebrew, downloads the two models (~1.1 GB total), and installs the latest prebuilt release (checksum-verified) to /Applications/HoldTalk.app. No Xcode needed.

Build from source instead

git clone https://github.com/DevWizardHQ/HoldTalk.git
cd HoldTalk
./setup.sh

setup.sh does the same but builds the app locally (requires Xcode command line tools).

Then:

Open HoldTalk from Launchpad or Spotlight
Grant Microphone and Accessibility permissions (System Settings → Privacy & Security) — the hotkey activates the moment you grant, no relaunch needed
Open any app, hold Right ⌥, speak, release

Usage

Action	Result
Hold Right ⌥ + speak	Transcribe in spoken language (Bangla/English auto)
Hold Right ⌥ ⇧ + speak	Translate speech → English text
Double-tap Right ⌥, speak, tap once	Hands-free dictation
✕ / ✓ on the floating pill	Cancel / finish the current dictation
Menu bar → History	Browse, pin, copy, delete past transcripts
Menu bar → Settings	Rebind hotkeys, keep-warm duration, launch at login

Models

Mode	Model	Size	Notes
Transcribe	`ggml-large-v3-turbo-q5_0`	~574 MB	Best multilingual speed/quality balance
Translate	`ggml-medium-q5_0`	~539 MB	Whisper's built-in any-language → English

Models live in ~/Library/Application Support/HoldTalk/models/ and can also be downloaded from in-app Settings.

Uninstall

curl -fsSL https://raw.githubusercontent.com/DevWizardHQ/HoldTalk/main/scripts/uninstall.sh | bash

Removes the app, downloaded models, history, logs, and preferences. Flags: --keep-data (keep models + history), --with-whisper (also remove the Homebrew whisper-cpp package), -y (skip confirmation).

FAQ

Why not the OpenAI/Groq/Deepgram API? HoldTalk's goal is zero cost and zero cloud. Whisper large-v3-turbo locally on Apple Silicon is fast enough that the API buys you little.

Does it work with languages other than Bangla/English? Yes — Whisper auto-detects ~100 languages. The Bangla→Hindi retry heuristic only kicks in when auto-detect produces Devanagari.

Why does Accessibility permission reset after I rebuild? macOS ties permissions to the code signature. Create the stable local signing certificate described in CONTRIBUTING.md once and the build script signs every build identically — permissions then persist.

My dictation is longer than 15 seconds — is that OK? Yes. The encoder window is tuned to 15 s for speed; longer audio is processed in sequential chunks automatically.

Development

swift build                  # debug build
./scripts/build-app.sh       # release .app bundle in dist/

See CONTRIBUTING.md for project layout and guidelines.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
.github/workflows		.github/workflows
Resources		Resources
Sources/HoldTalk		Sources/HoldTalk
assets		assets
scripts		scripts
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Package.swift		Package.swift
README.md		README.md
setup.sh		setup.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HoldTalk

Features

How it works

Performance (Apple M1, 8 GB)

Installation

Usage

Models

Uninstall

FAQ

Development

License

About

Uh oh!

Releases 7

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

HoldTalk

Features

How it works

Performance (Apple M1, 8 GB)

Installation

Usage

Models

Uninstall

FAQ

Development

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 7

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages