Speak any language. In your own voice.
Real-time voice translation with a virtual microphone — works in every meeting app.
Quick Start · Architecture · Setup Guide · API Reference
English → Japanese → Indonesian → Russian → Korean, real-time, in the speaker's own cloned voice.
You're in a meeting with colleagues in Tokyo, clients in São Paulo, and partners in Berlin. You speak Indonesian. They hear you — fluently, naturally, instantly — in Japanese, Portuguese, and German. In your voice.
Not a robotic translation. Not a subtitle at the bottom of the screen. Not a five-second delay while some server thinks about it.
You. Speaking their language. In real time. In your own voice.
VoiceBridge captures your microphone, transcribes your speech, translates it through an LLM, clones your voice, and outputs the translated audio through a virtual microphone — so any meeting app hears the translated version. Other participants don't install anything. They don't configure anything. They just hear you, speaking their language, as if you always could.
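The capture → transcribe → translate → synthesize → output flow can be sketched as composed async stages. This is a minimal mock, not the real `desktop-pipeline.ts`: the stage bodies here are stand-ins, and the actual pipeline streams audio incrementally rather than awaiting whole utterances.

```typescript
// Hypothetical sketch of the pipeline as composed async stages.
// Stage names mirror the README; all bodies are mocks.
type Stage<I, O> = (input: I) => Promise<O>;

const transcribe: Stage<Buffer, string> = async (pcm) => "halo semuanya";     // STT (mocked)
const translate: Stage<string, string> = async (text) => `[ja] ${text}`;      // LLM (mocked)
const synthesize: Stage<string, Buffer> = async (text) => Buffer.from(text);  // TTS (mocked)

async function runPipeline(pcm: Buffer): Promise<Buffer> {
  const text = await transcribe(pcm);        // Scribe v2 Realtime (~150 ms)
  const translated = await translate(text);  // streaming LLM (~300 ms)
  return synthesize(translated);             // Flash v2.5 TTS (~75 ms)
}
```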
| Requirement | macOS | Ubuntu/Linux | Windows |
|---|---|---|---|
| Node.js 18+ | nodejs.org | `sudo apt install nodejs npm` | nodejs.org |
| ffmpeg | `brew install ffmpeg` | `sudo apt install ffmpeg` | ffmpeg.org/download |
| Homebrew | brew.sh | — | — |
| PulseAudio/PipeWire | — | Pre-installed on Ubuntu 22.04+ | — |
| ElevenLabs API key | elevenlabs.io | elevenlabs.io | elevenlabs.io |
| LLM API key | openrouter.ai / openai.com / anthropic.com | same | same |
ffmpeg is required for real-time mic capture and virtual mic audio output. Without it, VoiceBridge falls back to a silent mock (no audio).
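The prerequisite check can be approximated as below. This is a sketch, not the app's actual checker (the real detection and the fallback to the silent mock live inside VoiceBridge):

```typescript
import { spawnSync } from "node:child_process";

// Returns true if an ffmpeg binary is on PATH and responds to -version.
function hasFfmpeg(): boolean {
  const result = spawnSync("ffmpeg", ["-version"], { stdio: "ignore" });
  return result.error === undefined && result.status === 0;
}
```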
┌─────────┐ ┌───────────┐ ┌─────────────┐ ┌───────────┐ ┌──────────────┐
│ Your │───▶│ Transcribe│───▶│ Translate │───▶│ Your Clone│───▶│ Virtual Mic │
│ Voice │ │ (Scribe) │ │ (LLM) │ │ Voice │ │ "VoiceBridge│
│ 16kHz │ │ 150ms │ │ 300ms │ │ 75ms │ │ Mic" │
└─────────┘ └───────────┘ └─────────────┘ └───────────┘ └──────────────┘
Five stages. Under 1.5 seconds. Works everywhere.
| Stage | What Happens | Technology | Latency |
|---|---|---|---|
| Capture | Real mic audio captured via ffmpeg | avfoundation (macOS) / pulse (Linux) / dshow (Windows) | 10ms |
| Transcribe | Speech becomes text in real-time | ElevenLabs Scribe v2 Realtime | 150ms |
| Translate | Text translated token-by-token | OpenAI / Anthropic / OpenRouter | 300ms |
| Synthesize | Translated text becomes speech in your voice | ElevenLabs Flash v2.5 TTS | 75ms |
| Output | Translated audio written to virtual mic | ffmpeg → BlackHole / PulseAudio / VB-CABLE | 10ms |
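The capture stage above keys its ffmpeg input format off the platform. A sketch of the argument construction, where the device identifiers (`":0"`, `"default"`, a dshow device name) are illustrative examples, not VoiceBridge's actual defaults:

```typescript
// Builds ffmpeg arguments for raw mic capture per platform (illustrative).
function captureArgs(platform: string, device: string): string[] {
  const input =
    platform === "darwin" ? ["-f", "avfoundation", "-i", `:${device}`] :
    platform === "win32"  ? ["-f", "dshow", "-i", `audio=${device}`] :
                            ["-f", "pulse", "-i", device];
  // 16 kHz mono signed 16-bit PCM streamed to stdout, matching the pipeline's input
  return [...input, "-ar", "16000", "-ac", "1", "-f", "s16le", "pipe:1"];
}
```

Usage would be along the lines of `spawn("ffmpeg", captureArgs(process.platform, "default"))`, reading PCM frames from the child's stdout.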
┌──────────────────────────────────────────────────────┐
│ Electron Desktop App │
│ │
│ ┌─────────────────┐ ┌─────────────────────────┐ │
│ │ Main Process │ │ Renderer (Preact) │ │
│ │ Node.js + N-API │◄─►│ Nothing Design System │ │
│ │ │IPC│ │ │
│ │ • Pipeline │ │ • Main Window (360×480) │ │
│ │ • Audio Router │ │ • System Tray │ │
│ │ • Settings │ │ • Settings View │ │
│ │ • Driver Mgmt │ │ • Debug Log │ │
│ └────────┬─────────┘ └─────────────────────────┘ │
│ │ │
│ ┌────────▼─────────┐ │
│ │ Audio I/O │ │
│ │ (ffmpeg) │ │
│ │ │ │
│ │ • Mic Capture │ │
│ │ • Virtual Mic Out │ │
│ │ • Resampling │ │
│ └────────┬───────────┘ │
└───────────┼─────────────────────────────────────────────┘
│
┌───────────▼─────────────────────────────────────────────┐
│ OS Audio Layer │
│ │
│ ┌────────────┐ ┌─────────────────────┐ │
│ │ Real Mic │ │ "VoiceBridge Mic" │ │
│ │ (hardware) │ │ (virtual driver) │ │
│ └────────────┘ └──────────┬──────────┘ │
│ │ │
│ ┌──────────▼──────────┐ │
│ │ Any Meeting App │ │
│ │ Teams / Zoom / Meet │ │
│ │ Discord / Slack │ │
│ └─────────────────────┘ │
└─────────────────────────────────────────────────────────┘
| OS | What Gets Installed | How |
|---|---|---|
| macOS | BlackHole 2ch | brew install blackhole-2ch |
| Ubuntu/Linux | PulseAudio/PipeWire null sink | pactl load-module module-null-sink |
| Windows | VB-CABLE | Manual download + Run as Administrator |
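On Linux, the null-sink setup amounts to one `pactl` invocation. A sketch of how `driver-installer.ts` might build it; the sink name and description here are assumptions, not necessarily what the app uses:

```typescript
// Arguments for creating the Linux virtual mic via PulseAudio/PipeWire.
// Sink name/description are illustrative.
function nullSinkArgs(name = "voicebridge"): string[] {
  return [
    "load-module", "module-null-sink",
    `sink_name=${name}`,
    "sink_properties=device.description=VoiceBridge_Mic",
  ];
}
// spawn("pactl", nullSinkArgs()) — meeting apps then pick the sink's
// monitor source as their microphone.
```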
Speak naturally. Your words are transcribed, translated, and re-spoken in your cloned voice — all while you're still finishing your sentence.
Hold SPACE or the on-screen button to talk. Each press is an independent utterance — no accumulation, no feedback loops, no background noise pickup.
Record 30 seconds. VoiceBridge clones your voice. Now you speak 90+ languages and it still sounds like you.
Teams. Zoom. Google Meet. Discord. Slack. FaceTime. WhatsApp. Any app that uses a microphone.
OLED blacks. Space Mono labels. Mechanical toggles. System tray app that stays out of your way.
```bash
# macOS
brew install ffmpeg sox

# Ubuntu/Debian
sudo apt install ffmpeg sox

# Windows — download from https://ffmpeg.org/download.html
```

```bash
git clone https://github.com/AlleyBo55/VoiceBridge.git
cd VoiceBridge/desktop
npm install
npm run dev
```

VoiceBridge walks you through setup:
- Prerequisites — checks for ffmpeg, sox, and virtual mic driver. One-click install for each.
- API Keys — enter your ElevenLabs key and LLM key. Keys are validated before saving.
- Voice Clone — record 30+ seconds of your voice. Skip to use a default voice.
Keys are encrypted with AES-GCM-256 and stored only on your device. VoiceBridge has no server.
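AES-256-GCM key storage can be sketched with Node's built-in `crypto` module. This is a minimal round-trip example, assuming a machine-local secret; VoiceBridge's actual key derivation, salt, and storage path are not shown here:

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";

// Encrypts a plaintext under a secret using AES-256-GCM (illustrative salt).
function encrypt(plaintext: string, secret: string): { iv: string; tag: string; data: string } {
  const key = scryptSync(secret, "voicebridge-salt", 32);
  const iv = randomBytes(12); // 96-bit nonce, the recommended GCM size
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv: iv.toString("hex"), tag: cipher.getAuthTag().toString("hex"), data: data.toString("hex") };
}

// Decrypts and verifies the GCM auth tag; throws if the ciphertext was tampered with.
function decrypt(box: { iv: string; tag: string; data: string }, secret: string): string {
  const key = scryptSync(secret, "voicebridge-salt", 32);
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(box.iv, "hex"));
  decipher.setAuthTag(Buffer.from(box.tag, "hex"));
  return Buffer.concat([decipher.update(Buffer.from(box.data, "hex")), decipher.final()]).toString("utf8");
}
```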
- Open any meeting app → select the virtual mic as your microphone ("BlackHole 2ch" on macOS, the null sink on Linux, "CABLE Output" on Windows)
- Toggle translation on in VoiceBridge
- Hold SPACE and speak — other participants hear your translated voice
- Input: every language ElevenLabs Scribe supports. Auto-detect is the default.
- Output: every language ElevenLabs TTS supports. Any-to-any, no restrictions.
| Layer | Choice | Why |
|---|---|---|
| App Shell | Electron | Cross-platform desktop, native addon support |
| UI | Preact + CSS Custom Properties | 3KB gzipped, Nothing design system |
| Audio I/O | ffmpeg | Real mic capture + virtual mic output |
| Virtual Mic | BlackHole / PulseAudio / VB-CABLE | OS-level virtual audio device |
| STT | ElevenLabs Scribe v2 Realtime | 150ms latency, 90+ languages |
| TTS | ElevenLabs Flash v2.5 | 75ms latency, voice cloning |
| Translation | OpenAI / Anthropic / OpenRouter | Streaming, 200+ models |
| Testing | Vitest + fast-check | Property-based correctness |
- Audio is streamed, never stored
- API keys encrypted with AES-GCM-256
- No analytics. No tracking. No telemetry.
- No embedded keys — the build ships empty
- Panic button (Ctrl/Cmd+Shift+X) kills everything instantly
| Shortcut | Action |
|---|---|
| `Space` | Push-to-talk (hold to speak) |
| `Ctrl/Cmd+Shift+T` | Toggle translation |
| `Ctrl/Cmd+Shift+G` | Toggle Ghost Mode |
| `Ctrl/Cmd+Shift+X` | Panic stop |
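In an Electron main process, the global shortcuts above would be wired through `globalShortcut.register(accelerator, handler)`. A sketch with the registration function injected so the wiring is testable without Electron; the action names are illustrative, not VoiceBridge's actual IPC channels (push-to-talk is a held key, handled separately from these toggles):

```typescript
type Register = (accelerator: string, handler: () => void) => boolean;

// Accelerators use Electron's CommandOrControl syntax (Cmd on macOS, Ctrl elsewhere).
const shortcuts: Array<[string, string]> = [
  ["CommandOrControl+Shift+T", "toggle-translation"],
  ["CommandOrControl+Shift+G", "toggle-ghost-mode"],
  ["CommandOrControl+Shift+X", "panic-stop"],
];

// Registers every shortcut; returns the accelerators that failed to register
// (e.g. already claimed by another app).
function registerAll(register: Register, dispatch: (action: string) => void): string[] {
  const failed: string[] = [];
  for (const [accel, action] of shortcuts) {
    if (!register(accel, () => dispatch(action))) failed.push(accel);
  }
  return failed;
}
```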
```bash
cd desktop
npm install    # Install dependencies
npm run dev    # Build + launch Electron with hot-reload
npm run test   # Run 42 property-based tests
```

```
desktop/
├── src/
│   ├── main/                    # Electron main process
│   │   ├── main.ts              # Entry, tray, window, IPC
│   │   ├── desktop-pipeline.ts  # Mic → STT → LLM → TTS → BlackHole
│   │   ├── audio-router.ts      # VAD, noise gate, routing
│   │   ├── driver-installer.ts  # Virtual mic driver install
│   │   └── ...
│   ├── native/                  # ffmpeg audio I/O
│   ├── preload/                 # Security boundary
│   ├── renderer/                # Preact UI
│   └── shared/                  # Types, platform utils
└── tests/properties/            # Property-based tests
```
This project was built using Kiro's spec-driven development — requirements → design → implementation, systematically.
Phase 1 — Chrome Extension
- Requirements · Design · Tasks
Phase 2 — Pipeline Hardening
- Requirements · Design · Tasks
Phase 3 — Desktop App
- Requirements · Design · Tasks
MIT — use it, fork it, ship it.
"The people who are crazy enough to think they can change the world are the ones who do."
Built for ElevenLabs × Kiro Hackathon
ElevenLabs · Kiro · #ElevenHacks · #CodeWithKiro



