Speak any language. In your own voice.
Real-time voice translation with a virtual microphone — works in every meeting app.
Quick Start · Architecture · Setup Guide · API Reference
English → Japanese → Indonesian → Russian → Korean, real-time, in the speaker's own cloned voice.
You're in a meeting with colleagues in Tokyo, clients in São Paulo, and partners in Berlin. You speak Indonesian. They hear you — fluently, naturally, instantly — in Japanese, Portuguese, and German. In your voice.
Not a robotic translation. Not a subtitle at the bottom of the screen. Not a five-second delay while some server thinks about it.
You. Speaking their language. In real time. In your own voice.
VoiceBridge captures your microphone, transcribes your speech, translates it through an LLM, clones your voice, and outputs the translated audio through a virtual microphone — so any meeting app hears the translated version. Other participants don't install anything. They don't configure anything. They just hear you, speaking their language, as if you always could.
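The capture → transcribe → translate → synthesize → output flow can be sketched as composed async stages. This is a minimal mock, not the real `desktop-pipeline.ts`: the stage bodies here are stand-ins, and the actual pipeline streams audio incrementally rather than awaiting whole utterances.

```typescript
// Hypothetical sketch of the pipeline as composed async stages.
// Stage names mirror the README; all bodies are mocks.
type Stage<I, O> = (input: I) => Promise<O>;

const transcribe: Stage<Buffer, string> = async (pcm) => "halo semuanya";     // STT (mocked)
const translate: Stage<string, string> = async (text) => `[ja] ${text}`;      // LLM (mocked)
const synthesize: Stage<string, Buffer> = async (text) => Buffer.from(text);  // TTS (mocked)

async function runPipeline(pcm: Buffer): Promise<Buffer> {
  const text = await transcribe(pcm);        // Scribe v2 Realtime (~150 ms)
  const translated = await translate(text);  // streaming LLM (~300 ms)
  return synthesize(translated);             // Flash v2.5 TTS (~75 ms)
}
```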
| Requirement | macOS | Ubuntu/Linux | Windows |
|---|---|---|---|
| Node.js 18+ | nodejs.org | `sudo apt install nodejs npm` | nodejs.org |
| ffmpeg | `brew install ffmpeg` | `sudo apt install ffmpeg` | ffmpeg.org/download |
| Homebrew | brew.sh | — | — |
| PulseAudio/PipeWire | — | Pre-installed on Ubuntu 22.04+ | — |
| ElevenLabs API key | elevenlabs.io | elevenlabs.io | elevenlabs.io |
| LLM API key | openrouter.ai / openai.com / anthropic.com | same | same |
ffmpeg is required for real-time mic capture and virtual mic audio output. Without it, VoiceBridge falls back to a silent mock (no audio).
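The prerequisite check can be approximated as below. This is a sketch, not the app's actual checker (the real detection and the fallback to the silent mock live inside VoiceBridge):

```typescript
import { spawnSync } from "node:child_process";

// Returns true if an ffmpeg binary is on PATH and responds to -version.
function hasFfmpeg(): boolean {
  const result = spawnSync("ffmpeg", ["-version"], { stdio: "ignore" });
  return result.error === undefined && result.status === 0;
}
```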
┌─────────┐ ┌───────────┐ ┌─────────────┐ ┌───────────┐ ┌──────────────┐
│ Your │───▶│ Transcribe│───▶│ Translate │───▶│ Your Clone│───▶│ Virtual Mic │
│ Voice │ │ (Scribe) │ │ (LLM) │ │ Voice │ │ "VoiceBridge│
│ 16kHz │ │ 150ms │ │ 300ms │ │ 75ms │ │ Mic" │
└─────────┘ └───────────┘ └─────────────┘ └───────────┘ └──────────────┘
Five stages. Under 1.5 seconds. Works everywhere.
| Stage | What Happens | Technology | Latency |
|---|---|---|---|
| Capture | Real mic audio captured via ffmpeg | avfoundation (macOS) / pulse (Linux) / dshow (Windows) | 10ms |
| Transcribe | Speech becomes text in real-time | ElevenLabs Scribe v2 Realtime | 150ms |
| Translate | Text translated token-by-token | OpenAI / Anthropic / OpenRouter | 300ms |
| Synthesize | Translated text becomes speech in your voice | ElevenLabs Flash v2.5 TTS | 75ms |
| Output | Translated audio written to virtual mic | ffmpeg → BlackHole / PulseAudio / VB-CABLE | 10ms |
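The capture stage above keys its ffmpeg input format off the platform. A sketch of the argument construction, where the device identifiers (`":0"`, `"default"`, a dshow device name) are illustrative examples, not VoiceBridge's actual defaults:

```typescript
// Builds ffmpeg arguments for raw mic capture per platform (illustrative).
function captureArgs(platform: string, device: string): string[] {
  const input =
    platform === "darwin" ? ["-f", "avfoundation", "-i", `:${device}`] :
    platform === "win32"  ? ["-f", "dshow", "-i", `audio=${device}`] :
                            ["-f", "pulse", "-i", device];
  // 16 kHz mono signed 16-bit PCM streamed to stdout, matching the pipeline's input
  return [...input, "-ar", "16000", "-ac", "1", "-f", "s16le", "pipe:1"];
}
```

Usage would be along the lines of `spawn("ffmpeg", captureArgs(process.platform, "default"))`, reading PCM frames from the child's stdout.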
┌──────────────────────────────────────────────────────┐
│ Electron Desktop App │
│ │
│ ┌─────────────────┐ ┌─────────────────────────┐ │
│ │ Main Process │ │ Renderer (Preact) │ │
│ │ Node.js + N-API │◄─►│ Nothing Design System │ │
│ │ │IPC│ │ │
│ │ • Pipeline │ │ • Main Window (360×480) │ │
│ │ • Audio Router │ │ • System Tray │ │
│ │ • Settings │ │ • Settings View │ │
│ │ • Driver Mgmt │ │ • Debug Log │ │
│ └────────┬─────────┘ └─────────────────────────┘ │
│ │ │
│ ┌────────▼─────────┐ │
│ │ Audio I/O │ │
│ │ (ffmpeg) │ │
│ │ │ │
│ │ • Mic Capture │ │
│ │ • Virtual Mic Out │ │
│ │ • Resampling │ │
│ └────────┬───────────┘ │
└───────────┼─────────────────────────────────────────────┘
│
┌───────────▼─────────────────────────────────────────────┐
│ OS Audio Layer │
│ │
│ ┌────────────┐ ┌─────────────────────┐ │
│ │ Real Mic │ │ "VoiceBridge Mic" │ │
│ │ (hardware) │ │ (virtual driver) │ │
│ └────────────┘ └──────────┬──────────┘ │
│ │ │
│ ┌──────────▼──────────┐ │
│ │ Any Meeting App │ │
│ │ Teams / Zoom / Meet │ │
│ │ Discord / Slack │ │
│ └─────────────────────┘ │
└─────────────────────────────────────────────────────────┘
| OS | What Gets Installed | How |
|---|---|---|
| macOS | BlackHole 2ch | brew install blackhole-2ch |
| Ubuntu/Linux | PulseAudio/PipeWire null sink | pactl load-module module-null-sink |
| Windows | VB-CABLE | Manual download + Run as Administrator |
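On Linux, the null-sink setup amounts to one `pactl` invocation. A sketch of how `driver-installer.ts` might build it; the sink name and description here are assumptions, not necessarily what the app uses:

```typescript
// Arguments for creating the Linux virtual mic via PulseAudio/PipeWire.
// Sink name/description are illustrative.
function nullSinkArgs(name = "voicebridge"): string[] {
  return [
    "load-module", "module-null-sink",
    `sink_name=${name}`,
    "sink_properties=device.description=VoiceBridge_Mic",
  ];
}
// spawn("pactl", nullSinkArgs()) — meeting apps then pick the sink's
// monitor source as their microphone.
```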
Speak naturally. Your words are transcribed, translated, and re-spoken in your cloned voice — all while you're still finishing your sentence.
Hold SPACE or the on-screen button to talk. Each press is an independent utterance — no accumulation, no feedback loops, no background noise pickup.
Record 30 seconds. VoiceBridge clones your voice. Now you speak 90+ languages and it still sounds like you.
Teams. Zoom. Google Meet. Discord. Slack. FaceTime. WhatsApp. Any app that uses a microphone.
OLED blacks. Space Mono labels. Mechanical toggles. System tray app that stays out of your way.
```bash
# macOS
brew install ffmpeg sox

# Ubuntu/Debian
sudo apt install ffmpeg sox

# Windows — download from https://ffmpeg.org/download.html
```

```bash
git clone https://github.com/AlleyBo55/VoiceBridge.git
cd VoiceBridge/desktop
npm install
npm run dev
```

VoiceBridge walks you through setup:
- Prerequisites — checks for ffmpeg, sox, and virtual mic driver. One-click install for each.
- API Keys — enter your ElevenLabs key and LLM key. Keys are validated before saving.
- Voice Clone — record 30+ seconds of your voice. Skip to use a default voice.
Keys are encrypted with AES-GCM-256 and stored only on your device. VoiceBridge has no server.
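AES-256-GCM key storage can be sketched with Node's built-in `crypto` module. This is a minimal round-trip example, assuming a machine-local secret; VoiceBridge's actual key derivation, salt, and storage path are not shown here:

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";

// Encrypts a plaintext under a secret using AES-256-GCM (illustrative salt).
function encrypt(plaintext: string, secret: string): { iv: string; tag: string; data: string } {
  const key = scryptSync(secret, "voicebridge-salt", 32);
  const iv = randomBytes(12); // 96-bit nonce, the recommended GCM size
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv: iv.toString("hex"), tag: cipher.getAuthTag().toString("hex"), data: data.toString("hex") };
}

// Decrypts and verifies the GCM auth tag; throws if the ciphertext was tampered with.
function decrypt(box: { iv: string; tag: string; data: string }, secret: string): string {
  const key = scryptSync(secret, "voicebridge-salt", 32);
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(box.iv, "hex"));
  decipher.setAuthTag(Buffer.from(box.tag, "hex"));
  return Buffer.concat([decipher.update(Buffer.from(box.data, "hex")), decipher.final()]).toString("utf8");
}
```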
- Open any meeting app → select the virtual mic as your microphone ("BlackHole 2ch" on macOS, the null sink on Linux, "CABLE Output" on Windows)
- Toggle translation on in VoiceBridge
- Hold SPACE and speak — other participants hear your translated voice
- Input: every language ElevenLabs Scribe supports. Auto-detect is the default.
- Output: every language ElevenLabs TTS supports. Any-to-any, no restrictions.
| Layer | Choice | Why |
|---|---|---|
| App Shell | Electron | Cross-platform desktop, native addon support |
| UI | Preact + CSS Custom Properties | 3KB gzipped, Nothing design system |
| Audio I/O | ffmpeg | Real mic capture + virtual mic output |
| Virtual Mic | BlackHole / PulseAudio / VB-CABLE | OS-level virtual audio device |
| STT | ElevenLabs Scribe v2 Realtime | 150ms latency, 90+ languages |
| TTS | ElevenLabs Flash v2.5 | 75ms latency, voice cloning |
| Translation | OpenAI / Anthropic / OpenRouter | Streaming, 200+ models |
| Testing | Vitest + fast-check | Property-based correctness |
- Audio is streamed, never stored
- API keys encrypted with AES-GCM-256
- No analytics. No tracking. No telemetry.
- No embedded keys — the build ships empty
- Panic button (Ctrl/Cmd+Shift+X) kills everything instantly
| Shortcut | Action |
|---|---|
| `Space` | Push-to-talk (hold to speak) |
| `Ctrl/Cmd+Shift+T` | Toggle translation |
| `Ctrl/Cmd+Shift+G` | Toggle Ghost Mode |
| `Ctrl/Cmd+Shift+X` | Panic stop |
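In an Electron main process, the global shortcuts above would be wired through `globalShortcut.register(accelerator, handler)`. A sketch with the registration function injected so the wiring is testable without Electron; the action names are illustrative, not VoiceBridge's actual IPC channels (push-to-talk is a held key, handled separately from these toggles):

```typescript
type Register = (accelerator: string, handler: () => void) => boolean;

// Accelerators use Electron's CommandOrControl syntax (Cmd on macOS, Ctrl elsewhere).
const shortcuts: Array<[string, string]> = [
  ["CommandOrControl+Shift+T", "toggle-translation"],
  ["CommandOrControl+Shift+G", "toggle-ghost-mode"],
  ["CommandOrControl+Shift+X", "panic-stop"],
];

// Registers every shortcut; returns the accelerators that failed to register
// (e.g. already claimed by another app).
function registerAll(register: Register, dispatch: (action: string) => void): string[] {
  const failed: string[] = [];
  for (const [accel, action] of shortcuts) {
    if (!register(accel, () => dispatch(action))) failed.push(accel);
  }
  return failed;
}
```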
```bash
cd desktop
npm install    # Install dependencies
npm run dev    # Build + launch Electron with hot-reload
npm run test   # Run 42 property-based tests
```

```
desktop/
├── src/
│   ├── main/                    # Electron main process
│   │   ├── main.ts              # Entry, tray, window, IPC
│   │   ├── desktop-pipeline.ts  # Mic → STT → LLM → TTS → BlackHole
│   │   ├── audio-router.ts      # VAD, noise gate, routing
│   │   ├── driver-installer.ts  # Virtual mic driver install
│   │   └── ...
│   ├── native/                  # ffmpeg audio I/O
│   ├── preload/                 # Security boundary
│   ├── renderer/                # Preact UI
│   └── shared/                  # Types, platform utils
└── tests/properties/            # Property-based tests
```
This project was built using Kiro's spec-driven development — requirements → design → implementation, systematically.
Phase 1 — Chrome Extension
- Requirements · Design · Tasks
Phase 2 — Pipeline Hardening
- Requirements · Design · Tasks
Phase 3 — Desktop App
- Requirements · Design · Tasks
MIT — use it, fork it, ship it.
"The people who are crazy enough to think they can change the world are the ones who do."
Built for ElevenLabs × Kiro Hackathon
ElevenLabs · Kiro · #ElevenHacks · #CodeWithKiro



