voice-agent-kit

Transport layer for real-time voice AI agents — handles the full pipeline from Twilio audio to STT to MCP agent to TTS back to audio, with strict latency budgets and provider-agnostic interfaces.

This monorepo provides the pipeline orchestration, telephony integration, provider adapters, and MCP client needed to build production voice agents. It is the transport, not the brain — agent logic lives in an MCP server.

Features

<800ms pipeline — End-to-end latency from end-of-speech to first audio byte with per-stage budgets
Provider-agnostic STT — Deepgram, AWS Transcribe, and Google Cloud Speech-to-Text adapters with a unified interface
Provider-agnostic TTS — Deepgram Aura, AWS Polly, and Google Cloud Text-to-Speech adapters with cancelable streaming
MCP client — JSON-RPC 2.0 client with tool discovery, retry with backoff, and TTS-optimized response sanitization
Twilio Media Streams — Bidirectional WebSocket handler with barge-in detection, mark tracking, and base64 audio encoding
Session management — Multi-turn conversation state with TTL expiry, turn history, and automatic cleanup
Latency enforcement — Per-stage timing with hard caps, overflow detection, and OpenTelemetry metrics
Observability — OpenTelemetry tracing spans, histograms, and counters for every pipeline stage
Dual ESM/CJS — Every package ships both import and require entry points
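
The feature list does not say how the MCP client's "retry with backoff" is scheduled. One common form is capped exponential backoff, sketched below. `backoffDelays` and `withRetry` are hypothetical helpers written for illustration, not exports of `@reaatech/voice-agent-mcp-client`, and the delay values are assumptions.

```typescript
// Hypothetical helpers for illustration; not the package's API.

// Delay before each retry: base, 2*base, 4*base, ... capped at maxMs.
function backoffDelays(retries: number, baseMs: number, maxMs: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(Math.min(baseMs * 2 ** attempt, maxMs));
  }
  return delays;
}

// Calls `fn` once, then once more after each delay until it succeeds.
// `sleep` is injected by the caller so the sketch stays runtime-agnostic.
async function withRetry<T>(
  fn: () => Promise<T>,
  delays: number[],
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= delays.length; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < delays.length) await sleep(delays[attempt]);
    }
  }
  // Every attempt failed: surface the most recent error.
  throw lastError;
}
```

With a 400ms MCP stage budget, the delays have to stay small: `backoffDelays(2, 50, 200)` gives waits of 50ms and 100ms, which already consumes a large share of the stage budget before the retried calls themselves are counted.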

Installation

Using the packages

Packages are published under the @reaatech scope and can be installed individually:

# Core pipeline, session management, config, and types
pnpm add @reaatech/voice-agent-core

# Speech-to-text provider adapters
pnpm add @reaatech/voice-agent-stt

# Text-to-speech provider adapters
pnpm add @reaatech/voice-agent-tts

# MCP client wrapper
pnpm add @reaatech/voice-agent-mcp-client

# Twilio Media Streams handler
pnpm add @reaatech/voice-agent-telephony

Contributing

# Clone the repository
git clone https://github.com/reaatech/voice-agent-kit.git
cd voice-agent-kit

# Install dependencies
pnpm install

# Build all packages
pnpm build

# Run the test suite
pnpm test

# Run linting
pnpm lint

Quick Start

Wire up a voice agent pipeline with Deepgram for STT/TTS and a custom MCP endpoint:

import {
  defineConfig,
  createPipeline,
  initializeSessionManager,
  LatencyBudgetEnforcer,
} from '@reaatech/voice-agent-core';
import { DeepgramSTTProvider } from '@reaatech/voice-agent-stt';
import { DeepgramTTSProvider } from '@reaatech/voice-agent-tts';
import { MCPClient } from '@reaatech/voice-agent-mcp-client';
import { createTwilioHandler } from '@reaatech/voice-agent-telephony';

const config = defineConfig({
  stt: { provider: 'deepgram', apiKey: process.env.DEEPGRAM_API_KEY, sampleRate: 8000 },
  tts: { provider: 'deepgram', apiKey: process.env.DEEPGRAM_API_KEY, voice: 'asteria' },
  mcp: { endpoint: process.env.MCP_ENDPOINT, timeout: 400 },
  latency: { total: { target: 800, hardCap: 1200 }, stages: { stt: 200, mcp: 400, tts: 200 } },
  session: { ttl: 3600, history: { maxTurns: 20, maxTokens: 4000 } },
  bargeIn: { enabled: true, minSpeechDuration: 300, confidenceThreshold: 0.7, silenceThreshold: 0.3 },
});

const pipeline = createPipeline({
  config,
  sessionManager: initializeSessionManager({ defaultTTL: 3600, maxTurns: 20, maxTokens: 4000 }),
  latencyEnforcer: new LatencyBudgetEnforcer(config.latency),
  sttProvider: new DeepgramSTTProvider(),
  ttsProvider: new DeepgramTTSProvider(),
  mcpClient: new MCPClient({ endpoint: config.mcp.endpoint }),
});

// In your WebSocket server, once per call. `sessionId` is not defined in this
// snippet: it is whatever identifier your server assigns to the active call.
const handler = createTwilioHandler({ bargeInEnabled: true });
handler.on('audio:received', (chunk) => pipeline.processAudioChunk(sessionId, chunk));
handler.on('barge-in:detected', () => pipeline.bargeIn(sessionId));
pipeline.on('pipeline:tts:chunk', ({ data }) => handler.sendAudio(data.chunk));

See the examples/ directory for complete working samples, including RAG voice agents, multi-agent orchestration, and minimal echo testing.
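
For orientation, this is roughly what ends up on the wire. Twilio Media Streams carries audio as JSON text frames with a base64 payload (8000 Hz mu-law, which matches the `sampleRate: 8000` setting above), and it accepts `mark` and `clear` frames for playback tracking and for flushing buffered audio on barge-in. The frame shapes below follow Twilio's protocol. The builder functions are hypothetical helpers, and whether the telephony package constructs its frames exactly this way is an assumption.

```typescript
// Hypothetical frame builders for illustration; not the telephony package's API.
const B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Dependency-free base64 encoder (in Node, Buffer.from(bytes).toString('base64') does the same).
function toBase64(bytes: Uint8Array): string {
  let out = '';
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    out += B64[b0 >> 2];
    out += B64[((b0 & 3) << 4) | (b1 >> 4)];
    out += i + 1 < bytes.length ? B64[((b1 & 15) << 2) | (b2 >> 6)] : '=';
    out += i + 2 < bytes.length ? B64[b2 & 63] : '=';
  }
  return out;
}

// Outbound audio: one chunk of mu-law bytes, base64-encoded.
function mediaFrame(streamSid: string, audio: Uint8Array): string {
  return JSON.stringify({ event: 'media', streamSid, media: { payload: toBase64(audio) } });
}

// Marker that Twilio echoes back once the preceding audio has played.
function markFrame(streamSid: string, name: string): string {
  return JSON.stringify({ event: 'mark', streamSid, mark: { name } });
}

// Drops any audio Twilio has buffered, e.g. when the caller barges in.
function clearFrame(streamSid: string): string {
  return JSON.stringify({ event: 'clear', streamSid });
}
```

Each function returns the string you would pass to the WebSocket's `send` method. The `streamSid` comes from the `start` message Twilio sends when the stream opens.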

Packages

Package	Description
`@reaatech/voice-agent-core`	Pipeline orchestrator, session management, latency enforcement, config, and types
`@reaatech/voice-agent-stt`	Speech-to-text provider interface with Deepgram, AWS, and Google adapters
`@reaatech/voice-agent-tts`	Text-to-speech provider interface with Deepgram, AWS, and Google adapters
`@reaatech/voice-agent-mcp-client`	JSON-RPC 2.0 MCP client with tool discovery and response sanitization
`@reaatech/voice-agent-telephony`	Twilio Media Streams WebSocket handler with barge-in detection

Latency Budget

Stage	Budget	Notes
STT	200ms	Deepgram nova-2 typically delivers well under budget
MCP	400ms	Agent round-trip; depends on endpoint complexity
TTS	200ms	Deepgram Aura first-byte is usually under 100ms
Total	800ms	End-to-end target; exceeding the 1200ms hard cap triggers budget-exceeded metrics
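
The table above can be read as two separate checks: each stage against its own budget, and the sum against the target and the hard cap. A minimal sketch of that arithmetic follows. The types and functions are hypothetical and only mirror the numbers in the table; they are not the `LatencyBudgetEnforcer` API.

```typescript
// Hypothetical types and helpers for illustration; not the core package's API.
type Stage = 'stt' | 'mcp' | 'tts';

interface Budget {
  total: { target: number; hardCap: number };
  stages: Record<Stage, number>;
}

type Verdict = 'within-target' | 'over-target' | 'over-hard-cap';

// Values copied from the table above (milliseconds).
const budget: Budget = {
  total: { target: 800, hardCap: 1200 },
  stages: { stt: 200, mcp: 400, tts: 200 },
};

// Stages that ran longer than their own budget.
function overBudgetStages(b: Budget, measured: Record<Stage, number>): Stage[] {
  return (Object.keys(b.stages) as Stage[]).filter((s) => measured[s] > b.stages[s]);
}

// Classifies the whole turn against the target and the hard cap.
function classifyTurn(b: Budget, measured: Record<Stage, number>): Verdict {
  const total = measured.stt + measured.mcp + measured.tts;
  if (total > b.hardCap) return 'over-hard-cap';
  if (total > b.target) return 'over-target';
  return 'within-target';
}

// A slow agent call (520ms) offset by fast STT and TTS: 760ms in total.
const measured: Record<Stage, number> = { stt: 150, mcp: 520, tts: 90 };
const slowStages = overBudgetStages(budget, measured); // only 'mcp' exceeds its budget
const verdict = classifyTurn(budget, measured);        // 'within-target'
```

The example turn shows why both checks matter: the MCP stage overruns its 400ms budget, yet the turn as a whole still lands under the 800ms target because the other stages finish early.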

Pipeline Flow

Twilio WebSocket  →  AudioChunk  →  STT  →  Utterance  →  MCP  →  AgentResponse  →  TTS  →  AudioChunk  →  Twilio
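
Read as types, the flow above is a chain of three asynchronous stages. The sketch below models a single turn that way, with stub stages so it runs without provider credentials. All names and shapes here are hypothetical; the real types live in `@reaatech/voice-agent-core`, and the real pipeline streams chunks rather than handling one value per turn.

```typescript
// Hypothetical shapes for illustration; not the core package's types.
interface AudioChunk { samples: Uint8Array }
interface Utterance { text: string }
interface AgentResponse { text: string }

type SpeechToText = (audio: AudioChunk) => Promise<Utterance>;
type Agent = (utterance: Utterance) => Promise<AgentResponse>;
type TextToSpeech = (response: AgentResponse) => Promise<AudioChunk>;

// Chains the three stages in the order shown in the flow.
function composeTurn(stt: SpeechToText, agent: Agent, tts: TextToSpeech) {
  return async (input: AudioChunk): Promise<AudioChunk> => {
    const utterance = await stt(input);
    const response = await agent(utterance);
    return tts(response);
  };
}

// Stub stages: STT always hears "hello", the agent echoes it back,
// and TTS emits one silent byte per character of the reply.
const echoTurn = composeTurn(
  async () => ({ text: 'hello' }),
  async (u) => ({ text: `you said: ${u.text}` }),
  async (r) => ({ samples: new Uint8Array(r.text.length) }),
);
```

Because each stage is just a function of the previous stage's output, any provider can be swapped in as long as it satisfies the stage's type, which is the idea behind the provider-agnostic adapters.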

Documentation

ARCHITECTURE.md — System design, package relationships, and pipeline internals
AGENTS.md — Coding conventions and development guidelines
CONTRIBUTING.md — Contribution workflow, adding providers, and release process
LATENCY_BUDGET.md — Per-provider latency characteristics and tuning

License

MIT
