Free Voice Messages for AI Agents
LocalTTS lets your OpenClaw agent send voice messages without paying for cloud TTS APIs. Instead of routing text to expensive cloud services (OpenAI, ElevenLabs, etc.), this runs locally on your host—completely free.
Most AI platforms charge extra for voice:
- OpenAI's TTS: $0.015 per 1K characters
- ElevenLabs: Limited free tier, then paid
- Cloud providers: Monthly subscriptions, usage caps
LocalTTS is the free alternative. Run synthesis locally, pay nothing, speak unlimited.
This isn't about "pro audio quality"—it's about enabling voice communication without the paywall.
- Local synthesis: Piper TTS runs on your machine
- Zero API costs: No cloud calls, no usage limits
- Good enough quality: Fine for Telegram voice notes, quick replies, notifications
- Messaging ready: Generates MP3s that play natively in Telegram/Discord
The pipeline runs in four steps:
- Text comes in: the agent decides to reply with voice
- Piper synthesizes: a local ONNX model generates WAV audio
- ffmpeg converts: the WAV becomes an MP3 for delivery
- Message sent: the voice note is delivered via Telegram/Discord
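The synthesis and conversion steps above can be sketched as a small shell function. This is an illustrative outline, not the actual contents of `speak.sh`: the `VOICE_MODEL` default (a placeholder voice file name), the `OUT_DIR` default, and the `speak` function name are all assumptions. The `piper` and `ffmpeg` flags used are standard ones, but check them against your installed versions.

```shell
#!/usr/bin/env bash
# Sketch of a text-to-MP3 pipeline; NOT the real speak.sh.
# VOICE_MODEL and OUT_DIR are placeholder values (assumptions).

VOICE_MODEL="${VOICE_MODEL:-/voices/en_US-lessac-medium.onnx}"
OUT_DIR="${OUT_DIR:-./media}"

speak() {
  local wav rc mp3="$OUT_DIR/voice.mp3"

  # Fail early if the voice model is not where we expect it.
  if [ ! -f "$VOICE_MODEL" ]; then
    echo "voice model not found: $VOICE_MODEL" >&2
    return 1
  fi
  mkdir -p "$OUT_DIR" || return 1
  wav="$(mktemp)" || return 1

  # Piper reads text on stdin and writes a WAV file.
  if ! piper --model "$VOICE_MODEL" --output_file "$wav"; then
    rm -f "$wav"
    return 1
  fi

  # ffmpeg converts WAV to MP3; -y overwrites any previous output.
  ffmpeg -nostdin -y -loglevel error -i "$wav" \
    -codec:a libmp3lame -qscale:a 4 "$mp3"
  rc=$?
  rm -f "$wav"

  # Print the MP3 path only when the conversion succeeded.
  [ "$rc" -eq 0 ] && echo "$mp3"
  return "$rc"
}
```

With the function sourced, `echo "Build finished." | speak` would print the path of the generated MP3 on success and return a non-zero status otherwise.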
Requirements:
- `piper` binary installed
- `ffmpeg` for MP3 conversion
- Voice models in the `/voices/` directory
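A quick way to verify these prerequisites before the first run is a small check function. This is a sketch under assumptions: the `check_deps` name and the `VOICES_DIR` variable are hypothetical, and it assumes voice models are stored as `.onnx` files, as Piper voices normally are.

```shell
#!/usr/bin/env bash
# Sketch: report missing LocalTTS prerequisites.
# VOICES_DIR and the *.onnx naming are assumptions, not part of the project.

VOICES_DIR="${VOICES_DIR:-/voices}"

check_deps() {
  local missing=0 bin

  # Both binaries must be on PATH.
  for bin in piper ffmpeg; do
    if ! command -v "$bin" >/dev/null 2>&1; then
      echo "missing binary: $bin"
      missing=1
    fi
  done

  # Look for at least one voice model file.
  if ! ls "$VOICES_DIR"/*.onnx >/dev/null 2>&1; then
    echo "no .onnx voice models found in $VOICES_DIR"
    missing=1
  fi

  # Returns 0 when everything is present, 1 otherwise.
  return "$missing"
}
```

Calling `check_deps` prints one line per missing item and returns a non-zero status if anything is absent, so it can gate a setup script.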
Pipe text into the script:

    echo "Hey James, here's your update." | ./scripts/speak.sh

The output goes to /root/.openclaw/workspace/media/voice.mp3, ready to send.
Piper can struggle with some words. The pronunciation.md file maps problematic terms to phonetic spellings before synthesis:
- degrees→de-grees
- winds→windz
- AI→A-I
This fixes robotic artifacts without needing expensive cloud voices.
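One way such a substitution pass could work is sketched below. The layout of `pronunciation.md` is not specified here, so this assumes a plain file with one `word→phonetic` pair per line. The `apply_pronunciation` function is hypothetical, and it relies on the `\b` word-boundary extension of GNU sed.

```shell
#!/usr/bin/env bash
# Sketch: rewrite problem words on stdin using "word→phonetic" mappings.
# Assumes one mapping per line and plain words (no regex metacharacters);
# the real pronunciation.md layout may differ.

apply_pronunciation() {
  local map_file="$1" line word phonetic script=""

  while IFS= read -r line || [ -n "$line" ]; do
    # Skip lines that do not contain the arrow separator.
    case "$line" in *→*) ;; *) continue ;; esac
    word="${line%%→*}"
    phonetic="${line#*→}"
    # \b limits the match to whole words (GNU sed extension).
    script="${script}s/\\b${word}\\b/${phonetic}/g
"
  done < "$map_file"

  # Rules are applied in file order to the text arriving on stdin.
  sed -e "$script"
}
```

For example, with the three mappings listed above saved to a file, `echo "AI says winds at 20 degrees" | apply_pronunciation mappings.txt` yields `A-I says windz at 20 de-grees`. Matching is case-sensitive, and because rules run in sequence, a replacement that itself contains a mapped word would be rewritten again by a later rule.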
| Method | Cost Per Message | Monthly (1K msgs/day) |
|---|---|---|
| OpenAI TTS | ~$0.0015 (about 100 characters) | ~$45 |
| ElevenLabs | Limited free, then paid | $5-50+ |
| LocalTTS | $0 | $0 |
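The OpenAI figures can be reproduced from the $0.015 per 1K characters rate quoted earlier. The sketch below assumes a typical message of about 100 characters, and the `tts_cost` helper is hypothetical. On that assumption, $0.0015 is the cost of one message and $45 corresponds to roughly 30,000 messages, for example 1,000 per day over 30 days.

```shell
#!/usr/bin/env bash
# Sketch: estimate cloud TTS cost from message length and volume.
# Default rate is the $0.015 per 1K characters figure quoted in this README.

tts_cost() {
  local chars_per_msg="$1" messages="${2:-1}" rate_per_1k="${3:-0.015}"
  awk -v c="$chars_per_msg" -v m="$messages" -v r="$rate_per_1k" \
    'BEGIN { printf "%.4f\n", c * m / 1000 * r }'
}

tts_cost 100          # one ~100-character message   -> 0.0015
tts_cost 100 30000    # 1,000 messages/day, 30 days  -> 45.0000
```

Longer messages scale linearly: a 500-character reply costs five times the per-message figure shown.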
localtts/
├── README.md # This file
├── SKILL.md # OpenClaw skill documentation
├── pronunciation.md # Word → phonetic mappings
└── scripts/
    └── speak.sh # Synthesis pipeline
Free voice for agents. No subscriptions. No limits.