Refactor speech stack into built-in Kokoro TTS and Whisper STT plugins by 3clyp50 · Pull Request #1371 · agent0ai/agent-zero

3clyp50 · 2026-03-29T03:22:46Z

Split the legacy core speech stack into two built-in, independently toggleable plugins: _kokoro_tts for TTS and _whisper_stt for STT.

This extraction keeps dependency installation and bootstrap concerns in Docker/bootstrap/preload, while moving speech-specific tooling, APIs, prompts, UI, and runtime behavior into the plugins. Core now exposes engine-agnostic tts-service and stt-service brokers, with browser-native TTS preserved as the fallback when Kokoro is disabled.
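To illustrate the broker idea described above, here is a minimal Python sketch of an engine-agnostic TTS broker with a fallback. It is not the PR's code: the class `TtsBroker`, its methods, and the string-returning synth functions are hypothetical names chosen for illustration, and the real `tts-service` may well live in the web UI rather than in Python.

```python
from typing import Callable, Dict, Optional

# Hypothetical sketch: none of these names come from the PR itself.
# A "synth" here just returns a string so the routing is easy to observe.
Synth = Callable[[str], str]


class TtsBroker:
    """Engine-agnostic broker: plugins register providers, callers only use the broker."""

    def __init__(self, fallback_name: str, fallback: Synth) -> None:
        self._fallback_name = fallback_name
        self._fallback = fallback
        self._providers: Dict[str, Synth] = {}
        self._active: Optional[str] = None

    def register(self, name: str, synth: Synth) -> None:
        # Called by a plugin when it is enabled; the newest provider becomes active.
        self._providers[name] = synth
        self._active = name

    def unregister(self, name: str) -> None:
        # Called by a plugin when it is disabled; the broker reverts to the fallback.
        self._providers.pop(name, None)
        if self._active == name:
            self._active = None

    def active_engine(self) -> str:
        if self._active in self._providers:
            return self._active
        return self._fallback_name

    def speak(self, text: str) -> str:
        synth = self._providers.get(self._active) if self._active else None
        return (synth or self._fallback)(text)


# Browser-native TTS stands in as the always-available fallback.
broker = TtsBroker("browser", lambda text: f"browser:{text}")
```

Under these assumptions, callers never name a concrete engine. Enabling or disabling a plugin only changes which provider is registered, which is what would keep browser TTS available while Kokoro is off.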

Included in this change:

- add built-in _kokoro_tts plugin with plugin-owned synth API, config, status UI, and provider registration
- add built-in _whisper_stt plugin with plugin-owned transcribe API, mic runtime, device UI, prompt injection, and provider registration
- remove legacy core speech APIs/helpers/settings/UI and delete unused webui/js/speech_browser.js
- replace the old hardcoded speech settings section with a generic voice surface backed by plugin extensions
- update preload/docs/tests to match the new plugin-owned speech architecture

Behavioral intent:

- both plugins are built-in but not always_enabled
- users can now hot-switch TTS and STT independently
- browser TTS remains available when _kokoro_tts is off
- Whisper mic UI only appears when _whisper_stt is enabled
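The behavioral intent above can be read as a small state table. The following Python sketch models it under stated assumptions: `BuiltinPlugin`, `set_enabled`, and `voice_surface` are hypothetical helpers invented for this example, and only the plugin names and the `always_enabled` flag come from the description.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class BuiltinPlugin:
    """Hypothetical model of a built-in plugin's toggle state."""

    name: str
    always_enabled: bool = False  # both speech plugins leave this False
    enabled: bool = True

    def set_enabled(self, value: bool) -> None:
        # Only plugins that are not always_enabled may be switched off.
        if self.always_enabled and not value:
            raise ValueError(f"{self.name} cannot be disabled")
        self.enabled = value


def voice_surface(tts: BuiltinPlugin, stt: BuiltinPlugin) -> Dict[str, Union[str, bool]]:
    """Derive what the voice UI would show from the two independent toggles."""
    return {
        "tts_engine": "kokoro" if tts.enabled else "browser",
        "show_whisper_mic": stt.enabled,
    }


kokoro = BuiltinPlugin("_kokoro_tts")
whisper = BuiltinPlugin("_whisper_stt")
```

For example, calling `kokoro.set_enabled(False)` in this model switches `tts_engine` to `"browser"` while leaving `show_whisper_mic` untouched, which mirrors the independent hot-switching described in the list.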

@extensible

…gs (local PR-B+C shape)

Adds resolve_mcp_server_headers async extension point at both MCP transport paths in mcp_handler.py (streamablehttp + sse). Enables plugins to resolve credential placeholders at header construction time without monkey-patching.

Adds @extensible to set_settings() and set_settings_delta() in settings.py. Enables plugins to intercept settings writes for credential scanning.

Local patch shape for upstream PR-B+C submission.

Ref: deimos_openbao_secrets IMPLEMENTATION_PLAN.md Step 1
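As a rough illustration of what an async header-resolution extension point could look like, here is a self-contained Python sketch. Only the name `resolve_mcp_server_headers` comes from the commit message; its signature, the `register_header_resolver` decorator, the `{{NAME}}` placeholder syntax, and the in-memory secret store are all assumptions made for this example and may differ from the actual patch.

```python
import asyncio
import re
from typing import Awaitable, Callable, Dict, List

Headers = Dict[str, str]
HeaderResolver = Callable[[Headers], Awaitable[Headers]]

# Hypothetical registry of plugin-provided resolvers.
_resolvers: List[HeaderResolver] = []


def register_header_resolver(resolver: HeaderResolver) -> HeaderResolver:
    """Hypothetical hook a plugin would use instead of monkey-patching."""
    _resolvers.append(resolver)
    return resolver


async def resolve_mcp_server_headers(headers: Headers) -> Headers:
    """Run every registered resolver in order over a copy of the headers."""
    resolved = dict(headers)
    for resolver in _resolvers:
        resolved = await resolver(resolved)
    return resolved


# Stand-in secret store and placeholder syntax, both assumed for illustration.
_SECRETS = {"MCP_TOKEN": "s3cr3t"}
_PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


@register_header_resolver
async def fill_placeholders(headers: Headers) -> Headers:
    # Replace known placeholders; leave unknown ones unchanged.
    def substitute(match: "re.Match[str]") -> str:
        return _SECRETS.get(match.group(1), match.group(0))

    return {key: _PLACEHOLDER.sub(substitute, value) for key, value in headers.items()}


def demo() -> Headers:
    return asyncio.run(
        resolve_mcp_server_headers({"Authorization": "Bearer {{MCP_TOKEN}}"})
    )
```

In this sketch the transport code would call `resolve_mcp_server_headers` once while building headers for either transport path, so the secret value never needs to be stored in the settings themselves.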

Adds sidebar-chat-item-start and sidebar-chat-item-end x-extension points inside the x-for loop in chats-list.html. Previously only sidebar-chats-list-start/end existed, both outside the x-for loop. This forced plugins that need per-chat-row UI (e.g. status indicators, labels, badges) to resort to MutationObserver + index-based DOM scanning and monkey-patching internal store methods. With these new extension points, plugins can inject content into each chat row with access to the reactive Alpine context object (context.id, context.name, context.running, context.project, etc.) entirely through declarative Alpine bindings — no DOM scanning, no method patching, no index arithmetic.


3clyp50 force-pushed the tts_stt branch 3 times, most recently from 7bd9eb6 to 19c8c60 · April 2, 2026 14:27

3clyp50 changed the base branch from development to ready · April 2, 2026 14:27

Deimos-Agent and others added 3 commits · April 2, 2026 17:46

3clyp50 force-pushed the tts_stt branch from 19c8c60 to 75ddd84 · April 2, 2026 15:47

3clyp50 wants to merge 3 commits into agent0ai:ready from 3clyp50:tts_stt

2 participants