Refactor speech stack into built-in Kokoro TTS and Whisper STT plugins #1371
Open
3clyp50 wants to merge 3 commits into agent0ai:ready from
7bd9eb6 to 19c8c60
…gs (local PR-B+C shape)

Adds resolve_mcp_server_headers async extension point at both MCP transport paths in mcp_handler.py (streamablehttp + sse). Enables plugins to resolve credential placeholders at header construction time without monkey-patching.

Adds @extensible to set_settings() and set_settings_delta() in settings.py. Enables plugins to intercept settings writes for credential scanning.

Local patch shape for upstream PR-B+C submission.
Ref: deimos_openbao_secrets IMPLEMENTATION_PLAN.md Step 1
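For reviewers, a minimal sketch of the idea behind a header-resolution hook, assuming a `{{secret:NAME}}` placeholder syntax and an in-memory stand-in for the secret store. Every name here (`register_resolver`, `resolve_headers`, `FAKE_VAULT`) is hypothetical and chosen for illustration; this is not the actual extension API or the code in mcp_handler.py.

```python
import asyncio
import re
from typing import Awaitable, Callable, Dict, List

# Hypothetical model of an async "resolve headers" hook. It illustrates the
# pattern only and does not reproduce the project's real extension mechanism.
Headers = Dict[str, str]
HeaderResolver = Callable[[Headers], Awaitable[Headers]]

_resolvers: List[HeaderResolver] = []


def register_resolver(fn: HeaderResolver) -> HeaderResolver:
    """Register a resolver; resolvers run in registration order."""
    _resolvers.append(fn)
    return fn


async def resolve_headers(headers: Headers) -> Headers:
    """Pass a copy of the headers through every registered resolver."""
    resolved = dict(headers)
    for resolver in _resolvers:
        resolved = await resolver(resolved)
    return resolved


# Assumed placeholder syntax and a fake secret store, both for illustration.
PLACEHOLDER = re.compile(r"\{\{secret:([A-Za-z0-9_]+)\}\}")
FAKE_VAULT = {"MCP_TOKEN": "s3cr3t"}


@register_resolver
async def vault_resolver(headers: Headers) -> Headers:
    """Replace known placeholders; leave unknown ones untouched."""
    def substitute(match: "re.Match[str]") -> str:
        return FAKE_VAULT.get(match.group(1), match.group(0))

    return {key: PLACEHOLDER.sub(substitute, value) for key, value in headers.items()}


if __name__ == "__main__":
    raw = {"Authorization": "Bearer {{secret:MCP_TOKEN}}"}
    print(asyncio.run(resolve_headers(raw)))
```

Resolving at header construction time means the stored configuration can keep the placeholder, and the real value only exists in the outgoing request.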
Adds sidebar-chat-item-start and sidebar-chat-item-end x-extension points inside the x-for loop in chats-list.html.

Previously only sidebar-chats-list-start/end existed, both outside the x-for loop. This forced plugins that need per-chat-row UI (e.g. status indicators, labels, badges) to resort to MutationObserver + index-based DOM scanning and monkey-patching internal store methods.

With these new extension points, plugins can inject content into each chat row with access to the reactive Alpine context object (context.id, context.name, context.running, context.project, etc.) entirely through declarative Alpine bindings, with no DOM scanning, no method patching, and no index arithmetic.
Split the legacy core speech stack into two built-in, independently toggleable plugins: `_kokoro_tts` for TTS and `_whisper_stt` for STT.

This refactor keeps dependency installation and bootstrap concerns in Docker/bootstrap/preload, while moving speech-specific tooling, APIs, prompts, UI, and runtime behavior into the plugins. Core now exposes engine-agnostic `tts-service` and `stt-service` brokers, with browser-native TTS preserved as the fallback when Kokoro is disabled.

Included in this change:

- add built-in `_kokoro_tts` plugin with plugin-owned synth API, config, status UI, and provider registration
- add built-in `_whisper_stt` plugin with plugin-owned transcribe API, mic runtime, device UI, prompt injection, and provider registration
- remove legacy core speech APIs/helpers/settings/UI and delete unused `webui/js/speech_browser.js`
- replace the old hardcoded speech settings section with a generic voice surface backed by plugin extensions
- update preload/docs/tests to match the new plugin-owned speech architecture

Behavioral intent:

- both plugins are built-in but not `always_enabled`
- users can now hot-switch TTS and STT independently
- browser TTS remains available when `_kokoro_tts` is off
- Whisper mic UI only appears when `_whisper_stt` is enabled
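As a rough illustration of the broker-with-fallback behavior described above, here is a toy model in Python. The class and method names (`TtsBroker`, `register`, `unregister`, `speak`) are assumptions made for this sketch and say nothing about how the real `tts-service` is implemented; synthesis is stubbed out as string formatting.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# A "synthesizer" here is just a function from text to a string label,
# standing in for real audio output.
Synth = Callable[[str], str]


@dataclass
class TtsBroker:
    """Hypothetical engine-agnostic broker with a built-in fallback."""

    fallback: Synth
    providers: Dict[str, Synth] = field(default_factory=dict)
    active: Optional[str] = None

    def register(self, name: str, synth: Synth) -> None:
        """Called when a plugin is enabled; the newest provider becomes active."""
        self.providers[name] = synth
        self.active = name

    def unregister(self, name: str) -> None:
        """Called when a plugin is disabled; fall back if nothing else remains."""
        self.providers.pop(name, None)
        if self.active == name:
            self.active = next(iter(self.providers), None)

    def speak(self, text: str) -> str:
        """Route to the active provider, or to the fallback when none is active."""
        synth = self.providers.get(self.active) if self.active else None
        return (synth or self.fallback)(text)


if __name__ == "__main__":
    broker = TtsBroker(fallback=lambda text: f"browser:{text}")
    print(broker.speak("hello"))       # no plugin enabled yet
    broker.register("kokoro", lambda text: f"kokoro:{text}")
    print(broker.speak("hello"))       # plugin provider handles the call
    broker.unregister("kokoro")
    print(broker.speak("hello"))       # back to the fallback
```

Callers only ever talk to the broker, which is what allows a provider to be switched on or off at runtime without the calling code changing.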