Semantic Kernel enables AI orchestration across multiple models and services. Adding speech-to-text as a native skill would enable voice-enabled AI agents and applications. FunASR (17.8K+ stars, https://github.com/modelscope/FunASR) provides:
- SenseVoice: Ultra-fast multilingual ASR (reported by the project as up to 50x faster than Whisper-large)
- Paraformer: Production-grade ASR with timestamps and punctuation
- Fun-ASR-Nano: Lightweight streaming ASR for edge deployment
- OpenAI-compatible API: `POST /v1/audio/transcriptions`, a drop-in replacement for the Whisper API
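Here is a minimal sketch of a client call against that endpoint, using only the Python standard library. It assumes the FunASR server follows the OpenAI transcription request shape: a multipart form with `file` and `model` fields, and a JSON reply with a `text` key. The base URL, model name, and function names are hypothetical.

```python
import json
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, Optional


def build_multipart(fields: Dict[str, str], file_field: str, filename: str,
                    file_bytes: bytes, boundary: str) -> bytes:
    """Encode plain form fields plus one file part as multipart/form-data."""
    parts = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode(),
        ]
    parts += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,
        f"--{boundary}--".encode(),  # closing delimiter
        b"",  # so the body ends with CRLF
    ]
    return b"\r\n".join(parts)


def make_transcription_request(base_url: str, audio: bytes, filename: str, model: str,
                               boundary: Optional[str] = None) -> urllib.request.Request:
    """Build a POST to the OpenAI-style /v1/audio/transcriptions route on base_url."""
    boundary = boundary or uuid.uuid4().hex
    body = build_multipart({"model": model}, "file", filename, audio, boundary)
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(base_url: str, audio_path: str, model: str, timeout: float = 60.0) -> str:
    """Send one audio file and return the transcript (assumes a JSON {"text": ...} reply)."""
    path = Path(audio_path)
    request = make_transcription_request(base_url, path.read_bytes(), path.name, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["text"]
```

Usage against a hypothetical local server: `transcribe("http://localhost:8000", "meeting.wav", "hypothetical-funasr-model")`. Only the base URL differs from a call to a hosted Whisper-compatible API.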
Since FunASR exposes an OpenAI-compatible endpoint, it can serve as a self-hosted STT backend in Semantic Kernel. Developers would configure a local FunASR server URL as their audio transcription endpoint. Paired with a locally hosted text-generation model, this enables fully self-hosted voice-to-text-to-response AI pipelines without external API dependencies.
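The wiring described here reduces to two swappable stages. The following is a minimal Python sketch, not Semantic Kernel's actual API. `SttEndpoint` and `voice_to_response` are hypothetical names. The transcriber and responder are plain callables that could wrap a FunASR client and a locally hosted chat model, respectively.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class SttEndpoint:
    """Hypothetical config: switching providers changes base_url (and model), not call sites."""
    base_url: str
    model: str
    api_key: Optional[str] = None  # assumption: a local server may not need a key

    @property
    def transcriptions_url(self) -> str:
        return self.base_url.rstrip("/") + "/v1/audio/transcriptions"


def voice_to_response(audio: bytes,
                      transcribe: Callable[[bytes], str],
                      respond: Callable[[str], str]) -> Tuple[str, str]:
    """Run speech-to-text, then hand the transcript to a text-generation step."""
    transcript = transcribe(audio).strip()
    if not transcript:
        return transcript, ""  # nothing was recognized; skip the text-generation call
    return transcript, respond(transcript)
```

For example, `SttEndpoint("http://localhost:8000", "hypothetical-funasr-model").transcriptions_url` resolves to `http://localhost:8000/v1/audio/transcriptions`. A hosted `SttEndpoint("https://api.openai.com", "whisper-1")` uses the same route, which is the drop-in property this proposal relies on.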
This aligns with Semantic Kernel's goal of flexible AI orchestration: FunASR adds another modality (audio) that can be combined with existing text-generation skills.
Would adding FunASR as an STT connector be useful for Semantic Kernel users?