Feature Proposal: Add FunASR as Self-Hosted STT Connector

Semantic Kernel enables AI orchestration across multiple models and services. Adding speech-to-text as a native skill would let developers build voice-driven AI agents and applications. FunASR (17.8K+ stars, https://github.com/modelscope/FunASR) provides:

- **SenseVoice**: Ultra-fast multilingual ASR (claimed to be up to 50x faster than Whisper-large)
- **Paraformer**: Production-grade ASR with timestamps and punctuation
- **Fun-ASR-Nano**: Lightweight streaming ASR for edge deployment
- **OpenAI-compatible API**: `POST /v1/audio/transcriptions`, a drop-in replacement for the Whisper API

Since FunASR exposes an OpenAI-compatible endpoint, it can serve as a self-hosted STT backend in Semantic Kernel. Developers would configure a local FunASR server URL as their audio transcription endpoint, enabling fully self-hosted voice-to-text-to-response AI pipelines without external API dependencies.
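To make the idea concrete, here is a minimal sketch of what talking to such an endpoint could look like using only the Python standard library. It is not Semantic Kernel code. The server address (`http://localhost:8000`), the model name (`sensevoice`), and the helper names are assumptions for illustration, and the response is assumed to follow the OpenAI transcription shape (a JSON object with a `text` field).

```python
import json
import os
import uuid
import urllib.request

# Assumed address of a locally running FunASR server; adjust as needed.
FUNASR_BASE_URL = "http://localhost:8000"


def build_transcription_request(audio_bytes, filename, model, base_url=FUNASR_BASE_URL):
    """Build a multipart/form-data POST for /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    # Two form fields: "model" (text) and "file" (binary audio payload).
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/audio/transcriptions",
        data=head + audio_bytes + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def transcribe(path, model="sensevoice", base_url=FUNASR_BASE_URL):
    """Send a local audio file to the server and return the transcript text.

    The model name is a hypothetical placeholder; use whatever identifier
    the running server actually accepts.
    """
    with open(path, "rb") as f:
        audio = f.read()
    request = build_transcription_request(audio, os.path.basename(path), model, base_url)
    with urllib.request.urlopen(request, timeout=60) as response:
        # Assumes an OpenAI-style JSON response: {"text": "..."}
        return json.loads(response.read().decode("utf-8"))["text"]
```

With a server running, `transcribe("meeting.wav")` would return the transcript as a string. A connector would only need the same base URL as its configuration, which is why no external API key or cloud dependency is involved.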

This aligns with Semantic Kernel's goal of flexible AI orchestration: FunASR adds another modality (audio) that can be combined with existing text generation skills.

Would adding FunASR as an STT connector be useful for Semantic Kernel users?


Feature Proposal: Add FunASR as Self-Hosted STT Connector #14067
