Feature Proposal: Add FunASR as Self-Hosted STT Connector #14067

@LauraGPT

Description

Semantic Kernel enables AI orchestration across multiple models and services. Adding speech-to-text as a native skill would enable voice-enabled AI agents and applications. FunASR (17.8K+ stars, https://github.com/modelscope/FunASR) provides:

  • SenseVoice: Ultra-fast multilingual ASR (50x faster than Whisper-large)
  • Paraformer: Production-grade ASR with timestamps and punctuation
  • Fun-ASR-Nano: Lightweight streaming ASR for edge deployment
  • OpenAI-compatible API: POST /v1/audio/transcriptions — drop-in Whisper API replacement

Since FunASR exposes an OpenAI-compatible endpoint, it can serve as a self-hosted STT backend in Semantic Kernel. Developers would configure a local FunASR server URL as their audio transcription endpoint, enabling fully self-hosted voice-to-text-to-response AI pipelines without external API dependencies.
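To make the idea concrete, here is a minimal sketch of what a client call against such an endpoint could look like, using only the Python standard library. The server address (`http://localhost:8000`), the model name (`sensevoice`), and the helper names `build_transcription_request` and `transcribe` are assumptions for illustration, not values taken from FunASR or Semantic Kernel. The sketch also assumes the server returns the OpenAI-style JSON body with a `text` field.

```python
import json
import urllib.request
import uuid

# Assumptions: hypothetical local server address and model name.
FUNASR_BASE_URL = "http://localhost:8000"
MODEL_NAME = "sensevoice"


def build_transcription_request(audio_bytes, filename,
                                base_url=FUNASR_BASE_URL, model=MODEL_NAME):
    """Build a multipart/form-data POST for /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Form field: model
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        model.encode() + b"\r\n",
        # Form field: file (the raw audio)
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio_bytes + b"\r\n",
        # Closing boundary
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(audio_bytes, filename, **kwargs):
    """Send the request and return the transcript text.

    Assumes an OpenAI-style JSON response of the form {"text": "..."}.
    """
    request = build_transcription_request(audio_bytes, filename, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]
```

With a server running, usage would be along the lines of `transcribe(open("meeting.wav", "rb").read(), "meeting.wav")`. A Semantic Kernel connector would presumably wrap the same request behind its existing audio-to-text abstraction, with the base URL as the only FunASR-specific setting.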

This aligns with Semantic Kernel's goal of flexible AI orchestration: FunASR adds another modality (audio) that can be combined with existing text generation skills.

Would adding FunASR as an STT connector be useful for Semantic Kernel users?
