Releases: modelscope/FunASR

v1.3.10

19 Jun 14:01

FunASR v1.3.10

New features

  • Agent-friendly CLI: funasr audio.wav --output-format json for structured output
  • Fun-ASR-Nano: batched VAD-segment decoding (~1.75× faster) (#2979)
  • WebSocket 2-pass server: sentence-level timestamps
  • serve_vllm.py: new --vad-model / --spk-model flags
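
A minimal sketch of driving the new agent-friendly CLI from Python. It assumes `funasr` is on PATH and that, with `--output-format json`, the JSON document is the only thing written to stdout (not confirmed here). The notes do not describe the JSON schema, so the sketch only returns the parsed object.

```python
import json
import subprocess
from typing import Any, List


def build_command(audio_path: str) -> List[str]:
    # Mirrors the invocation from the release notes: funasr <audio> --output-format json
    return ["funasr", audio_path, "--output-format", "json"]


def transcribe_json(audio_path: str) -> Any:
    # Run the CLI; check=True raises CalledProcessError on a non-zero exit code.
    proc = subprocess.run(
        build_command(audio_path),
        capture_output=True,
        text=True,
        check=True,
    )
    # Assumption: stdout carries only the JSON document (logs go elsewhere).
    return json.loads(proc.stdout)
```

Usage: `result = transcribe_json("audio.wav")`, then inspect `result` for the fields your agent needs.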

Fixes

  • Fun-ASR-Nano: bf16/fp16 inference no longer crashes; warn on degraded fp16 (#2980)
  • Fun-ASR-Nano vLLM: fix CUDA crash from repetition_penalty
  • CLI: valid SRT timestamps + correct JSON durations (#2982); use sentence_info text (#2983); correct model id Fun-ASR-Nano-2512 (#2984)
  • Clearer error for missing audio path (#2981); respect explicit VAD silence threshold; handle None encoder/scheduler configs
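
For reference, a valid SRT timestamp has the form `HH:MM:SS,mmm` (comma before the milliseconds). The sketch below formats one from a millisecond offset. It assumes sentence timestamps are given in milliseconds, and it is an illustration of the format, not the CLI's own implementation; `srt_timestamp` and `srt_cue` are hypothetical helper names.

```python
def srt_timestamp(ms: int) -> str:
    """Format a non-negative millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    if ms < 0:
        raise ValueError("timestamp must be non-negative")
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def srt_cue(index: int, start_ms: int, end_ms: int, text: str) -> str:
    # One cue: sequence number, time range, text. Joining cues with "\n"
    # yields the blank line that separates cues in an .srt file.
    return f"{index}\n{srt_timestamp(start_ms)} --> {srt_timestamp(end_ms)}\n{text}\n"
```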

Docs

  • New CLI reference; clearer vLLM install guidance

Full changelog: v1.3.9...v1.3.10

v1.3.9: Wheel packaging + SenseVoice speaker diarization fix

29 May 20:53

What's New

Wheel packaging (fixes #2943)

FunASR now publishes a py3-none-any wheel alongside the source distribution. Installation is faster since pip no longer needs to build from source.

Bug fixes

  • SenseVoice + speaker diarization: Fixed crash when using spk_model="cam++" with SenseVoice (auto-falls back to VAD-segment mode since SenseVoice doesn't produce word-level timestamps)
  • torchaudio >= 2.11 compatibility: Added soundfile as intermediate fallback for users with newer torchaudio versions that removed legacy backends
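
The torchaudio fix describes a fallback chain: if one audio backend is unavailable, try the next. A minimal, backend-agnostic sketch of that pattern is shown below; the loaders are passed in as callables, and `load_with_fallback` is a hypothetical name, not FunASR's internal loader.

```python
from typing import Any, Callable, List, Sequence, Tuple

Loader = Callable[[str], Any]


def load_with_fallback(path: str, loaders: Sequence[Tuple[str, Loader]]) -> Tuple[str, Any]:
    """Try each (name, loader) in order and return (name, result) for the first success."""
    errors: List[str] = []
    for name, loader in loaders:
        try:
            return name, loader(path)
        except Exception as exc:  # backend missing, removed, or unable to decode the file
            errors.append(f"{name}: {exc}")
    raise RuntimeError(f"no backend could load {path!r}: " + "; ".join(errors))
```

In practice the chain might be `("torchaudio", torchaudio.load)` followed by `("soundfile", soundfile.read)`; note that these return differently shaped results (torchaudio is channels-first, soundfile is frames-first), so a real loader would normalize them.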

Install / Upgrade

pip install --upgrade funasr

Full changelog: v1.3.3...v1.3.9

v1.3.3: Agent Integration — OpenAI API + MCP Server + funasr-server CLI

23 May 20:35

Highlights

This release makes FunASR a drop-in speech backend for AI agents.

New: funasr-server CLI

pip install funasr fastapi uvicorn python-multipart
funasr-server --device cuda

One command starts an OpenAI-compatible /v1/audio/transcriptions endpoint.

New: MCP Server

AI assistants (Claude, Cursor, Windsurf) can now transcribe audio directly.

New: OpenAI-Compatible API

Works with any agent framework: LangChain, AutoGen, CrewAI, Dify, Flowise, Open WebUI.

from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="x")
with open("a.wav", "rb") as f:
    result = client.audio.transcriptions.create(model="sensevoice", file=f)
print(result.text)

Bug Fixes

  • Fixed hub="hf" parameter propagation to sub-models (v1.3.2)
  • Fixed Qwen3-ASR ImportError masking

Upgrade

pip install --upgrade funasr

v1.3.2: HuggingFace Hub Fix + Performance Benchmark

23 May 16:56

What's New

Bug Fix

  • Fixed hub parameter propagation — When using hub="hf", the parameter is now correctly forwarded to VAD/PUNC/SPK sub-models. Previously, users on HuggingFace would get 404 errors for sub-models. (#2859)

Improvements

Benchmark Results (PyTorch, GPU)

Model                   Type  Speed
SenseVoice-Small        NAR   170x realtime
Paraformer-Large        NAR   120x realtime
Whisper-large-v3-turbo  AR    46x realtime
Fun-ASR-Nano            LLM   17x realtime
Whisper-large-v3        AR    13.4x realtime

(NAR = non-autoregressive, AR = autoregressive)
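
To turn the "Nx realtime" figures into wall-clock estimates, divide the audio duration by the speed. This is a back-of-the-envelope sketch that assumes throughput stays at the benchmarked rate; real numbers depend on the GPU, batch size, and audio.

```python
def processing_seconds(audio_seconds: float, times_realtime: float) -> float:
    """Estimated wall-clock time to process `audio_seconds` of audio at `times_realtime`x speed."""
    if times_realtime <= 0:
        raise ValueError("speed must be positive")
    return audio_seconds / times_realtime


def rtf(times_realtime: float) -> float:
    # Real-time factor = processing time / audio duration, i.e. the reciprocal of the speed.
    return processing_seconds(1.0, times_realtime)
```

For one hour of audio, `processing_seconds(3600, 170)` is about 21.2 s for SenseVoice-Small, while `processing_seconds(3600, 17)` is about 211.8 s for Fun-ASR-Nano.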

Install / Upgrade

pip install --upgrade funasr

Quick Start

from funasr import AutoModel
model = AutoModel(model="FunAudioLLM/SenseVoiceSmall", hub="hf", vad_model="funasr/fsmn-vad", device="cuda")
result = model.generate(input="audio.wav")

0.3.0

16 Mar 08:15

What's new:

2023.3.17, funasr-0.3.0, modelscope-1.4.1

  • New Features:
    • Added support for a GPU runtime solution, nv-triton, which makes it easy to export Paraformer models from ModelScope and deploy them as services. In benchmark tests on a single V100 GPU, it achieved an RTF of 0.0032, a speedup of about 300x.
    • Added support for a CPU runtime quantization solution, which can export quantized ONNX and Libtorch models from ModelScope. In benchmark tests on a CPU-8369B, RTF improved by about 50% (0.00438 -> 0.00226) and the speedup nearly doubled (228 -> 442).
    • Added a C++ version of the gRPC service deployment solution. The C++ ONNXRuntime and quantization solution is about twice as efficient as the Python runtime (demo).
    • Added a streaming inference pipeline to the 16k and 8k VAD models, supporting audio input streams (chunks >= 10 ms) (demo).
    • Improved the punctuation prediction model, raising accuracy (F-score from 55.6 to 56.5).
    • Added a real-time subtitle example based on the gRPC service, using a 2-pass recognition model: the Paraformer streaming model outputs text in real time, while the Paraformer-large offline model corrects the recognition results (demo).
  • New Models:
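
Assuming the conventional definition of real-time factor (these notes do not define it), RTF is processing time divided by audio duration and speedup is its reciprocal, so lower RTF is better. The figures above are consistent with that reading:

```latex
\begin{aligned}
\mathrm{RTF} &= \frac{T_{\mathrm{proc}}}{T_{\mathrm{audio}}}, \qquad \mathrm{speedup} = \frac{1}{\mathrm{RTF}} \\
\frac{1}{0.0032} &\approx 312, \quad \frac{1}{0.00438} \approx 228, \quad \frac{1}{0.00226} \approx 442 \\
\frac{0.00438 - 0.00226}{0.00438} &\approx 0.48
\end{aligned}
```

So quantization cut RTF by roughly 48%, which is the same change as the speedup going from 228 to 442.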

Latest updates:

New Contributors

v0.2.0

20 Feb 02:22
0d15538

What's new:

2023.2.17, funasr-0.2.0, modelscope-1.3.0

  • We support a new feature: exporting Paraformer models from ModelScope to ONNX and TorchScript. Locally fine-tuned models are also supported.
  • We support a new feature, ONNXRuntime: you can deploy the runtime without ModelScope or FunASR. For the paraformer-large model on CPU, ONNXRuntime gives about a 3x speedup (RTF 0.110 -> 0.038) (details).
  • We support a new feature, gRPC: you can build an ASR service with gRPC by deploying either the ModelScope pipeline or ONNXRuntime.
  • We release a new model, paraformer-large-contextual, which supports hotword customization based on incentive enhancement and improves hotword recall and precision.
  • We optimize the timestamp alignment of Paraformer-large-long. Timestamp prediction is much more accurate, reaching an accumulated average shift (AAS) of 74.7 ms (details).
  • We release a new model, the 8k VAD model, which predicts the duration of non-silent speech. It can be freely combined with any ASR model in ModelScope.
  • We release a new model, MFCCA, a multi-channel multi-speaker model that is independent of the number and geometry of microphones and supports Mandarin meeting transcription.
  • We release several new UniASR models: Southern Fujian dialect, French, German, Vietnamese, and Persian.
  • We release a new model, paraformer-data2vec, an unsupervised model pretrained on AISHELL-2 that is used to initialize a Paraformer model, which is then fine-tuned on AISHELL-1.
  • We release a new feature: VAD, ASR, and PUNC models can be combined freely, whether they come from ModelScope or are locally fine-tuned (demo).
  • We optimize the common punctuation model, improving recall and precision and fixing bad cases with missing punctuation marks.
  • The ModelScope inference pipeline now supports more audio input types, including mp3, flac, ogg, and opus.
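
The freely combinable VAD, ASR, and PUNC stages can be pictured as a simple composition: VAD finds speech segments, ASR transcribes each one, and punctuation is restored over the joined text. The sketch below uses hypothetical stage callables to illustrate the idea; it is not the ModelScope or FunASR pipeline API.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def run_pipeline(
    audio: Sequence[float],
    sample_rate: int,
    vad: Callable[[Sequence[float], int], List[Segment]],
    asr: Callable[[Sequence[float], int], str],
    punc: Callable[[str], str],
    sep: str = " ",
) -> str:
    """Hypothetical VAD -> ASR -> PUNC composition; each stage can be swapped independently."""
    texts: List[str] = []
    for start_s, end_s in vad(audio, sample_rate):
        # Slice the waveform to the speech segment reported by VAD.
        chunk = audio[int(start_s * sample_rate):int(end_s * sample_rate)]
        text = asr(chunk, sample_rate)
        if text:
            texts.append(text)
    # Punctuation is restored once over the whole transcript.
    # Use sep="" for languages written without spaces, such as Mandarin.
    return punc(sep.join(texts))
```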

Latest updates:

New Contributors

Full Changelog: v0.1.6...v0.2.0

v0.1.6

16 Jan 11:28
5014a39

Release Notes:

2023.1.16, funasr-0.1.6

  • We release a new version of the model, Paraformer-large-long, which integrates VAD, ASR, punctuation, and timestamp prediction in one model. It can take inputs several hours long.
  • We release a new type of model, VAD, which predicts the duration of non-silent speech. It can be freely combined with any ASR model in the Model Zoo.
  • We release a new type of model, Punctuation, which predicts punctuation for ASR model outputs. It can be freely combined with any ASR model in the Model Zoo.
  • We release a new model, Data2vec, an unsupervised pretraining model that can be fine-tuned for ASR and other downstream tasks.
  • We release a new model, Paraformer-Tiny, a lightweight Paraformer model that supports Mandarin command-word recognition.
  • We release a new type of model, SV (speaker verification), which extracts speaker embeddings and performs speaker verification on pairs of utterances. It will be used for speaker diarization in a future version.
  • We speed up the ModelScope inference pipeline by folding model construction into pipeline construction.
  • The ModelScope inference pipeline now supports more audio input types, including wav.scp, wav files, audio bytes, and waveform samples.
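
`wav.scp` input is assumed here to follow the Kaldi convention: one `<utt_id> <path>` pair per line. A minimal parser sketch follows; Kaldi also allows piped commands in the second field, which this sketch simply keeps as plain strings. `parse_wav_scp` is a hypothetical helper, not part of the ModelScope pipeline.

```python
from typing import Dict, Iterable


def parse_wav_scp(lines: Iterable[str]) -> Dict[str, str]:
    """Parse Kaldi-style wav.scp lines of the form '<utt_id> <path>'."""
    table: Dict[str, str] = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first run of whitespace so paths may contain spaces.
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected '<utt_id> <path>', got {raw!r}")
        utt_id, path = parts
        if utt_id in table:
            raise ValueError(f"line {lineno}: duplicate utterance id {utt_id!r}")
        table[utt_id] = path
    return table
```

Usage: `with open("wav.scp", encoding="utf-8") as f: table = parse_wav_scp(f)`.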

Latest updates

New Contributors

Full Changelog: v0.1.4...v0.1.6

v0.1.4

10 Dec 04:54
f9fed09

This is the first release.

  1. Paraformer models can decode with batch size > 1.
  2. UniASR models and recipes have been added.
  3. Transformer and Conformer models are also included.
  4. Inference and fine-tuning of ModelScope models are more convenient.
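
Decoding with batch size > 1 generally means padding variable-length inputs to a common length and keeping the true lengths so the model can mask the padding. The sketch below shows that generic step with 1-D sequences for simplicity (real acoustic features are per-frame vectors); it is an illustration, not FunASR's data loader.

```python
from typing import List, Sequence, Tuple


def pad_batch(
    seqs: Sequence[Sequence[float]], pad_value: float = 0.0
) -> Tuple[List[List[float]], List[int]]:
    """Right-pad sequences to the longest length; return (padded_batch, lengths)."""
    if not seqs:
        return [], []
    lengths = [len(s) for s in seqs]
    max_len = max(lengths)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in seqs]
    # The lengths let the model mask out padded positions during decoding.
    return padded, lengths
```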