Releases · modelscope/FunASR

19 Jun 14:01

LauraGPT

v1.3.10

6960178

v1.3.10 Latest

FunASR v1.3.10

New features

Agent-friendly CLI: funasr audio.wav --output-format json for structured output
Fun-ASR-Nano: batched VAD-segment decoding (~1.75× faster) (#2979)
WebSocket 2-pass server: sentence-level timestamps
serve_vllm.py: new --vad-model / --spk-model flags
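
For agent use, the JSON mode can be wrapped in a small helper. The sketch below assumes the `funasr` command is on PATH and prints a single JSON document to standard output. These notes do not specify the output schema, so the parsed value is returned as-is. `transcribe_json` and its `runner` parameter are hypothetical names introduced for illustration.

```python
import json
import subprocess


def transcribe_json(audio_path, runner=subprocess.run):
    """Run the CLI in JSON mode and return the parsed output.

    Assumption: the CLI prints one JSON document to stdout.
    `runner` is injectable so the helper can be exercised without FunASR installed.
    """
    # Flags taken from the release note: funasr audio.wav --output-format json
    cmd = ["funasr", audio_path, "--output-format", "json"]
    completed = runner(cmd, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)
```

With `check=True`, a non-zero exit code raises `subprocess.CalledProcessError`, so a failed run surfaces as an exception rather than as a JSON parse error.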

Fixes

Fun-ASR-Nano: bf16/fp16 inference no longer crashes; warn on degraded fp16 (#2980)
Fun-ASR-Nano vLLM: fix CUDA crash from repetition_penalty
CLI: valid SRT timestamps + correct JSON durations (#2982); use sentence_info text (#2983); correct model id Fun-ASR-Nano-2512 (#2984)
Clearer error for missing audio path (#2981); respect explicit VAD silence threshold; handle None encoder/scheduler configs
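
For reference, a valid SRT timestamp has the form `HH:MM:SS,mmm`, with a comma before the milliseconds. The sketch below shows one way to produce it from a time in seconds. It is not FunASR's implementation, and `srt_timestamp` and `srt_cue` are hypothetical helper names.

```python
def srt_timestamp(seconds):
    """Format a non-negative time in seconds as an SRT timestamp HH:MM:SS,mmm."""
    if seconds < 0:
        raise ValueError("SRT timestamps cannot be negative")
    # Work in integer milliseconds to avoid float drift in the later divisions
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def srt_cue(index, start, end, text):
    """Build one SRT cue: index line, time range line, text, then a blank line."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n\n"
```

For example, `srt_timestamp(3661.5)` returns `01:01:01,500`.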

Docs

New CLI reference; clearer vLLM install guidance

Full changelog: v1.3.9...v1.3.10

Assets 2

29 May 20:53

LauraGPT

v1.3.9

11b04b8

v1.3.9: Wheel packaging + SenseVoice speaker diarization fix

What's New

Wheel packaging (fixes #2943)

FunASR now publishes a py3-none-any wheel alongside the source distribution. Installation is faster since pip no longer needs to build from source.
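
The `py3-none-any` tag is why no build step is needed: it declares any Python 3 interpreter, no ABI dependency, and any platform. The sketch below splits a wheel filename into its standard fields. The filename `funasr-1.3.9-py3-none-any.whl` is assumed from the version and tag above, not quoted from the release assets, and the helper names are hypothetical.

```python
def parse_wheel_tags(filename):
    """Split a wheel filename into name, version, and compatibility tags.

    Wheel filenames follow the pattern
    {name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # the build tag is optional
        raise ValueError("unexpected number of fields")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def is_pure_python(tags):
    """A wheel with ABI 'none' and platform 'any' contains no compiled code."""
    return tags["abi"] == "none" and tags["platform"] == "any"


# Assumed filename, for illustration only
tags = parse_wheel_tags("funasr-1.3.9-py3-none-any.whl")
```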

Bug fixes

SenseVoice + speaker diarization: Fixed crash when using spk_model="cam++" with SenseVoice (auto-falls back to VAD-segment mode since SenseVoice doesn't produce word-level timestamps)
torchaudio >= 2.11 compatibility: Added soundfile as intermediate fallback for users with newer torchaudio versions that removed legacy backends

Install / Upgrade

pip install --upgrade funasr

Full changelog: v1.3.3...v1.3.9

Assets 2

23 May 20:35

LauraGPT

v1.3.3

6be9616

v1.3.3: Agent Integration — OpenAI API + MCP Server + funasr-server CLI

Highlights

This release makes FunASR a drop-in speech backend for AI agents.

New: `funasr-server` CLI

pip install funasr fastapi uvicorn python-multipart
funasr-server --device cuda

One command starts an OpenAI-compatible /v1/audio/transcriptions endpoint.

New: MCP Server

AI assistants (Claude, Cursor, Windsurf) can now transcribe audio directly.

New: OpenAI-Compatible API

Works with any agent framework: LangChain, AutoGen, CrewAI, Dify, Flowise, Open WebUI.

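from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="x")
# Open the audio file in a context manager so the handle is closed after the request
with open("a.wav", "rb") as audio_file:
    result = client.audio.transcriptions.create(model="sensevoice", file=audio_file)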

Bug Fixes

Fixed hub="hf" parameter propagation to sub-models (v1.3.2)
Fixed Qwen3-ASR ImportError masking

Upgrade

pip install --upgrade funasr

Assets 2

23 May 16:56

LauraGPT

v1.3.2

3289169

v1.3.2: HuggingFace Hub Fix + Performance Benchmark

What's New

Bug Fix

Fixed hub parameter propagation — When using hub="hf", the parameter is now correctly forwarded to VAD/PUNC/SPK sub-models. Previously, users on HuggingFace would get 404 errors for sub-models. (#2859)

Improvements

Updated PyPI metadata with better description, keywords, and project URLs
Added comprehensive benchmark page: https://modelscope.github.io/FunASR/benchmark.html

Benchmark Results (PyTorch, GPU)

Model	Type	Speed
SenseVoice-Small	NAR	170x realtime
Paraformer-Large	NAR	120x realtime
Whisper-large-v3-turbo	AR	46x realtime
Fun-ASR-Nano	LLM	17x realtime
Whisper-large-v3	AR	13.4x realtime

Install / Upgrade

pip install --upgrade funasr

Quick Start

from funasr import AutoModel
model = AutoModel(model="FunAudioLLM/SenseVoiceSmall", hub="hf", vad_model="funasr/fsmn-vad", device="cuda")
result = model.generate(input="audio.wav")

Assets 2

16 Mar 08:15

LauraGPT

v0.3.0

64bd637

0.3.0

What's new:

2023.3.17, funasr-0.3.0, modelscope-1.4.1

New Features:
- Added support for GPU runtime solution, nv-triton, which allows easy export of Paraformer models from ModelScope and deployment as services. We conducted benchmark tests on a single GPU-V100, and achieved an RTF of 0.0032 and a speedup of 300.
- Added support for a CPU runtime quantization solution, which supports export of quantized ONNX and Libtorch models from ModelScope. In benchmark tests on a CPU-8369B, RTF dropped by about 50% (0.00438 -> 0.00226), roughly doubling the speedup (228 -> 442).
- Added support for a C++ version of the gRPC service deployment solution. The C++ ONNXRuntime version with the quantization solution is about twice as efficient as the Python runtime (demo).
- Added a streaming inference pipeline for the 16k and 8k VAD models, with support for audio input streams (>= 10ms) (demo).
- Improved the punctuation prediction model, resulting in increased accuracy (F-score increased from 55.6 to 56.5).
- Added real-time subtitle example based on gRPC service, using a 2-pass recognition model. Paraformer streaming model is used to output text in real time, while Paraformer-large offline model is used to correct recognition results, demo.
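
The RTF and speedup figures above are two views of the same measurement. Assuming the usual definition (RTF is processing time divided by audio duration), speedup is its reciprocal. The sketch below reproduces the reported numbers; the function names are hypothetical.

```python
def speedup_from_rtf(rtf):
    """Speedup over real time is the reciprocal of the real-time factor (RTF)."""
    if rtf <= 0:
        raise ValueError("RTF must be positive")
    return 1.0 / rtf


def relative_change(before, after):
    """Signed fractional change from `before` to `after` (negative means a decrease)."""
    return (after - before) / before


# CPU figures from the notes: quantization takes RTF from 0.00438 to 0.00226
cpu_before = speedup_from_rtf(0.00438)  # about 228
cpu_after = speedup_from_rtf(0.00226)   # about 442
cpu_rtf_change = relative_change(0.00438, 0.00226)  # about -0.48

# GPU figure from the notes: an RTF of 0.0032 corresponds to roughly 312x,
# consistent with the rounded "speedup of 300"
gpu_speedup = speedup_from_rtf(0.0032)
```

A lower RTF is better, so the CPU result is a decrease in RTF of about 48%.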
New Models:
- Added 16k Paraformer streaming model, which supports real-time speech recognition with streaming audio input, demo. It can be deployed using the gRPC service to implement real-time subtitle function.
- Added streaming punctuation model, which supports real-time punctuation marking in streaming speech recognition scenarios, with real-time calls based on VAD points. It can be used along with real-time ASR models to achieve readable real-time subtitle function, demo.
- Added TP-Aligner timestamp model, which takes audio and corresponding text as input and outputs word-level timestamps. Its performance is comparable to that of the Kaldi FA model (60.3ms vs. 69.3ms). It can be combined freely with ASR models, demo.
- Added financial domain model (8k Paraformer-large-3445vocab), which is fine-tuned using 1000 hours of data. The recognition accuracy on the financial domain test set increased by 5%, and the recall rate of domain keywords increased by 7%.
- Added audio-visual domain model (16k Paraformer-large-3445vocab), which is fine-tuned using 10,000 hours of data. The recognition accuracy on the audio-visual domain test set increased by 8%.
- Added 8k speaker verification model, which can be used for speaker embedding extraction.
- Added speaker diarization models, including 16k SOND Chinese model, 8k SOND English model, which achieved the best performance on AliMeeting and Callhome with a DER of 4.46% and 11.13%, respectively.
- Added UniASR unified streaming/offline models, including 16k UniASR Burmese, 16k UniASR Hebrew, 16k UniASR Urdu, 8k UniASR Mandarin financial domain, and 16k UniASR Mandarin audio-visual domain.
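
The DER figures quoted for the SOND models use the standard diarization error rate: missed speech, false-alarm speech, and speaker-confusion time, summed and divided by total reference speech time. A minimal sketch follows. The input values in the example are illustrative and are not taken from the AliMeeting or Callhome evaluations.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Return DER as a fraction; all arguments are durations in the same unit."""
    if total_speech <= 0:
        raise ValueError("total reference speech time must be positive")
    return (missed + false_alarm + confusion) / total_speech


# Illustrative values only: 3 s missed, 2 s false alarm, 5 s confusion in 200 s of speech
example_der = diarization_error_rate(3.0, 2.0, 5.0, 200.0)  # 0.05, i.e. 5%
```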

New Contributors

@dingbig made their first contribution in #147
@yuekaizhang made their first contribution in #161
@zhuz...

Contributors

dingbig, znsoftm, and 3 other contributors

Assets 2

20 Feb 02:22

LauraGPT

v0.2.0

0d15538

v0.2.0

What's new:

2023.2.17, funasr-0.2.0, modelscope-1.3.0

We added a new feature: Paraformer models from ModelScope, including locally fine-tuned models, can now be exported to ONNX and TorchScript.
We added a new feature: an ONNXRuntime runtime that can be deployed without modelscope or funasr. For the paraformer-large model, it reduces the RTF on CPU from 0.110 to 0.038, roughly a 3x speedup (details).
We added a new feature: gRPC. You can build an ASR service with gRPC by deploying either the ModelScope pipeline or ONNXRuntime.
We released a new model, paraformer-large-contextual, which supports hotword customization based on incentive enhancement and improves the recall and precision of hotwords.
We optimized the timestamp alignment of Paraformer-large-long. Timestamp prediction is much more accurate, reaching an accumulated average shift (AAS) of 74.7ms (details).
We released a new model, the 8k VAD model, which predicts the duration of non-silence speech. It can be freely combined with any ASR model in ModelScope.
We released a new model, MFCCA, a multi-channel multi-speaker model that is independent of the number and geometry of microphones and supports Mandarin meeting transcription.
We released several new UniASR models: Southern Fujian Dialect, French, German, Vietnamese, and Persian.
We released a new model, paraformer-data2vec, which uses unsupervised pretraining on AISHELL-2 to initialize a Paraformer model that is then fine-tuned on AISHELL-1.
We added a new feature: VAD, ASR, and PUNC models can be freely combined, whether they come from ModelScope or are locally fine-tuned (demo).
We optimized the general punctuation model, improving recall and precision and fixing bad cases with missing punctuation marks.
The ModelScope inference pipeline now supports more audio input formats, including mp3, flac, ogg, opus...

New Contributors

@zjc6666 made their first contribution in #35
@lyblsgo made their first contribution in #37
@lingyunfly made their first contribution in #42
@fangd123 made their first contribution in #44
@dyyzhmm made their first contribution in #48
@R1ckShi made their first contribution in #50
@chenmengzheAAA made their first contribution in #57
@ZhihaoDU made their first contribution in #95
@SWHL made their first contribution in #97
@yufan-aslp made their first contribution in #105
@magicharry made their first contribution in #119

Full Changelog: v0.1.6...v0.2.0

Contributors

fangd123, lyblsgo, and 9 other contributors

Assets 2

16 Jan 11:28

LauraGPT

v0.1.6

5014a39

v0.1.6

Release Notes:

2023.1.16, funasr-0.1.6

We released a new model version, Paraformer-large-long, which integrates the VAD model, ASR, punctuation model, and timestamps. The model can take inputs several hours long.
We released a new model type, VAD, which predicts the duration of non-silence speech. It can be freely combined with any ASR model in Model Zoo.
We released a new model type, Punctuation, which predicts punctuation for ASR model output. It can be freely combined with any ASR model in Model Zoo.
We released a new model, Data2vec, an unsupervised pretraining model that can be fine-tuned on ASR and other downstream tasks.
We released a new model, Paraformer-Tiny, a lightweight Paraformer model that supports Mandarin command word recognition.
We released a new model type, SV, which extracts speaker embeddings and performs speaker verification on paired utterances. Speaker diarization support is planned for a future version.
We improved the ModelScope pipeline to speed up inference by merging model building into pipeline building.
The ModelScope inference pipeline now supports more audio input types, including wav.scp, wav format, audio bytes, wave samples...

New Contributors

@nichongjia-2007 made their first contribution in #27

Full Changelog: v0.1.4...v0.1.6

Contributors

nichongjia-2007

Assets 2

10 Dec 04:54

LauraGPT

v0.1.4

f9fed09

v0.1.4

This is the first release.

Paraformer models can be decoded with batch size > 1.
UniASR models and recipes are newly added.
Transformer and Conformer models are also included.
Inference and fine-tuning of ModelScope models are more convenient.

Assets 2
