diff --git a/CLAUDE.md b/CLAUDE.md
index 4de278c47..1c71a10a9 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -180,7 +180,9 @@ GitHub Actions workflows:
 ## Model Sources
 
-- **Diarization**: [FluidInference/speaker-diarization-coreml](https://huggingface.co/FluidInference/speaker-diarization-coreml) (based on pyannote/speaker-diarization-community-1)
+- **Diarization**:
+  - Online/Streaming (DiarizerManager): [FluidInference/speaker-diarization-coreml](https://huggingface.co/FluidInference/speaker-diarization-coreml) (based on pyannote/speaker-diarization-3.1)
+  - Offline Batch (OfflineDiarizerManager): [FluidInference/speaker-diarization-coreml](https://huggingface.co/FluidInference/speaker-diarization-coreml) (based on pyannote/speaker-diarization-community-1)
 - **VAD CoreML**: [FluidInference/silero-vad-coreml](https://huggingface.co/FluidInference/silero-vad-coreml)
 - **ASR Models**: [FluidInference/parakeet-tdt-0.6b-v3-coreml](https://huggingface.co/FluidInference/parakeet-tdt-0.6b-v3-coreml)
 - **Test Data**: [alexwengg/musan_mini*](https://huggingface.co/datasets/alexwengg) variants
diff --git a/Documentation/Diarization/GettingStarted.md b/Documentation/Diarization/GettingStarted.md
index 0ab0d0cda..b67d3a5a7 100644
--- a/Documentation/Diarization/GettingStarted.md
+++ b/Documentation/Diarization/GettingStarted.md
@@ -340,7 +340,7 @@ Notes:
 ### WeSpeaker/Pyannote Streaming
 
-Use `DiarizerManager` when you need the classic segmentation + embedding + speaker-database pipeline. This is the slowest streaming option and works best with larger chunks.
+Pyannote 3.1 pipeline for online/streaming use. Use `DiarizerManager` when you need the classic segmentation + embedding + speaker-database pipeline. This is the slowest streaming option and works best with larger chunks.
 
 Process audio in chunks for real-time applications:
diff --git a/Documentation/Models.md b/Documentation/Models.md
index 75eb541b8..f87f95ad3 100644
--- a/Documentation/Models.md
+++ b/Documentation/Models.md
@@ -43,7 +43,7 @@ TDT models process audio in chunks (~15s with overlap) as batch operations.
 |-------|-------------|---------|
 | **LS-EEND** | Research prototype end-to-end streaming diarization model from Westlake University. Supports both streaming and complete-buffer inference for up to 10 speakers. Uses frame-in, frame-out processing, requiring 900ms of warmup audio and 100ms per update. | Added after Sortformer to support largers speaker counts. |
 | **Sortformer** | NVIDIA's enterprise-grade end-to-end streaming diarization model. Supports both streaming and complete-buffer inference for up to 4 speakers. More stable than LS-EEND, but sometimes misses speech. Processes audio in chunks, requiring 1040ms of warmup audio and 480ms per update for the low latency versions. | Added after Pyannote to support low-latency streaming diarization. |
-| **Pyannote CoreML Pipeline** | Speaker diarization. Segmentation model + WeSpeaker embeddings for clustering. Best offline diarization pipeline, but also support online use | First diarizer model added. Converted from Pyannote with custom made batching mode |
+| **Pyannote CoreML Pipeline** | Speaker diarization. Segmentation model + WeSpeaker embeddings for clustering. Online/streaming pipeline (DiarizerManager) based on pyannote/speaker-diarization-3.1. Offline batch pipeline (OfflineDiarizerManager) based on pyannote/speaker-diarization-community-1. | First diarizer model added. Converted from Pyannote with custom-made batching mode |
 
 ## TTS Models
diff --git a/README.md b/README.md
index 80abff28f..25826c1e1 100644
--- a/README.md
+++ b/README.md
@@ -373,7 +373,7 @@ Both LS-EEND and Sortformer emit results into a `DiarizerTimeline` with ultra-lo
 ### Streaming/Online Speaker Diarization (Pyannote)
 
-This pipeline uses segmentation plus speaker embeddings and is the third choice behind LS-EEND and Sortformer. It can be useful if you specifically want the classic multi-stage pipeline, but it is much slower than LS-EEND or Sortformer for live diarization.
+Pyannote 3.1 pipeline (segmentation + WeSpeaker embeddings) for online/streaming diarization. This is the third choice behind LS-EEND and Sortformer. It can be useful if you specifically want the classic multi-stage pipeline, but it is much slower than LS-EEND or Sortformer for live diarization.
 
 Why use the WeSpeaker/Pyannote pipeline:
 - More modular pipeline if you want separate segmentation and embedding stages
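
The online/offline split this patch documents (DiarizerManager on pyannote/speaker-diarization-3.1 vs OfflineDiarizerManager on pyannote/speaker-diarization-community-1) can be sketched as follows; this is pseudocode only, and the method names are illustrative assumptions, not the verified FluidAudio API:

```
// Pseudocode sketch — names are hypothetical, not taken from the patch.

// Online/streaming path (classic segmentation + embedding + speaker-database
// pipeline; slowest streaming option, works best with larger chunks):
manager = DiarizerManager()
manager.initialize()
for chunk in audioChunks(recording, seconds: 10):
    segments = manager.process(chunk)   // segmentation -> embeddings -> speaker DB
    emit(segments)

// Offline batch path (single pass over the full recording):
offline = OfflineDiarizerManager()
offline.initialize()
result = offline.diarize(recording)
```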