From 8d517471ed05e227c723befccfa7adaec39771f0 Mon Sep 17 00:00:00 2001
From: Alex-Wengg
Date: Fri, 10 Apr 2026 22:41:54 -0400
Subject: [PATCH 1/2] docs: Fix speaker diarization model references from 3.1
 to community-1

- Update code comment in SegmentationProcessor.swift
- Update CLAUDE.md model source reference
- Update Documentation/Benchmarks.md to clarify both online/offline use
  community-1

Co-Authored-By: Claude Sonnet 4.5
---
 CLAUDE.md                                                 | 2 +-
 Documentation/Benchmarks.md                               | 2 +-
 .../Diarizer/Segmentation/SegmentationProcessor.swift     | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index aba8bc55a..4de278c47 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -180,7 +180,7 @@ GitHub Actions workflows:
 
 ## Model Sources
 
-- **Diarization**: [pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1)
+- **Diarization**: [FluidInference/speaker-diarization-coreml](https://huggingface.co/FluidInference/speaker-diarization-coreml) (based on pyannote/speaker-diarization-community-1)
 - **VAD CoreML**: [FluidInference/silero-vad-coreml](https://huggingface.co/FluidInference/silero-vad-coreml)
 - **ASR Models**: [FluidInference/parakeet-tdt-0.6b-v3-coreml](https://huggingface.co/FluidInference/parakeet-tdt-0.6b-v3-coreml)
 - **Test Data**: [alexwengg/musan_mini*](https://huggingface.co/datasets/alexwengg) variants
diff --git a/Documentation/Benchmarks.md b/Documentation/Benchmarks.md
index b6d06aebd..06441525c 100644
--- a/Documentation/Benchmarks.md
+++ b/Documentation/Benchmarks.md
@@ -460,7 +460,7 @@ swift run -c release fluidaudiocli nemotron-benchmark --chunk 560
 
 ## Speaker Diarization
 
-The offline version uses the community-1 model, the online version uses the legacy speaker-diarization-3.1 model.
+Both offline and online versions use the community-1 model (via FluidInference/speaker-diarization-coreml).
 
 ### Offline diarization pipeline
 
diff --git a/Sources/FluidAudio/Diarizer/Segmentation/SegmentationProcessor.swift b/Sources/FluidAudio/Diarizer/Segmentation/SegmentationProcessor.swift
index 348c3eb28..4a6909a94 100644
--- a/Sources/FluidAudio/Diarizer/Segmentation/SegmentationProcessor.swift
+++ b/Sources/FluidAudio/Diarizer/Segmentation/SegmentationProcessor.swift
@@ -224,7 +224,7 @@ public struct SegmentationProcessor {
     func createSlidingWindowFeature(
         binarizedSegments: [[[Float]]], chunkOffset: Double = 0.0
     ) -> SlidingWindowFeature {
-        // These values come from the pyannote/speaker-diarization-3.1 model configuration
+        // These values come from the pyannote/speaker-diarization-community-1 model configuration
         let slidingWindow = SlidingWindow(
             start: chunkOffset,
             duration: 0.0619375,  // 991 samples at 16kHz (model's sliding window duration)

From dda36e869b0515a61b5562ea0e954b776276d76a Mon Sep 17 00:00:00 2001
From: Alex-Wengg
Date: Sat, 11 Apr 2026 11:12:34 -0400
Subject: [PATCH 2/2] docs: Clarify diarization pipeline version differences

Distinguish between online and offline diarization pipelines:

- Online/streaming (DiarizerManager): Pyannote 3.1
- Offline batch (OfflineDiarizerManager): Pyannote Community-1

Updated documentation in:

- CLAUDE.md Model Sources section
- README.md Streaming/Online Speaker Diarization section
- Documentation/Models.md Diarization Models table
- Documentation/Diarization/GettingStarted.md WeSpeaker/Pyannote Streaming section

Addresses feedback from PR #6 review comment:
https://github.com/FluidInference/docs.fluidinference.com/pull/6#discussion_r3068126335
---
 CLAUDE.md                                   | 4 +++-
 Documentation/Diarization/GettingStarted.md | 2 +-
 Documentation/Models.md                     | 2 +-
 README.md                                   | 2 +-
 4 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index 4de278c47..1c71a10a9 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -180,7 +180,9 @@ GitHub Actions workflows:
 
 ## Model Sources
 
-- **Diarization**: [FluidInference/speaker-diarization-coreml](https://huggingface.co/FluidInference/speaker-diarization-coreml) (based on pyannote/speaker-diarization-community-1)
+- **Diarization**:
+  - Online/Streaming (DiarizerManager): [FluidInference/speaker-diarization-coreml](https://huggingface.co/FluidInference/speaker-diarization-coreml) (based on pyannote/speaker-diarization-3.1)
+  - Offline Batch (OfflineDiarizerManager): [FluidInference/speaker-diarization-coreml](https://huggingface.co/FluidInference/speaker-diarization-coreml) (based on pyannote/speaker-diarization-community-1)
 - **VAD CoreML**: [FluidInference/silero-vad-coreml](https://huggingface.co/FluidInference/silero-vad-coreml)
 - **ASR Models**: [FluidInference/parakeet-tdt-0.6b-v3-coreml](https://huggingface.co/FluidInference/parakeet-tdt-0.6b-v3-coreml)
 - **Test Data**: [alexwengg/musan_mini*](https://huggingface.co/datasets/alexwengg) variants
diff --git a/Documentation/Diarization/GettingStarted.md b/Documentation/Diarization/GettingStarted.md
index 0ab0d0cda..b67d3a5a7 100644
--- a/Documentation/Diarization/GettingStarted.md
+++ b/Documentation/Diarization/GettingStarted.md
@@ -340,7 +340,7 @@ Notes:
 
 ### WeSpeaker/Pyannote Streaming
 
-Use `DiarizerManager` when you need the classic segmentation + embedding + speaker-database pipeline. This is the slowest streaming option and works best with larger chunks.
+Pyannote 3.1 pipeline for online/streaming use. Use `DiarizerManager` when you need the classic segmentation + embedding + speaker-database pipeline. This is the slowest streaming option and works best with larger chunks.
 
 Process audio in chunks for real-time applications:
 
diff --git a/Documentation/Models.md b/Documentation/Models.md
index 75eb541b8..f87f95ad3 100644
--- a/Documentation/Models.md
+++ b/Documentation/Models.md
@@ -43,7 +43,7 @@ TDT models process audio in chunks (~15s with overlap) as batch operations.
 |-------|-------------|---------|
 | **LS-EEND** | Research prototype end-to-end streaming diarization model from Westlake University. Supports both streaming and complete-buffer inference for up to 10 speakers. Uses frame-in, frame-out processing, requiring 900ms of warmup audio and 100ms per update. | Added after Sortformer to support largers speaker counts. |
 | **Sortformer** | NVIDIA's enterprise-grade end-to-end streaming diarization model. Supports both streaming and complete-buffer inference for up to 4 speakers. More stable than LS-EEND, but sometimes misses speech. Processes audio in chunks, requiring 1040ms of warmup audio and 480ms per update for the low latency versions. | Added after Pyannote to support low-latency streaming diarization. |
-| **Pyannote CoreML Pipeline** | Speaker diarization. Segmentation model + WeSpeaker embeddings for clustering. Best offline diarization pipeline, but also support online use | First diarizer model added. Converted from Pyannote with custom made batching mode |
+| **Pyannote CoreML Pipeline** | Speaker diarization. Segmentation model + WeSpeaker embeddings for clustering. Online/streaming pipeline (DiarizerManager) based on pyannote/speaker-diarization-3.1. Offline batch pipeline (OfflineDiarizerManager) based on pyannote/speaker-diarization-community-1. | First diarizer model added. Converted from Pyannote with custom made batching mode |
 
 ## TTS Models
 
diff --git a/README.md b/README.md
index 280d1d709..5fa0bd775 100644
--- a/README.md
+++ b/README.md
@@ -372,7 +372,7 @@ Both LS-EEND and Sortformer emit results into a `DiarizerTimeline` with ultra-lo
 
 ### Streaming/Online Speaker Diarization (Pyannote)
 
-This pipeline uses segmentation plus speaker embeddings and is the third choice behind LS-EEND and Sortformer. It can be useful if you specifically want the classic multi-stage pipeline, but it is much slower than LS-EEND or Sortformer for live diarization.
+Pyannote 3.1 pipeline (segmentation + WeSpeaker embeddings) for online/streaming diarization. This is the third choice behind LS-EEND and Sortformer. It can be useful if you specifically want the classic multi-stage pipeline, but it is much slower than LS-EEND or Sortformer for live diarization.
 
 Why use the WeSpeaker/Pyannote pipeline:
 - More modular pipeline if you want separate segmentation and embedding stages
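A note on the SegmentationProcessor.swift hunk in patch 1/2: the `duration: 0.0619375` constant it touches can be sanity-checked independently of the model. The sketch below is hypothetical standalone Swift (the `SlidingWindowSketch` type is invented for illustration and is not the library's actual `SlidingWindow` API); it only reproduces the arithmetic behind the "991 samples at 16kHz" comment in the diff.

```swift
import Foundation

// Hypothetical stand-in for the library's SlidingWindow (illustration only).
struct SlidingWindowSketch {
    let start: Double     // chunk offset, in seconds
    let duration: Double  // per-frame window length, in seconds
}

// The segmentation model emits one output frame per 991 input samples at 16 kHz,
// which is where the 0.0619375 s duration in the patched code comes from.
let samplesPerFrame = 991.0
let sampleRate = 16_000.0
let frameDuration = samplesPerFrame / sampleRate

let window = SlidingWindowSketch(start: 0.0, duration: frameDuration)
// frameDuration == 991 / 16000 == 0.0619375 s, matching the constant in the hunk.
assert(abs(window.duration - 0.0619375) < 1e-12)
```

This confirms the diff's inline comment is self-consistent: only the model-name reference changed between 3.1 and community-1, not the windowing constants themselves.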