docs: Clarify diarization pipeline version differences #511
**GettingStarted.md**

```diff
@@ -340,7 +340,7 @@ Notes:

 ### WeSpeaker/Pyannote Streaming

-Use `DiarizerManager` when you need the classic segmentation + embedding + speaker-database pipeline. This is the slowest streaming option and works best with larger chunks.
+Pyannote 3.1 pipeline for online/streaming use. Use `DiarizerManager` when you need the classic segmentation + embedding + speaker-database pipeline. This is the slowest streaming option and works best with larger chunks.
```
**Contributor**

🟡 GettingStarted.md incorrectly labels the streaming section as a "Pyannote 3.1 pipeline"

The WeSpeaker/Pyannote Streaming section now says "Pyannote 3.1 pipeline for online/streaming use", but the online pipeline uses community-1, as established by PR #510 and confirmed by …
```diff
 Process audio in chunks for real-time applications:
```
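The "process audio in chunks" pattern that GettingStarted.md refers to can be illustrated with a plain sliding-window iterator. This is a minimal, language-agnostic sketch in Python, not FluidAudio's actual Swift API; the function name and parameters are hypothetical:

```python
def chunk_stream(samples, chunk_size, hop_size):
    """Yield overlapping windows from a buffered audio stream.

    Consecutive chunks overlap by (chunk_size - hop_size) samples, which
    mirrors the chunk-with-overlap strategy streaming diarizers use.
    The final window may be shorter than chunk_size if the buffer runs out.
    """
    if chunk_size <= 0 or hop_size <= 0:
        raise ValueError("chunk_size and hop_size must be positive")
    for start in range(0, max(len(samples) - chunk_size, 0) + 1, hop_size):
        yield samples[start:start + chunk_size]

# Example: 10 samples, 4-sample chunks, 2-sample hop -> 4 overlapping chunks
chunks = list(chunk_stream(list(range(10)), chunk_size=4, hop_size=2))
```

In a real-time setting the same loop runs over an append-only microphone buffer, feeding each window to the diarizer as it fills.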
**Models.md**

```diff
@@ -43,7 +43,7 @@ TDT models process audio in chunks (~15s with overlap) as batch operations.
 |-------|-------------|---------|
 | **LS-EEND** | Research prototype end-to-end streaming diarization model from Westlake University. Supports both streaming and complete-buffer inference for up to 10 speakers. Uses frame-in, frame-out processing, requiring 900ms of warmup audio and 100ms per update. | Added after Sortformer to support largers speaker counts. |
 | **Sortformer** | NVIDIA's enterprise-grade end-to-end streaming diarization model. Supports both streaming and complete-buffer inference for up to 4 speakers. More stable than LS-EEND, but sometimes misses speech. Processes audio in chunks, requiring 1040ms of warmup audio and 480ms per update for the low latency versions. | Added after Pyannote to support low-latency streaming diarization. |
-| **Pyannote CoreML Pipeline** | Speaker diarization. Segmentation model + WeSpeaker embeddings for clustering. Best offline diarization pipeline, but also support online use | First diarizer model added. Converted from Pyannote with custom made batching mode |
+| **Pyannote CoreML Pipeline** | Speaker diarization. Segmentation model + WeSpeaker embeddings for clustering. Online/streaming pipeline (DiarizerManager) based on pyannote/speaker-diarization-3.1. Offline batch pipeline (OfflineDiarizerManager) based on pyannote/speaker-diarization-community-1. | First diarizer model added. Converted from Pyannote with custom made batching mode |
```
**Contributor**

🟡 Models.md incorrectly claims the online pipeline is based on pyannote 3.1

The Pyannote CoreML Pipeline description now says "Online/streaming pipeline (DiarizerManager) based on pyannote/speaker-diarization-3.1", but the actual model is based on community-1, as established by PR #510 (…
```diff
 ## TTS Models
```
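The warmup and per-update figures in the Models.md table above (900 ms + 100 ms for LS-EEND, 1040 ms + 480 ms for low-latency Sortformer) imply a simple result-emission schedule. A small illustrative sketch (Python; the helper name is hypothetical, not part of FluidAudio):

```python
def emission_times_ms(warmup_ms, update_ms, audio_ms):
    """Times (in ms of audio consumed) at which a streaming diarizer that
    needs warmup_ms of audio and then updates every update_ms can emit a
    result, over a clip of audio_ms total audio."""
    times = []
    t = warmup_ms
    while t <= audio_ms:
        times.append(t)
        t += update_ms
    return times

# LS-EEND over a 1.2 s clip: first result at 900 ms, then every 100 ms
ls_eend = emission_times_ms(900, 100, 1200)      # [900, 1000, 1100, 1200]
# Low-latency Sortformer over a 2 s clip: first result at 1040 ms
sortformer = emission_times_ms(1040, 480, 2000)  # [1040, 1520, 2000]
```

This makes the trade-off in the table concrete: LS-EEND reacts sooner and more often, while Sortformer emits less frequently but is described as more stable.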
**README.md**

```diff
@@ -373,7 +373,7 @@ Both LS-EEND and Sortformer emit results into a `DiarizerTimeline` with ultra-lo

 ### Streaming/Online Speaker Diarization (Pyannote)

-This pipeline uses segmentation plus speaker embeddings and is the third choice behind LS-EEND and Sortformer. It can be useful if you specifically want the classic multi-stage pipeline, but it is much slower than LS-EEND or Sortformer for live diarization.
+Pyannote 3.1 pipeline (segmentation + WeSpeaker embeddings) for online/streaming diarization. This is the third choice behind LS-EEND and Sortformer. It can be useful if you specifically want the classic multi-stage pipeline, but it is much slower than LS-EEND or Sortformer for live diarization.
```
**Contributor**

🔴 README.md incorrectly labels online diarization as a "Pyannote 3.1 pipeline"

The README now says "Pyannote 3.1 pipeline (segmentation + WeSpeaker embeddings) for online/streaming diarization", but both pipelines use community-1, as established by PR #510 and confirmed by …
```diff
 Why use the WeSpeaker/Pyannote pipeline:
 - More modular pipeline if you want separate segmentation and embedding stages
```
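The "segmentation + embedding + speaker-database" pipeline the README describes boils down to comparing each segment's embedding against enrolled speakers and enrolling a new speaker when nothing matches. A toy sketch of that matching step (pure Python with cosine similarity; the function names and the 0.7 threshold are illustrative assumptions, not FluidAudio's API):

```python
def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

def assign_speaker(embedding, speaker_db, threshold=0.7):
    """Return the best-matching enrolled speaker for a segment embedding,
    or enroll the embedding as a new speaker when no match clears the
    similarity threshold. speaker_db maps speaker id -> reference embedding."""
    best_id, best_sim = None, -1.0
    for speaker_id, ref in speaker_db.items():
        sim = cosine_similarity(embedding, ref)
        if sim > best_sim:
            best_id, best_sim = speaker_id, sim
    if best_id is not None and best_sim >= threshold:
        return best_id
    new_id = f"speaker_{len(speaker_db) + 1}"
    speaker_db[new_id] = embedding
    return new_id
```

Because every segment embedding is scored against the whole database, this stage dominates the cost of the classic pipeline, which is consistent with the README calling it the slowest streaming option.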
🔴 CLAUDE.md incorrectly claims the online `DiarizerManager` is based on pyannote 3.1

This PR re-introduces the incorrect claim that the online/streaming `DiarizerManager` is based on `pyannote/speaker-diarization-3.1`. The immediately preceding PR #510 (commit `421313a`) explicitly corrected this, stating: "The actual CoreML model at FluidInference/speaker-diarization-coreml has always been based on community-1, but some documentation incorrectly referenced 3.1." The code itself at `Sources/FluidAudio/Diarizer/Segmentation/SegmentationProcessor.swift:227` says the values come from `pyannote/speaker-diarization-community-1`, and `Documentation/Benchmarks.md:463` (unchanged by this PR) states "Both offline and online versions use the community-1 model." This creates a factual inconsistency in the CLAUDE.md special-rule file, which will mislead AI assistants and developers about which model the online pipeline actually uses.