Merged
4 changes: 3 additions & 1 deletion CLAUDE.md
@@ -180,7 +180,9 @@ GitHub Actions workflows:

## Model Sources

-- **Diarization**: [FluidInference/speaker-diarization-coreml](https://huggingface.co/FluidInference/speaker-diarization-coreml) (based on pyannote/speaker-diarization-community-1)
+- **Diarization**:
+  - Online/Streaming (DiarizerManager): [FluidInference/speaker-diarization-coreml](https://huggingface.co/FluidInference/speaker-diarization-coreml) (based on pyannote/speaker-diarization-3.1)
🔴 CLAUDE.md incorrectly claims online DiarizerManager is based on pyannote 3.1

This PR re-introduces the incorrect claim that the online/streaming DiarizerManager is based on pyannote/speaker-diarization-3.1. The immediately preceding PR #510 (commit 421313a) explicitly corrected this, stating: "The actual CoreML model at FluidInference/speaker-diarization-coreml has always been based on community-1, but some documentation incorrectly referenced 3.1." The code itself at Sources/FluidAudio/Diarizer/Segmentation/SegmentationProcessor.swift:227 says values come from pyannote/speaker-diarization-community-1, and Documentation/Benchmarks.md:463 (unchanged by this PR) states "Both offline and online versions use the community-1 model." This creates a factual inconsistency in the CLAUDE.md special rule file, which will mislead AI assistants and developers about which model the online pipeline actually uses.

Suggested change
-  - Online/Streaming (DiarizerManager): [FluidInference/speaker-diarization-coreml](https://huggingface.co/FluidInference/speaker-diarization-coreml) (based on pyannote/speaker-diarization-3.1)
+  - Online/Streaming (DiarizerManager): [FluidInference/speaker-diarization-coreml](https://huggingface.co/FluidInference/speaker-diarization-coreml) (based on pyannote/speaker-diarization-community-1)
+  - Offline Batch (OfflineDiarizerManager): [FluidInference/speaker-diarization-coreml](https://huggingface.co/FluidInference/speaker-diarization-coreml) (based on pyannote/speaker-diarization-community-1)
- **VAD CoreML**: [FluidInference/silero-vad-coreml](https://huggingface.co/FluidInference/silero-vad-coreml)
- **ASR Models**: [FluidInference/parakeet-tdt-0.6b-v3-coreml](https://huggingface.co/FluidInference/parakeet-tdt-0.6b-v3-coreml)
- **Test Data**: [alexwengg/musan_mini*](https://huggingface.co/datasets/alexwengg) variants
2 changes: 1 addition & 1 deletion Documentation/Diarization/GettingStarted.md
@@ -340,7 +340,7 @@ Notes:

### WeSpeaker/Pyannote Streaming

-Use `DiarizerManager` when you need the classic segmentation + embedding + speaker-database pipeline. This is the slowest streaming option and works best with larger chunks.
+Pyannote 3.1 pipeline for online/streaming use. Use `DiarizerManager` when you need the classic segmentation + embedding + speaker-database pipeline. This is the slowest streaming option and works best with larger chunks.
🟡 GettingStarted.md incorrectly labels streaming section as "Pyannote 3.1 pipeline"

The WeSpeaker/Pyannote Streaming section now says "Pyannote 3.1 pipeline for online/streaming use" but the online pipeline uses community-1, as established by PR #510 and confirmed by SegmentationProcessor.swift:227 and Documentation/Benchmarks.md:463.

Suggested change
-Pyannote 3.1 pipeline for online/streaming use. Use `DiarizerManager` when you need the classic segmentation + embedding + speaker-database pipeline. This is the slowest streaming option and works best with larger chunks.
+Pyannote community-1 pipeline for online/streaming use. Use `DiarizerManager` when you need the classic segmentation + embedding + speaker-database pipeline. This is the slowest streaming option and works best with larger chunks.


Process audio in chunks for real-time applications:
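The chunked-processing pattern referenced above can be sketched generically. The following Python sketch is illustrative only: it is not the FluidAudio Swift API, and the function name, chunk length, and overlap are assumptions, not values taken from the library. It shows one common way to compute overlapping chunk boundaries for a streaming pipeline:

```python
def chunk_boundaries(total_samples, sample_rate=16000,
                     chunk_seconds=10.0, overlap_seconds=1.0):
    """Yield (start, end) sample indices for overlapping audio chunks.

    chunk_seconds and overlap_seconds are hypothetical defaults for
    illustration; a real streaming diarizer would define its own.
    """
    chunk = int(chunk_seconds * sample_rate)
    step = chunk - int(overlap_seconds * sample_rate)
    start = 0
    while start < total_samples:
        # Clamp the final chunk to the end of the buffer.
        yield (start, min(start + chunk, total_samples))
        if start + chunk >= total_samples:
            break
        start += step

# Example: 25 s of 16 kHz audio split into 10 s chunks with 1 s overlap.
bounds = list(chunk_boundaries(25 * 16000))
```

Each chunk would then be fed to the diarizer in turn, with the overlap giving the segmentation stage context across chunk edges.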

2 changes: 1 addition & 1 deletion Documentation/Models.md
@@ -43,7 +43,7 @@ TDT models process audio in chunks (~15s with overlap) as batch operations.
|-------|-------------|---------|
| **LS-EEND** | Research prototype end-to-end streaming diarization model from Westlake University. Supports both streaming and complete-buffer inference for up to 10 speakers. Uses frame-in, frame-out processing, requiring 900ms of warmup audio and 100ms per update. | Added after Sortformer to support largers speaker counts. |
| **Sortformer** | NVIDIA's enterprise-grade end-to-end streaming diarization model. Supports both streaming and complete-buffer inference for up to 4 speakers. More stable than LS-EEND, but sometimes misses speech. Processes audio in chunks, requiring 1040ms of warmup audio and 480ms per update for the low latency versions. | Added after Pyannote to support low-latency streaming diarization. |
-| **Pyannote CoreML Pipeline** | Speaker diarization. Segmentation model + WeSpeaker embeddings for clustering. Best offline diarization pipeline, but also support online use | First diarizer model added. Converted from Pyannote with custom made batching mode |
+| **Pyannote CoreML Pipeline** | Speaker diarization. Segmentation model + WeSpeaker embeddings for clustering. Online/streaming pipeline (DiarizerManager) based on pyannote/speaker-diarization-3.1. Offline batch pipeline (OfflineDiarizerManager) based on pyannote/speaker-diarization-community-1. | First diarizer model added. Converted from Pyannote with custom made batching mode |
🟡 Models.md incorrectly claims online pipeline is based on pyannote 3.1

The Pyannote CoreML Pipeline description now says "Online/streaming pipeline (DiarizerManager) based on pyannote/speaker-diarization-3.1" but the actual model is based on community-1 as established by PR #510 (421313a), code comments in SegmentationProcessor.swift:227, and Documentation/Benchmarks.md:463.

Suggested change
-| **Pyannote CoreML Pipeline** | Speaker diarization. Segmentation model + WeSpeaker embeddings for clustering. Online/streaming pipeline (DiarizerManager) based on pyannote/speaker-diarization-3.1. Offline batch pipeline (OfflineDiarizerManager) based on pyannote/speaker-diarization-community-1. | First diarizer model added. Converted from Pyannote with custom made batching mode |
+| **Pyannote CoreML Pipeline** | Speaker diarization. Segmentation model + WeSpeaker embeddings for clustering. Online/streaming pipeline (DiarizerManager) based on pyannote/speaker-diarization-community-1. Offline batch pipeline (OfflineDiarizerManager) based on pyannote/speaker-diarization-community-1. | First diarizer model added. Converted from Pyannote with custom made batching mode |

## TTS Models

2 changes: 1 addition & 1 deletion README.md
@@ -373,7 +373,7 @@ Both LS-EEND and Sortformer emit results into a `DiarizerTimeline` with ultra-lo

### Streaming/Online Speaker Diarization (Pyannote)

-This pipeline uses segmentation plus speaker embeddings and is the third choice behind LS-EEND and Sortformer. It can be useful if you specifically want the classic multi-stage pipeline, but it is much slower than LS-EEND or Sortformer for live diarization.
+Pyannote 3.1 pipeline (segmentation + WeSpeaker embeddings) for online/streaming diarization. This is the third choice behind LS-EEND and Sortformer. It can be useful if you specifically want the classic multi-stage pipeline, but it is much slower than LS-EEND or Sortformer for live diarization.
🔴 README.md incorrectly labels online diarization as "Pyannote 3.1 pipeline"

The README now says "Pyannote 3.1 pipeline (segmentation + WeSpeaker embeddings) for online/streaming diarization" but both pipelines use community-1 as established by PR #510 and confirmed by SegmentationProcessor.swift:227 and Documentation/Benchmarks.md:463.

Suggested change
-Pyannote 3.1 pipeline (segmentation + WeSpeaker embeddings) for online/streaming diarization. This is the third choice behind LS-EEND and Sortformer. It can be useful if you specifically want the classic multi-stage pipeline, but it is much slower than LS-EEND or Sortformer for live diarization.
+Pyannote community-1 pipeline (segmentation + WeSpeaker embeddings) for online/streaming diarization. This is the third choice behind LS-EEND and Sortformer. It can be useful if you specifically want the classic multi-stage pipeline, but it is much slower than LS-EEND or Sortformer for live diarization.

Why use the WeSpeaker/Pyannote pipeline:
- More modular pipeline if you want separate segmentation and embedding stages