Add Argmax Sortformer #96

Merged: EduardoPach merged 8 commits into main from eduardo/argmax-sortformer on May 12, 2026

EduardoPach (Collaborator) commented Feb 16, 2026

What does this PR do?

  • Integrate Sortformer as a new diarization model option alongside the existing pyannote3/pyannote4 clusterers across SpeakerKit, WhisperKitPro engine, and orchestration pipelines.
  • Add new pipeline aliases for standalone Sortformer diarization (speakerkit-sortformer-compressed) and orchestration combos with compressed Parakeet v2/v3 models (whisperkitpro-orchestration-parakeet-v{2,3}-compressed-sortformer-compressed).
  • Introduce a diarization_mode config option (realtime / prerecorded) to control Sortformer streaming latency, and add a new subsegment orchestration strategy, which is now the default for all WhisperKitPro orchestration pipelines.

Details

SpeakerKit diarization pipeline (speakerkit.py)

  • Accept sortformer_model_name and sortformer_model_variant config fields with cross-validation.
  • Move CLI arg construction and stdout parsing into SpeakerKitPipelineConfig so parsing logic adapts to the model type (Sortformer reports seconds; pyannote reports milliseconds).
  • Ignore num_speakers when Sortformer is selected (it is unsupported) and emit a warning.
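
A minimal sketch of how the config-level behavior described above could look. The class name SketchSpeakerKitConfig and its methods are hypothetical and are not the PR's SpeakerKitPipelineConfig; the rule that the two Sortformer fields must be set together is an assumed reading of "cross-validation". Only the field names, the seconds/milliseconds distinction, and the num_speakers warning come from the description.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchSpeakerKitConfig:
    """Hypothetical stand-in for SpeakerKitPipelineConfig (illustration only)."""

    sortformer_model_name: Optional[str] = None
    sortformer_model_variant: Optional[str] = None
    num_speakers: Optional[int] = None

    def __post_init__(self) -> None:
        # Assumed cross-validation rule: both Sortformer fields are set together.
        name_set = self.sortformer_model_name is not None
        variant_set = self.sortformer_model_variant is not None
        if name_set != variant_set:
            raise ValueError(
                "sortformer_model_name and sortformer_model_variant "
                "must be set together"
            )

    @property
    def uses_sortformer(self) -> bool:
        return self.sortformer_model_name is not None

    def effective_num_speakers(self) -> Optional[int]:
        # Sortformer does not support num_speakers: drop it and warn.
        if self.uses_sortformer and self.num_speakers is not None:
            warnings.warn("num_speakers is ignored for Sortformer", UserWarning)
            return None
        return self.num_speakers

    def time_to_seconds(self, reported: float) -> float:
        # Sortformer reports seconds; pyannote reports milliseconds.
        return reported if self.uses_sortformer else reported / 1000.0
```

Keeping the unit conversion on the config means callers never need to know which model produced a reported time.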

WhisperKitPro engine & orchestration (whisperkitpro_engine.py, orchestration_whisperkitpro.py)

  • Extend clusterer_version literal to include sortformer and surface the new diarization_mode flag through CLI args.
  • Fix config key from clusterer_version_string to clusterer_version.
  • Enable --verbose by default on the engine CLI.
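
The engine changes above can be sketched roughly as follows. The helper build_diarization_args is hypothetical; the Literal values, the --diarization-mode flag, and the default --verbose come from this description. The CLI flag that carries clusterer_version is not named here, so the sketch only validates that value and does not emit it. Forwarding diarization_mode whenever it is set, regardless of clusterer, is also an assumption.

```python
from typing import List, Literal, Optional, get_args

# Values taken from the PR description.
ClustererVersion = Literal["pyannote3", "pyannote4", "sortformer"]
DiarizationMode = Literal["realtime", "prerecorded"]


def build_diarization_args(
    clusterer_version: str,
    diarization_mode: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: validate config values and build CLI flags."""
    if clusterer_version not in get_args(ClustererVersion):
        raise ValueError(f"unknown clusterer_version: {clusterer_version!r}")

    # The PR enables --verbose by default on the engine CLI.
    args = ["--verbose"]

    if diarization_mode is not None:
        if diarization_mode not in get_args(DiarizationMode):
            raise ValueError(f"unknown diarization_mode: {diarization_mode!r}")
        args += ["--diarization-mode", diarization_mode]
    return args
```

For example, build_diarization_args("sortformer", "prerecorded") yields the --verbose flag followed by --diarization-mode prerecorded.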

Pipeline aliases (pipeline_aliases.py)

  • Register three new aliases for Sortformer-backed pipelines.
  • Update all existing WhisperKitPro orchestration aliases to use subsegment strategy and the corrected clusterer_version key.
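
As an illustration of the alias changes, here is a sketch using a plain dictionary registry. The register_alias function, the ALIASES dictionary, and the orchestration_strategy key are hypothetical, since the actual registration API in pipeline_aliases.py is not shown here. The alias names and the clusterer_version key come from the description; the model name and variant strings are placeholders.

```python
from typing import Any, Dict

# Hypothetical registry; the real pipeline_aliases.py may be structured differently.
ALIASES: Dict[str, Dict[str, Any]] = {}


def register_alias(name: str, config: Dict[str, Any]) -> None:
    """Register a pipeline alias, rejecting duplicate names."""
    if name in ALIASES:
        raise ValueError(f"alias already registered: {name}")
    ALIASES[name] = dict(config)


# Standalone Sortformer diarization (placeholder model fields).
register_alias(
    "speakerkit-sortformer-compressed",
    {
        "sortformer_model_name": "<model-name>",
        "sortformer_model_variant": "<model-variant>",
    },
)

# Orchestration combos with compressed Parakeet v2/v3 models.
for version in (2, 3):
    register_alias(
        f"whisperkitpro-orchestration-parakeet-v{version}"
        "-compressed-sortformer-compressed",
        {
            "clusterer_version": "sortformer",  # corrected key name
            "orchestration_strategy": "subsegment",  # hypothetical key name
        },
    )
```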

Test plan

  • Run speakerkit-sortformer-compressed on a sample diarization dataset and verify RTTM output and reported prediction time.
  • Run whisperkitpro-orchestration-parakeet-v3-compressed-sortformer-compressed end-to-end and confirm diarization segments are produced.
  • Verify existing pyannote-based pipelines (speakerkit, whisperkitpro-orchestration-*) remain unaffected.
  • Confirm diarization_mode flag is correctly forwarded (--diarization-mode prerecorded / realtime).
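
One way to carry out the RTTM check in the first item is a small sanity parser, sketched below. It assumes the pipeline writes standard RTTM SPEAKER lines (type, file ID, channel, onset, duration, orthography, speaker type, speaker name, confidence, lookahead); parse_rttm and Turn are hypothetical helpers and are not part of this PR.

```python
from typing import List, NamedTuple


class Turn(NamedTuple):
    file_id: str
    onset: float      # seconds
    duration: float   # seconds
    speaker: str


def parse_rttm(text: str) -> List[Turn]:
    """Parse SPEAKER lines from RTTM text and reject malformed turns."""
    turns: List[Turn] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue  # skip blank lines and non-speaker records
        if len(fields) < 8:
            raise ValueError(f"too few fields in RTTM line: {line!r}")
        turn = Turn(fields[1], float(fields[3]), float(fields[4]), fields[7])
        if turn.onset < 0 or turn.duration <= 0:
            raise ValueError(f"invalid onset or duration: {line!r}")
        turns.append(turn)
    return turns
```

A run could then be considered plausible if parse_rttm returns at least one turn for each audio file in the dataset.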

EduardoPach requested a review from a2they on February 19, 2026 23:29
EduardoPach marked this pull request as ready for review on February 19, 2026 23:33
EduardoPach requested a review from dbrkn on February 20, 2026 23:09

arda-argmax (Contributor) left a comment

LGTM!

EduardoPach merged commit 9ebe0e9 into main on May 12, 2026
2 checks passed
EduardoPach deleted the eduardo/argmax-sortformer branch on May 12, 2026 17:27