@basnijholt (Owner)
Summary

  • Add speaker diarization as a post-processing step for transcription using pyannote-audio
  • Identifies and labels different speakers in the transcript (useful for meetings, interviews, multi-speaker audio)
  • Works with any ASR provider (Wyoming, OpenAI, Gemini)
  • New optional dependency: `pip install "agent-cli[diarization]"`

New CLI Options

| Option | Description |
| --- | --- |
| `--diarize` / `--no-diarize` | Enable or disable speaker diarization |
| `--diarize-format` | Output format: `inline` (default) or `json` |
| `--hf-token` | HuggingFace token for pyannote models (required) |
| `--min-speakers` | Hint for the minimum number of speakers |
| `--max-speakers` | Hint for the maximum number of speakers |

Output Formats

Inline (default):

```
[SPEAKER_00]: Hello, how are you?
[SPEAKER_01]: I'm doing well, thanks!
```

JSON:

```json
{
  "segments": [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 2.5, "text": "Hello, how are you?"}
  ]
}
```
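The JSON segments above pair each piece of transcript text with the diarization speaker active at that time. As a rough illustration of how such an alignment can work, the sketch below assigns each ASR segment the speaker whose diarization turn overlaps it most; `assign_speakers` and its input shapes are illustrative assumptions, not the PR's actual `align_transcript_with_speakers` API:

```python
# Hypothetical sketch of speaker alignment, not the PR's actual implementation:
# each transcript segment gets the speaker whose diarization turn has the
# largest time overlap with it.

def assign_speakers(transcript_segments, speaker_turns):
    """transcript_segments: [{"start": s, "end": e, "text": t}, ...]
    speaker_turns: [{"speaker": "SPEAKER_00", "start": s, "end": e}, ...]
    Returns segments shaped like the JSON example above."""
    aligned = []
    for seg in transcript_segments:
        best_speaker, best_overlap = "SPEAKER_00", 0.0  # default if nothing overlaps
        for turn in speaker_turns:
            # Overlap of [seg.start, seg.end] with [turn.start, turn.end]
            overlap = min(seg["end"], turn["end"]) - max(seg["start"], turn["start"])
            if overlap > best_overlap:
                best_speaker, best_overlap = turn["speaker"], overlap
        aligned.append({"speaker": best_speaker, **seg})
    return aligned
```

Max-overlap assignment keeps the ASR segmentation intact, which matters because ASR and diarization segment boundaries rarely coincide exactly.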

Usage Examples

```bash
# Install the diarization extra (quoted so zsh doesn't expand the brackets)
pip install "agent-cli[diarization]"

# Basic diarization
agent-cli transcribe --diarize --hf-token YOUR_HF_TOKEN

# Diarize a meeting recording with known participant bounds
agent-cli transcribe --from-file meeting.wav --diarize \
  --min-speakers 2 --max-speakers 4 --hf-token YOUR_HF_TOKEN
```
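The `json` output format is convenient for downstream scripting. A minimal sketch of consuming it, assuming only the JSON shape shown above (the inline string here stands in for the command's captured stdout):

```python
import json

# Parse --diarize-format json output and render it in the inline style.
# The raw string below stands in for captured command output.
raw = '{"segments": [{"speaker": "SPEAKER_00", "start": 0.0, "end": 2.5, "text": "Hello, how are you?"}]}'
data = json.loads(raw)
lines = [f"[{seg['speaker']}]: {seg['text']}" for seg in data["segments"]]
print("\n".join(lines))  # [SPEAKER_00]: Hello, how are you?
```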

Test plan

  • Unit tests for DiarizedSegment dataclass
  • Unit tests for align_transcript_with_speakers function
  • Unit tests for format_diarized_output (inline and JSON)
  • Unit tests for SpeakerDiarizer class with mocked pyannote
  • Updated existing transcribe recovery tests with new parameters
  • All 513 tests passing
  • Pre-commit hooks passing
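To give a flavor of the listed formatting tests, here is a self-contained sketch; `format_inline` is a stub standing in for the PR's `format_diarized_output`, whose real signature isn't shown here:

```python
# Sketch of a unit test for inline formatting. format_inline is a stub,
# not the PR's actual format_diarized_output.
def format_inline(segments):
    return "\n".join(f"[{s['speaker']}]: {s['text']}" for s in segments)

def test_format_inline():
    segments = [
        {"speaker": "SPEAKER_00", "start": 0.0, "end": 2.5, "text": "Hello, how are you?"},
        {"speaker": "SPEAKER_01", "start": 2.6, "end": 4.8, "text": "I'm doing well, thanks!"},
    ]
    expected = "[SPEAKER_00]: Hello, how are you?\n[SPEAKER_01]: I'm doing well, thanks!"
    assert format_inline(segments) == expected

test_format_inline()
```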
