[WIP] Update WhisperX integration for VAD compatibility #2

Copilot · 2025-11-19T15:09:02Z

Thanks for asking me to work on this. I will get started on it and keep this PR's description up to date as I form a plan and make progress.

Original prompt

Update the i4Ds/whisperx-replicate repository to be compatible with the current m-bain/whisperX codebase, which now includes the VAD (voice activity detection) model directly in the library.

Goals:

Upgrade WhisperX integration

Update all imports and usage of WhisperX in this repo to match the current public API of m-bain/whisperX.

Remove any references to the older, now-outdated API surfaces.

Remove legacy VAD download / fetch logic

The VAD model is now part of WhisperX itself and no longer needs to be fetched or managed separately.

Identify and remove any code that:

Downloads or fetches VAD models manually (e.g., via torch.hub, direct URLs, or custom cache directories).

References external or separate VAD model paths or environment variables that are no longer needed.

Replace this logic with the new, recommended usage of VAD/diarization/alignment from the latest WhisperX library (e.g. using load_model, load_align_model, DiarizationPipeline, etc., as appropriate for the current version).
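A minimal sketch of what the replacement call sequence might look like, assuming the function names mentioned above (`load_audio`, `load_model`, `load_align_model`, `align`) are available in the pinned WhisperX version. The wrapper name `transcribe_and_align` and its defaults are hypothetical, and the WhisperX module is passed in as `wx` so the snippet can be exercised without the library installed. Diarization is left out here because the import location of `DiarizationPipeline` should be checked against the pinned version.

```python
from typing import Any


def transcribe_and_align(
    wx: Any,
    audio_path: str,
    device: str = "cpu",
    model_name: str = "small",
    compute_type: str = "int8",
    batch_size: int = 8,
) -> dict:
    """Transcribe and align one audio file with a WhisperX-like module `wx`.

    In real use `wx` would be the imported `whisperx` module (assumption:
    the pinned version exposes the functions called below).
    """
    audio = wx.load_audio(audio_path)

    # No manual VAD download here: the model loader is expected to set up
    # VAD on its own in current WhisperX versions.
    model = wx.load_model(model_name, device, compute_type=compute_type)
    result = model.transcribe(audio, batch_size=batch_size)

    # Alignment adds word-level timestamps to the transcribed segments.
    align_model, metadata = wx.load_align_model(
        language_code=result["language"], device=device
    )
    aligned = wx.align(result["segments"], align_model, metadata, audio, device)

    # Carry the detected language forward for downstream consumers.
    return {**aligned, "language": result["language"]}
```

Calling it would look like `transcribe_and_align(whisperx, "sample.wav")` after `import whisperx`; the file name is a placeholder.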

Align overall pipeline with latest WhisperX

Ensure the end-to-end ASR pipeline (transcription, optional alignment, and diarization) in whisperx-replicate uses the current recommended patterns from WhisperX.

Keep the external behavior of whisperx-replicate (inputs, outputs, and Replicate-facing interface) as stable as possible.

If some output structure must change due to WhisperX updates (e.g. word-level timestamps or diarization fields), adjust the wrapper code to produce a backward-compatible or clearly documented JSON schema.
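One way to keep the Replicate-facing output stable is to pass whatever WhisperX returns through a small normalization step. The sketch below assumes a result dict with a `segments` list whose entries may carry `words` and `speaker` keys; the target schema and the name `normalize_segments` are hypothetical, since the wrapper's actual schema is not shown here. Missing fields are filled with `None` so that consumers always see the same keys.

```python
from typing import Any


def normalize_segments(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Map a WhisperX-style result onto a fixed, documented segment schema."""
    normalized = []
    for seg in result.get("segments", []):
        # Some words may come back without timestamps or speaker labels;
        # emit explicit None values instead of dropping the keys.
        words = [
            {
                "word": w.get("word", ""),
                "start": w.get("start"),
                "end": w.get("end"),
                "speaker": w.get("speaker"),
            }
            for w in seg.get("words", [])
        ]
        normalized.append(
            {
                "start": seg.get("start"),
                "end": seg.get("end"),
                "text": seg.get("text", "").strip(),
                "speaker": seg.get("speaker"),
                "words": words,
            }
        )
    return normalized
```

Keeping this in one function gives the README a single place to point to when documenting the JSON schema.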

Dependencies and configuration

Update Python dependencies (e.g., requirements.txt, pyproject.toml, or equivalent) so that an appropriate, recent WhisperX version is installed.

Prefer pinning to a stable tagged release or a specific commit of m-bain/whisperX that is compatible with the new code.

Remove obsolete dependency entries related to legacy VAD or any other components that WhisperX now bundles directly.
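The pinning preference could be enforced with a small check over requirement lines, for example in CI. This is a sketch only: `is_pinned` is a hypothetical helper, it accepts either an exact `==` version or a git URL ending in a commit hash or version-like tag, and it does not handle environment markers or other requirement syntax.

```python
import re

# Exact version pin such as "pkg==1.2.3" (wildcards like "1.*" are rejected).
_EXACT_VERSION = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s*]+$"
)
# A fixed git ref: an abbreviated or full commit hash, or a tag like "v1.2.3".
_FIXED_REF = re.compile(r"^(?:[0-9a-f]{7,40}|v?\d+(?:\.\d+)*)$")


def is_pinned(requirement: str) -> bool:
    """Return True if a requirements line pins an exact version or fixed git ref."""
    # Drop trailing comments (whitespace followed by "#") and surrounding space.
    line = re.split(r"\s+#", requirement, maxsplit=1)[0].strip()
    if not line or line.startswith("#"):
        return False
    if "git+" in line:
        # Strip any "#egg=..." fragment, then look for "@ref" after the last "/".
        url = line.split("git+", 1)[1].split("#", 1)[0]
        last_part = url.rsplit("/", 1)[-1]
        if "@" not in last_part:
            return False
        ref = last_part.rsplit("@", 1)[1]
        return bool(_FIXED_REF.match(ref))
    return bool(_EXACT_VERSION.match(line))
```

Under these rules a line such as `git+https://github.com/m-bain/whisperX.git@main` is rejected because a branch name is a moving ref, while the same URL ending in a commit hash is accepted.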

Testing and validation

Add or update a small test or example script that runs a full transcription pipeline on a short sample audio file using the updated code.

Verify at runtime that:

The repository no longer attempts to manually fetch VAD models.

The transcription, alignment, and diarization (if enabled) all complete successfully using the new WhisperX API.
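As a cheap complement to the runtime check, a static scan of the repository's Python files can flag leftover manual fetch logic. The patterns below are assumptions about what legacy code might look like (the `VAD_MODEL_URL` / `VAD_MODEL_PATH` names in particular are hypothetical), and a hit is only a hint to review, not proof of a VAD download.

```python
import re
from pathlib import Path

# Patterns that often indicate manual model fetching; adjust to the repo.
SUSPECT_PATTERNS = [
    re.compile(r"torch\.hub\.(load|download_url_to_file)"),
    re.compile(r"urlretrieve\s*\("),
    re.compile(r"VAD_MODEL_(URL|PATH)", re.IGNORECASE),  # hypothetical names
]


def find_manual_vad_fetches(repo_root: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, line text) for every suspicious line."""
    hits = []
    for path in sorted(Path(repo_root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(p.search(line) for p in SUSPECT_PATTERNS):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

A test could then assert that `find_manual_vad_fetches(".")` returns an empty list once the legacy code has been removed.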

Update documentation (README or equivalent) to briefly note that the repo is now compatible with the newer WhisperX, and mention any breaking changes or important behavioral differences compared to the older version.

The result should be a clean, working upgrade of whisperx-replicate that uses the integrated VAD and other current features of WhisperX, without relying on deprecated APIs or external model downloads.

This pull request was created from the Copilot chat prompt quoted above under "Original prompt".

Initial plan

9e5fd8e

Copilot AI assigned Copilot and Korisis Nov 19, 2025

Copilot stopped work on behalf of Korisis due to an error (November 19, 2025 15:09): Actor has insufficient permissions
