Copilot AI commented Nov 19, 2025

Thanks for asking me to work on this. I will get started on it and keep this PR's description up to date as I form a plan and make progress.

Original prompt

Update the i4Ds/whisperx-replicate repository to be compatible with the current m-bain/whisperX codebase, which now includes the VAD (voice activity detection) model directly in the library.

Goals:

  1. Upgrade WhisperX integration

    • Update all imports and usage of WhisperX in this repo to match the current public API of m-bain/whisperX.
    • Remove any references to the older, now-outdated API surfaces.
  2. Remove legacy VAD download / fetch logic

    • The VAD model is now part of WhisperX itself and no longer needs to be fetched or managed separately.
    • Identify and remove any code that:
      • Downloads or fetches VAD models manually (e.g., via torch.hub, direct URLs, or custom cache directories).
      • References external or separate VAD model paths or environment variables that are no longer needed.
    • Replace this logic with the new, recommended usage of VAD/diarization/alignment from the latest WhisperX library (e.g. using load_model, load_align_model, DiarizationPipeline, etc., as appropriate for the current version).
  3. Align overall pipeline with latest WhisperX

    • Ensure the end-to-end ASR pipeline (transcription, optional alignment, and diarization) in whisperx-replicate uses the current recommended patterns from WhisperX.
    • Keep the external behavior of whisperx-replicate (inputs, outputs, and Replicate-facing interface) as stable as possible.
    • If some output structure must change due to WhisperX updates (e.g. word-level timestamps or diarization fields), adjust the wrapper code to produce a backward-compatible or clearly documented JSON schema.
  4. Dependencies and configuration

    • Update Python dependencies (e.g., requirements.txt, pyproject.toml, or equivalent) so that an appropriate, recent WhisperX version is installed.
    • Prefer pinning to a stable tagged release or a specific commit of m-bain/whisperX that is compatible with the new code.
    • Remove obsolete dependency entries related to legacy VAD or any other components that WhisperX now bundles directly.
  5. Testing and validation

    • Add or update a small test or example script that runs a full transcription pipeline on a short sample audio file using the updated code.
    • Verify at runtime that:
      • The repository no longer attempts to manually fetch VAD models.
      • The transcription, alignment, and diarization (if enabled) all complete successfully using the new WhisperX API.
    • Update documentation (README or equivalent) to briefly note that the repo is now compatible with the newer WhisperX, and mention any breaking changes or important behavioral differences compared to the older version.
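The transcription, alignment, and optional diarization flow described in goals 2 and 3 can be sketched roughly as follows. This is a minimal sketch, not the repository's actual predictor: `run_pipeline`, its parameters, and the `wx` injection hook are hypothetical names introduced for illustration. The WhisperX calls (`load_audio`, `load_model`, `transcribe`, `load_align_model`, `align`, `assign_word_speakers`) follow the usage shown in the WhisperX README, but their exact signatures should be checked against whichever version gets pinned. The diarizer is passed in as a callable so that the sketch makes no claim about where `DiarizationPipeline` lives or which keyword arguments it takes in a given release.

```python
from typing import Any, Callable, Optional


def run_pipeline(
    audio_path: str,
    device: str = "cpu",
    model_name: str = "small",
    diarizer: Optional[Callable[[Any], Any]] = None,
    wx: Any = None,
) -> dict:
    """Transcribe, align, and optionally diarize one audio file.

    `wx` is a hypothetical injection point (defaults to the real whisperx
    module) so the flow can be exercised with a stub in tests.
    """
    if wx is None:
        import whisperx  # imported lazily; requires WhisperX to be installed
        wx = whisperx

    audio = wx.load_audio(audio_path)

    # No separate VAD download step: per the task description, WhisperX
    # now manages the VAD model itself when loading/transcribing.
    model = wx.load_model(model_name, device, compute_type="int8")
    result = model.transcribe(audio, batch_size=8)
    language = result["language"]

    # Word-level alignment with a language-specific alignment model.
    align_model, metadata = wx.load_align_model(language_code=language, device=device)
    result = wx.align(result["segments"], align_model, metadata, audio, device)

    # Optional diarization: the caller supplies an already-constructed
    # diarizer (for example a WhisperX DiarizationPipeline instance).
    if diarizer is not None:
        result = wx.assign_word_speakers(diarizer(audio), result)

    # Carry the detected language through to the final result.
    result["language"] = language
    return result
```

With WhisperX installed, a call such as `run_pipeline("sample.wav")` would exercise the path without diarization; passing a constructed diarizer enables speaker assignment. The file name here is a placeholder.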

The result should be a clean, working upgrade of whisperx-replicate that uses the integrated VAD and other current features of WhisperX, without relying on deprecated APIs or external model downloads.
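One way to keep the Replicate-facing output stable, as goal 3 asks, is to pass the WhisperX result through a small normalization step before returning it. The sketch below is an assumption about what such a wrapper could look like: `normalize_result` and the output field layout are hypothetical, not the repository's existing schema. It assumes a result dict with a `segments` list whose entries may carry `words` and `speaker` fields, and it fills missing values with `None` so downstream consumers always see the same keys.

```python
from typing import Any, Dict, List


def normalize_result(result: Dict[str, Any]) -> Dict[str, Any]:
    """Map a WhisperX-style result dict onto a fixed, JSON-friendly schema.

    Missing timestamps or speaker labels become None instead of being
    omitted, so the set of keys does not depend on which stages ran.
    """
    segments: List[Dict[str, Any]] = []
    for seg in result.get("segments", []):
        words = [
            {
                "word": w.get("word", ""),
                "start": w.get("start"),      # may be absent for unaligned words
                "end": w.get("end"),
                "speaker": w.get("speaker"),  # only set when diarization ran
            }
            for w in seg.get("words", [])
        ]
        segments.append(
            {
                "start": seg.get("start"),
                "end": seg.get("end"),
                "text": seg.get("text", "").strip(),
                "speaker": seg.get("speaker"),
                "words": words,
            }
        )
    return {"language": result.get("language"), "segments": segments}
```

Because every key is always present, the README can document a single schema and note only which fields may be `None` when alignment or diarization is disabled.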

This pull request was created as a result of the prompt above, which was submitted from Copilot chat.

Copilot stopped work on behalf of Korisis due to an error November 19, 2025 15:09