This project provides a simple batch transcription tool using OpenAI's Whisper model for speech recognition.
It automatically converts all audio files (such as .ogg, .mp3, .wav, and .m4a) in the current directory into text files with the same base name.
Each audio file is processed and a corresponding .txt file containing the transcription is saved in the same folder.
The main script batch_audio_to_text_whisper.py performs the following steps:
- Loads a Whisper model (you can choose tiny, base, small, medium, or large).
- Scans the current folder for compatible audio files.
- Transcribes each audio file into text.
- Saves each transcription as a .txt file with the same base name.
For example:
meeting.ogg → meeting.txt
lecture.wav → lecture.txt
All outputs are saved in the same folder.
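The core of this workflow is a short loop. The following is a minimal sketch of that loop (not the script itself), assuming only the whisper package and the formats listed in this README:

import pathlib
import whisper

model = whisper.load_model("medium")                 # tiny, base, small, medium, large
for audio in sorted(pathlib.Path(".").iterdir()):    # scan the current folder
    if audio.suffix.lower() in {".ogg", ".mp3", ".wav", ".m4a"}:
        result = model.transcribe(str(audio))        # run speech recognition locally
        audio.with_suffix(".txt").write_text(result["text"], encoding="utf-8")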
You can run the project locally without any external API keys or internet access once Whisper and Torch are installed.
python -m venv speech
speech\Scripts\activate
pip install git+https://github.com/openai/whisper.git
pip install torch tqdm

Supported formats: .ogg, .mp3, .wav, .m4a

Run the script:
python batch_audio_to_text_whisper.py

Inside the script, you can modify:
MODEL_NAME = "medium" # tiny, base, small, medium, large
LANG = "pt"           # language code, or None for auto-detect

To improve speed, you may select a smaller model like "small" or "base".
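For context, these two settings are typically passed to Whisper as in the sketch below; the exact call inside the script may differ:

import whisper

MODEL_NAME = "medium"
LANG = "pt"           # None lets Whisper auto-detect the language

model = whisper.load_model(MODEL_NAME)
result = model.transcribe("meeting.ogg", language=LANG)
print(result["text"])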
For each audio file, a text file with the same base name will be created, containing the transcription.
Example output structure:
audio1.ogg
audio1.txt
audio2.wav
audio2.txt
- All processing is done locally; no data is sent to the cloud.
- Whisper models are multilingual and automatically detect most spoken languages.
- Larger models are slower but more accurate.
- You can freely modify or extend the script to process subfolders or change output behavior (a recursive variant is sketched below).
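As a hypothetical example of such an extension, the single-folder scan from the sketch above can be swapped for a recursive walk; only the rglob() line differs:

import pathlib
import whisper

AUDIO_EXTS = {".ogg", ".mp3", ".wav", ".m4a"}
model = whisper.load_model("base")                   # smaller model for a faster pass

for audio in pathlib.Path(".").rglob("*"):           # walk subfolders recursively
    if audio.is_file() and audio.suffix.lower() in AUDIO_EXTS:
        result = model.transcribe(str(audio))
        audio.with_suffix(".txt").write_text(result["text"], encoding="utf-8")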
This project uses OpenAI's open-source Whisper model under the MIT License.
Developed for research, experimentation, and educational use.