Transcription review and correction platform with background processing, progress tracking, and session persistence
A powerful, user-friendly web application for reviewing and correcting ASR (Automatic Speech Recognition) transcriptions. Built for data annotation teams, researchers, and anyone working with speech-to-text accuracy improvement.
Review & Correct (Primary)
- Load pre-chunked audio with existing Excel transcripts
- Inline editing with real-time save
- Perfect for manual correction workflows
Auto-Transcribe (Secondary)
- Point to any folder of audio files
- Automatic chunking and AI transcription
- Background processing with progress tracking
- Background Job Processing: Handle thousands of files without blocking the UI
- Progress Tracking: Real-time progress bars and status updates
- Job Queue Management: One job per type at a time, prevents conflicts
- Session Persistence: LocalStorage tracking remembers what you've corrected
- Row Locking: Lock completed rows to prevent accidental edits
- Responsive Design: Works perfectly on desktop, tablet, and mobile
- Export Options: Download as CSV or Excel anytime
- Smart Audio Handling: FFmpeg-based streaming for large files
TL;DR: See QUICK_START.md for a 5-minute setup guide.
- Python 3.10+
- FFmpeg (for audio processing)
- 4GB+ RAM recommended
```bash
# 1. Clone the repository
git clone https://github.com/inboxpraveen/Speech-Annotation-Tool.git
cd Speech-Annotation-Tool

# 2. Create a virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Install FFmpeg (if not already installed)
# Ubuntu/Debian: sudo apt install ffmpeg
# macOS: brew install ffmpeg
# Windows: Download from https://ffmpeg.org/

# 5. Run the application
python app.py
```
Open your browser to http://localhost:5000
Best for: Teams with pre-transcribed audio chunks and Excel files
- Click on the "Review & Correct" card (blue, left side)
- Browse and select your chunked audio folder
- Browse and select your Excel file (must contain `filename` and `transcription` columns)
- Click "Load for Review"
- Wait for import to complete (progress shown in banner)
- Review and edit transcripts in the table below
- Click "Save" after each correction
- Lock rows when finalized
- Export results anytime
Excel Format Example:
```csv
filename,transcription,correct_transcripts
chunk_001.wav,Hello world,Hello world
chunk_002.wav,This is a test,This is a test
```
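To sanity-check a transcript file before loading it, the following stdlib-only sketch may help (`validate_transcript_csv` is a hypothetical helper, not part of the tool, shown here for a CSV export of the sheet):

```python
import csv

REQUIRED = {"filename", "transcription"}

def validate_transcript_csv(path):
    """Check that a transcript CSV has the required columns and no empty cells."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = REQUIRED - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"Missing required columns: {sorted(missing)}")
        # Collect 1-based spreadsheet row numbers with empty required fields
        return [i for i, row in enumerate(reader, start=2)
                if not row["filename"] or not row["transcription"]]

# validate_transcript_csv("transcripts.csv")  # → [] when the file is clean
```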
Best for: Starting from scratch with raw audio files
- Click on the "Auto-Transcribe" card (gray, right side)
- Browse and select your audio folder
- Select a Whisper model (default: small, balanced accuracy/speed)
- Click "Start Transcription"
- Monitor progress in the top banner
- Audio is automatically:
- Converted to 16kHz mono WAV
- Chunked into 30-second segments
- Transcribed with AI
- Review and correct results in the table
- Export when done
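The conversion and chunking steps above can be sketched with a couple of helpers (illustrative names only; the tool's actual FFmpeg invocation lives in `asr_tool/services/audio.py` and may use different flags):

```python
import subprocess

SEGMENT_SECONDS = 30  # segment length described above

def ffmpeg_convert_cmd(src, dst):
    """Build an ffmpeg command converting any input to 16 kHz mono WAV."""
    return ["ffmpeg", "-y", "-i", src,
            "-ac", "1",       # downmix to mono
            "-ar", "16000",   # resample to 16 kHz
            dst]

def chunk_offsets(duration_s, segment_s=SEGMENT_SECONDS):
    """Start offsets (in seconds) for fixed-length segments."""
    return list(range(0, int(duration_s), segment_s))

# To convert: subprocess.run(ffmpeg_convert_cmd("talk.mp3", "talk.wav"), check=True)
print(chunk_offsets(95))  # → [0, 30, 60, 90]
```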
Supported Audio Formats: MP3, WAV, WMA, MPEG, OPUS, FLAC, M4A
- Feature Cards: Two clearly separated workflows with visual distinction
- Job Status Banner: Auto-appears when jobs are running, shows real-time progress
- Statistics Panel: Track total, corrected, and locked records at a glance
- Export Controls: Quick access to CSV/XLSX export
- Audio Player: Play each segment directly in browser
- Original Transcript: Read-only reference column
- Corrected Transcript: Editable field with autosave
- Action Buttons:
- Save: Persist your corrections
- Lock: Prevent further edits
- Unlock: Re-enable editing
- Progress bar shows percentage completion
- Item counter shows "X / Y items processed"
- Auto-refreshes every 2 seconds during active jobs
- Dismissible when complete
```bash
# Optional: Set default model
export ASR_MODEL="openai/whisper-small"

# Required for production: Set secret key
export FLASK_SECRET_KEY="your-secure-random-key-here"
```
Available Whisper models (speed vs. accuracy trade-off):
| Model | Speed | Accuracy | Use Case |
|---|---|---|---|
| openai/whisper-tiny | ⚡⚡⚡ | ⭐⭐ | Quick tests, drafts |
| openai/whisper-base | ⚡⚡ | ⭐⭐⭐ | Balanced, general use |
| openai/whisper-small | ⚡ | ⭐⭐⭐⭐ | Default, production |
All data is stored in the `data/` directory (auto-created):
```
data/
├── segments/           # Audio chunks organized by job_id
├── exports/            # Exported CSV/XLSX files
├── transcriptions.csv  # Main database
└── jobs.json           # Job status tracking
```
The tool automatically tracks which records you've corrected using browser localStorage:
- Corrections persist across sessions
- Survive browser refresh
- Track completion progress
- Visual indicators for corrected rows
Clear tracking: Open browser console and run:
```js
localStorage.removeItem('asr_corrections_tracker');
```
Jobs run in separate threads using Python's built-in `threading` module:
- UI remains responsive during long operations
- Progress updates every 2 seconds
- Only one job per type can run (prevents conflicts)
- Jobs survive brief network interruptions
- No additional services required (no Redis, RabbitMQ, or Celery)
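A minimal sketch of this pattern, for the curious (illustrative only; the actual implementation lives in `asr_tool/services/job_manager.py` and differs in detail):

```python
import threading

class JobManager:
    """Sketch: one background job per type, with shared progress state."""
    def __init__(self):
        self._lock = threading.Lock()
        self._jobs = {}  # job_type -> {"status": ..., "done": int, "total": int}

    def start(self, job_type, items, work_fn):
        with self._lock:
            job = self._jobs.get(job_type)
            if job and job["status"] == "running":
                raise RuntimeError(f"A {job_type} job is already running")
            self._jobs[job_type] = {"status": "running", "done": 0, "total": len(items)}
        # Daemon thread keeps the UI responsive while work proceeds
        threading.Thread(target=self._run, args=(job_type, items, work_fn),
                         daemon=True).start()

    def _run(self, job_type, items, work_fn):
        for item in items:
            work_fn(item)
            with self._lock:
                self._jobs[job_type]["done"] += 1
        with self._lock:
            self._jobs[job_type]["status"] = "complete"

    def progress(self, job_type):
        with self._lock:
            return dict(self._jobs.get(job_type, {}))
```

The UI's 2-second polling then amounts to calling `progress("transcribe")` and rendering `done / total`.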
For distributed processing across multiple servers, consider migrating to Celery. See PROJECT_DOCUMENTATION.md for details.
Prevent accidental overwrites:
- Edit and save a record
- Click "Lock" when finalized
- Row turns yellow and becomes read-only
- Click "Unlock" if changes needed
```
Speech-Annotation-Tool/
├── app.py                   # Entry point
├── requirements.txt         # Dependencies
├── README.md                # This file
├── PROJECT_DOCUMENTATION.md # Detailed technical docs
├── asr_tool/                # Main package
│   ├── __init__.py          # Flask app factory
│   ├── config.py            # Configuration
│   ├── routes.py            # API endpoints
│   └── services/            # Business logic
│       ├── audio.py         # Audio processing
│       ├── model.py         # Whisper models
│       ├── storage.py       # CSV database
│       └── job_manager.py   # Background jobs
├── static/                  # Frontend assets
│   ├── css/style.css        # Styles
│   ├── js/script.js         # JavaScript
│   └── images/              # Images
├── templates/               # HTML templates
│   ├── base.html
│   └── index.html
└── data/                    # Runtime data (auto-created)
```
```bash
# Verify FFmpeg is installed
ffmpeg -version

# If not found, install it:
# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html
# Add to PATH environment variable
```
If the application crashes while a job is running:
- Stop the application
- Edit `data/jobs.json`
- Change the job status from `"running"` to `"failed"`
- Restart the application
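If you prefer to script that edit, here is a sketch assuming `jobs.json` maps job IDs to records with a `"status"` field (the actual schema may differ; run it only while the app is stopped):

```python
import json

def fail_stale_jobs(path="data/jobs.json"):
    """Mark any job still recorded as "running" as "failed"."""
    with open(path, encoding="utf-8") as f:
        jobs = json.load(f)
    changed = 0
    for job in jobs.values():  # assumes job_id -> {"status": ...}
        if job.get("status") == "running":
            job["status"] = "failed"
            changed += 1
    with open(path, "w", encoding="utf-8") as f:
        json.dump(jobs, f, indent=2)
    return changed  # number of jobs updated
```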
For large audio files:
- Use a smaller Whisper model (`tiny` or `base`)
- Reduce segment duration (edit `config.py`)
config.py) - Increase system RAM
- Process fewer files at once
Ensure your Excel file has the required columns:
- `filename` (required): Name of the audio chunk
- `transcription` (required): Text to display
- `correct_transcripts` (optional): Pre-filled corrections
Example:
```csv
filename,transcription
chunk_001.wav,Hello world
chunk_002.wav,This is a test
```
Development:
```bash
python app.py
# Access at http://localhost:5000
```
Production (Gunicorn):
```bash
pip install gunicorn
gunicorn -w 4 -b 0.0.0.0:5000 app:app
```
Docker:
```dockerfile
FROM python:3.10-slim

# Install FFmpeg
RUN apt-get update && apt-get install -y ffmpeg

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

EXPOSE 5000
CMD ["gunicorn", "-w", "4", "-b", "0.0.0.0:5000", "app:app"]
```
Build and run:
```bash
docker build -t asr-tool .
docker run -p 5000:5000 -v $(pwd)/data:/app/data asr-tool
```
- QUICK_START.md: 5-minute setup guide
- PROJECT_DOCUMENTATION.md: Complete technical documentation
- Architecture details
- API reference
- Background job processing (Threading vs Celery)
- Development guide
- Deployment instructions
- Troubleshooting
- CONTRIBUTING.md: Contribution guidelines
- In-Code Documentation: All functions have detailed docstrings
We welcome contributions from the community!
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'feat: add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Follow PEP 8 style guide
- Add tests for new features
- Update documentation
- Use conventional commit messages
- Keep changes focused and atomic
```bash
# Install dev dependencies
pip install pytest pytest-cov black flake8

# Run tests
pytest tests/ -v

# Format code
black asr_tool/ --line-length 100

# Lint
flake8 asr_tool/ --max-line-length 100
```
- Multi-user authentication and sessions
- Real-time collaboration (multiple editors)
- Custom model fine-tuning interface
- Speaker diarization support
- Batch export with filtering
- API token authentication
- Docker Compose setup
- Kubernetes deployment templates
- Database migration to PostgreSQL
- Redis caching layer
- Celery integration for distributed job processing (currently uses Python threading)
- S3 integration for audio storage
Note: The current version uses Python's built-in threading for background jobs. Celery integration is planned for future releases to support distributed processing across multiple workers.
This project is licensed under the MIT License - see the LICENSE file for details.
- Hugging Face Transformers: For Whisper model integration
- FFmpeg: For robust audio processing
- Flask: For the lightweight web framework
- Bootstrap: For responsive UI components
Need help? Have questions?
- Issues: GitHub Issues
- Documentation: PROJECT_DOCUMENTATION.md
If you find this project useful, please consider giving it a star! ⭐
Made with ❤️ for the open-source community
Version: 2.0.0 | Last Updated: December 2025

