A standalone, browser-based transcription tool for linguists and researchers. Designed with the help of coding AIs as a modern, open-source replacement for legacy transcription software such as Transcriber. This applet is meant as the first tool in a corpus transcription and annotation pipeline. As such, its purpose and features are deliberately limited: more complex annotation is expected to be carried out with other tools, such as ELAN, Praat, or similar transcription/annotation software. The applet relies heavily on WaveSurfer.js for audio segment management and visualization.
This applet is meant as a companion tool for UMR STL's "DOC-STL" corpus project.
- Multi-speaker annotation with individual speaker tracks; more speakers can be added as needed
- Waveform visualization with synchronized playback across all tracks
- Spectrogram display (formants) for acoustic analysis
- Multiple segmentation methods:
  - Manual marking with the `S` key
  - Mouse drag segment selection
  - Automatic silence detection
  - F0-based speaker clustering
- Inline transcription editor with keyboard-focused workflow
- Segment loop playback for detailed transcription
- Speaker filtering on the master track: toggle speakers to edit specific zones, particularly useful when overlapping speech occurs
- Undo/Redo support (Ctrl+Z / Ctrl+Y), limited to auto-segmentation in this version
- Highpass/Lowpass filters to isolate vocal frequencies
- Playback speed control (0.5× to 2×) without pitch alteration
- Volume and zoom controls
- Project save/load with timestamped JSON files
- Export formats:
- ELAN (.eaf)
- Praat TextGrid
- SRT subtitles
- JSON
- CSV
- Download or clone the repository
- Open `index.html` in a modern browser (Chrome, Firefox, Safari), or use the applet from GitHub/GitLab Pages
- Load an audio file and start transcribing
To serve the files locally instead:

```
cd opentranscriber
python3 -m http.server 8000
# Open http://localhost:8000
```

| Key | Action |
|---|---|
| Space | Play/Pause |
| ← / → | Skip ±5 seconds |
| Home | Go to start |
| End | Go to end |
| Key | Action |
|---|---|
| S | Mark start (first press) |
| S | Mark end (second press) |
| Mouse drag | Create segment by selection |
| Double-click | Edit segment |
| Key | Action |
|---|---|
| Enter | Save transcription & next |
| Page Up | Assign to previous speaker |
| Page Down | Assign to next speaker |
| Delete | Delete selected segment |
| L | Toggle loop playback |
| Key | Action |
|---|---|
| N | Next segment |
| P | Previous segment |
| Ctrl+Z | Undo |
| Ctrl+Y | Redo |
| Escape | Deselect |
| ? | Show help |
Fast and reliable segmentation based on pause detection.
- Best for: clean recordings with clear pauses
- Parameters: silence threshold, minimum pause/segment duration
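For intuition, the pause-detection idea can be sketched roughly as below; the function name `detectSilences`, the default thresholds, and the frame size are illustrative assumptions, not the applet's actual code (which also enforces a minimum segment duration):

```js
// Illustrative energy-based pause detection over a mono Float32Array of PCM samples.
// Returns [start, end] pairs (in seconds) of detected silences; speech segments are
// the gaps between them. Minimum-segment-duration handling is omitted for brevity.
function detectSilences(samples, sampleRate, {
  threshold = 0.01,   // RMS level below which a frame counts as silence
  minPause = 0.3,     // seconds of continuous silence needed to register a pause
  frameSize = 1024
} = {}) {
  const silences = [];
  let silenceStart = null;
  for (let i = 0; i < samples.length; i += frameSize) {
    const frame = samples.subarray(i, i + frameSize);
    let sum = 0;
    for (const s of frame) sum += s * s;
    const rms = Math.sqrt(sum / frame.length);
    const t = i / sampleRate;
    if (rms < threshold) {
      if (silenceStart === null) silenceStart = t;
    } else {
      if (silenceStart !== null && t - silenceStart >= minPause) silences.push([silenceStart, t]);
      silenceStart = null;
    }
  }
  return silences;
}
```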
Combines silence detection with fundamental frequency analysis for automatic speaker attribution.
- Best for: 2-3 speakers with distinct voice pitches
- Parameters: F0 range, number of speakers
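As a toy illustration of how pitch can drive attribution for two speakers (the hypothetical `estimateF0` stands in for any pitch tracker returning a representative F0 in Hz per segment; the applet's actual heuristics differ):

```js
// Toy two-speaker attribution: segments whose F0 falls at or below the median of
// all observed F0 values go to speaker 1, the rest (including unvoiced/NaN) to speaker 2.
function assignSpeakersByF0(segments, estimateF0) {
  const withF0 = segments.map(seg => ({ ...seg, f0: estimateF0(seg) }));
  const f0s = withF0.map(s => s.f0).filter(Number.isFinite).sort((a, b) => a - b);
  const median = f0s[Math.floor(f0s.length / 2)];
  return withF0.map(seg => ({ ...seg, speaker: seg.f0 <= median ? 1 : 2 }));
}
```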
Voice Activity Detection with spectral feature clustering.
- Best for: noisy recordings, complex conversations
- Uses: energy, ZCR, spectral centroid
Caveat: does not work properly in the current version
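For reference, the three features named above can be computed per analysis frame along these lines (purely illustrative, and unrelated to fixing the caveat mentioned above):

```js
// Per-frame features for VAD-style clustering: short-time energy, zero-crossing rate,
// and spectral centroid. `magnitudes` is assumed to be the FFT magnitude spectrum of
// the same frame, spanning 0 Hz up to the Nyquist frequency.
function frameFeatures(frame, magnitudes, sampleRate) {
  let energy = 0, zcr = 0;
  for (let i = 0; i < frame.length; i++) {
    energy += frame[i] * frame[i];
    if (i > 0 && (frame[i] >= 0) !== (frame[i - 1] >= 0)) zcr++;
  }
  let weighted = 0, total = 0;
  for (let k = 0; k < magnitudes.length; k++) {
    const freq = (k * sampleRate) / (2 * magnitudes.length);
    weighted += freq * magnitudes[k];
    total += magnitudes[k];
  }
  return {
    energy: energy / frame.length,
    zcr: zcr / frame.length,
    spectralCentroid: total > 0 ? weighted / total : 0
  };
}
```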
Standard XML format for ELAN software.
- Time-aligned tiers per speaker
- Full Unicode support
Compatible with Praat for acoustic analysis.
- IntervalTier per speaker
- Precise timestamps
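As a rough sketch, a TextGrid writer can look like this (segment fields and helper names are assumptions; a production exporter, unlike this sketch, also writes empty intervals so that each tier is fully tiled, as Praat expects):

```js
// Minimal TextGrid (long text format) writer: one IntervalTier per speaker.
// Segments are assumed to be {start, end, speaker, transcription}.
function toTextGrid(duration, speakers, segments) {
  const lines = ['File type = "ooTextFile"', 'Object class = "TextGrid"', '',
    'xmin = 0', `xmax = ${duration}`, 'tiers? <exists>',
    `size = ${speakers.length}`, 'item []:'];
  speakers.forEach((sp, i) => {
    const ivs = segments.filter(s => s.speaker === sp.id).sort((a, b) => a.start - b.start);
    lines.push(`    item [${i + 1}]:`, '        class = "IntervalTier"',
      `        name = "${sp.name}"`, '        xmin = 0', `        xmax = ${duration}`,
      `        intervals: size = ${ivs.length}`);
    ivs.forEach((s, j) => {
      lines.push(`        intervals [${j + 1}]:`,
        `            xmin = ${s.start}`, `            xmax = ${s.end}`,
        `            text = "${s.transcription.replace(/"/g, '""')}"`); // Praat escapes " as ""
    });
  });
  return lines.join('\n');
}
```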
Standard subtitle format for video playback.
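SRT mainly boils down to the `hh:mm:ss,mmm` timestamp convention and blank-line-separated cues, as in this sketch (names are illustrative):

```js
// Format seconds as an SRT timestamp, e.g. 72.5 -> "00:01:12,500".
function toSrtTime(t) {
  const ms = Math.round(t * 1000);
  const h = String(Math.floor(ms / 3600000)).padStart(2, '0');
  const m = String(Math.floor(ms / 60000) % 60).padStart(2, '0');
  const s = String(Math.floor(ms / 1000) % 60).padStart(2, '0');
  return `${h}:${m}:${s},${String(ms % 1000).padStart(3, '0')}`;
}

// One numbered cue per segment, cues separated by a blank line.
function toSrt(segments) {
  return segments.map((seg, i) =>
    `${i + 1}\n${toSrtTime(seg.start)} --> ${toSrtTime(seg.end)}\n${seg.transcription}\n`
  ).join('\n');
}
```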
Machine-readable format for further processing.
```json
{
"version": "7.0",
"savedAt": "2025-01-15T14:30:00Z",
"audio": {"filename": "interview.mp3", "duration": 1847.5},
"speakers": [
{"id": 1, "name": "Interviewer"},
{"id": 2, "name": "Participant"}
],
"segments": [
{"id": "seg_123", "start": 12.5, "end": 18.3, "speaker": 1, "transcription": "..."}
]
}
```

Spreadsheet-compatible format for data analysis.
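A minimal sketch of such a CSV export (column order and helper names are assumptions, not necessarily the applet's exact output):

```js
// One row per segment; quotes inside text fields are doubled so spreadsheets parse them.
function toCsv(segments, speakers) {
  const speakerName = id => (speakers.find(sp => sp.id === id) || {}).name || '';
  const quote = v => `"${String(v).replace(/"/g, '""')}"`;
  const rows = segments.map(s =>
    [s.start, s.end, quote(speakerName(s.speaker)), quote(s.transcription)].join(','));
  return ['start,end,speaker,transcription', ...rows].join('\n');
}
```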
- Use speaker filter radio buttons to show one speaker at a time
- Adjust segment boundaries for that speaker
- Switch to next speaker and repeat
- Use "All" to review the complete annotation
- Apply a highpass filter (~120 Hz) to hear high-pitched voices better
- Apply a lowpass filter (~2500 Hz) for deeper voices
- Reset filters before final transcription
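This kind of filtering can be wired with standard Web Audio `BiquadFilterNode`s, as in the sketch below, which uses the cutoffs suggested above and assumes an `<audio>` element; it is not necessarily how the applet builds its own chain:

```js
// Route an <audio> element through a highpass and a lowpass filter before playback.
const audioCtx = new AudioContext();
const source = audioCtx.createMediaElementSource(document.querySelector('audio'));
const highpass = new BiquadFilterNode(audioCtx, { type: 'highpass', frequency: 120 });
const lowpass = new BiquadFilterNode(audioCtx, { type: 'lowpass', frequency: 2500 });
source.connect(highpass).connect(lowpass).connect(audioCtx.destination);
// A highpass at 0 Hz or a lowpass at the Nyquist frequency effectively disables that filter.
```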
- Auto-segment the audio
- Click first segment, type transcription
- Press `Enter` to save and move to the next segment
- Use `Page Up`/`Page Down` to reassign speakers
- Save regularly with timestamped files
- Browser: Chrome 80+, Firefox 75+, Safari 14+, Edge 80+
- Audio formats: MP3, WAV, OGG, FLAC, M4A (browser-dependent)
- Recommended file size: < 100 MB for optimal performance
```
opentranscriber/
├── index.html   # Main interface
├── app.js       # Application logic (~2100 lines)
├── styles.css   # Styling
└── README.md    # This file
```
- WaveSurfer.js 7.x - Waveform visualization
  - Regions plugin: segment management
  - Spectrogram plugin: formant display
  - Timeline plugin: time axis
All dependencies are loaded from CDN (unpkg.com).
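For reference, pulling WaveSurfer 7.x and its plugins from unpkg as ES modules can look roughly like this (container selector, audio file name, and plugin options are illustrative; the applet's own setup in app.js may differ):

```js
import WaveSurfer from 'https://unpkg.com/wavesurfer.js@7/dist/wavesurfer.esm.js';
import RegionsPlugin from 'https://unpkg.com/wavesurfer.js@7/dist/plugins/regions.esm.js';
import SpectrogramPlugin from 'https://unpkg.com/wavesurfer.js@7/dist/plugins/spectrogram.esm.js';
import TimelinePlugin from 'https://unpkg.com/wavesurfer.js@7/dist/plugins/timeline.esm.js';

// Create the waveform and register the plugins listed above.
const ws = WaveSurfer.create({ container: '#waveform', url: 'interview.mp3' });
ws.registerPlugin(RegionsPlugin.create());
ws.registerPlugin(SpectrogramPlugin.create({ labels: true }));
ws.registerPlugin(TimelinePlugin.create());
```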
Contributions are welcome! Please feel free to submit issues and pull requests.
- Clone the repository
- Serve with any HTTP server
- Edit files and refresh browser
- ES6+ JavaScript
- JSDoc comments for public methods
- Consistent naming: camelCase for variables, PascalCase for classes
- Waveform-level zoom sync across speaker tracks
- Sound pre-processing to help better discriminate speakers
- Sound pre-processing to help better discriminate speech vs noise
- Merge adjacent segments
- Split segment at cursor
- Find & replace in transcriptions
- Auto-complete features
- Auto-save to localStorage
- STT integration (Whisper or other API)
- Multi-language interface
MIT License - See LICENSE file for details.

This tool is:
- Inspired by Transcriber by DGA
- Built with WaveSurfer.js
- Developed for the linguistics research community
This tool is provided "as is". Be aware that this is not a full-fledged speech transcription/annotation platform, just a simple, lightweight and frictionless tool to help humans (linguists) transcribe audio recordings with multiple speakers.
Questions? Open an issue on GitHub or GitLab.