OpenTranscriber

A standalone, browser-based transcription tool for linguists and researchers. Designed with the help of coding AIs as a modern, open-source replacement for legacy transcription software like Transcriber. This applet is meant as the first tool in a corpus transcription and annotation pipeline. As such, its purpose and features are voluntarily limited, the idea being that more complex annotations will be performed with other tooles, such as ELAN, PRAAT, or other transcription/annotation tools. This applet relies heavily on Wavesurfer for sound segments management and visualization.

This applet is meant as a companion tool for UMR STL's "DOC-STL" corpus project.

Features

Core Functionality

Multi-speaker annotation with individual speaker tracks, more speakers can be added as necessary
Waveform visualization with synchronized playback across all tracks
Spectrogram display (formants) for acoustic analysis
Multiple segmentation methods:
- Manual marking with S key
- Mouse drag segment selection
- Automatic silence detection
- F0-based speaker clustering

Editing & Navigation

Inline transcription editor with keyboard-focused workflow
Segment loop playback for detailed transcription
Speaker filtering on master track (useful for overlapping speech): toggle speakers to edit specific zones, particularly when overlaps occur
Undo/Redo support (Ctrl+Z / Ctrl+Y), limited to auto-segmentation in this version

Audio Processing

Highpass/Lowpass filters to isolate vocal frequencies
Playback speed control (0.5× to 2×) without pitch alteration
Volume and zoom controls

Import/Export

Project save/load with timestamped JSON files
Export formats:
- ELAN (.eaf)
- Praat TextGrid
- SRT subtitles
- JSON
- CSV

Quick Start

Option 1: Direct Use (No Server)

Download or clone the repository
Open index.html in a modern browser (Chrome, Firefox, Safari), or use the applet from Github/Gitlab pages
Load an audio file and start transcribing

Option 2: Local Server (Recommended)

cd opentranscriber
python3 -m http.server 8000
# Open http://localhost:8000

Keyboard Shortcuts

Playback

Key	Action
`Space`	Play/Pause
`←` / `→`	Skip ±5 seconds
`Home`	Go to start
`End`	Go to end

Segmentation

Key	Action
`S`	Mark start (first press)
`S`	Mark end (second press)
Mouse drag	Create segment by selection
Double-click	Edit segment

Editing

Key	Action
`Enter`	Save transcription & next
`Page Up`	Assign to previous speaker
`Page Down`	Assign to next speaker
`Delete`	Delete selected segment
`L`	Toggle loop playback

Navigation & History

Key	Action
`N`	Next segment
`P`	Previous segment
`Ctrl+Z`	Undo
`Ctrl+Y`	Redo
`Escape`	Deselect
`?`	Show help

Auto-Segmentation Strategies

1. Silence Detection (Simple)

Fast and reliable segmentation based on pause detection.

Best for: clean recordings with clear pauses
Parameters: silence threshold, minimum pause/segment duration

2. Silence + F0 (Medium)

Combines silence detection with fundamental frequency analysis for automatic speaker attribution.

Best for: 2-3 speakers with distinct voice pitches
Parameters: F0 range, number of speakers

3. VAD + Clustering (Advanced)

Voice Activity Detection with spectral feature clustering.

Best for: noisy recordings, complex conversations
Uses: energy, ZCR, spectral centroid

Caveat: does not work properly in the current version

Export Formats

ELAN (.eaf)

Standard XML format for ELAN software.

Time-aligned tiers per speaker
Full Unicode support

Praat TextGrid

Compatible with Praat for acoustic analysis.

IntervalTier per speaker
Precise timestamps

SRT Subtitles

Standard subtitle format for video playback.

JSON

Machine-readable format for further processing.

{
  "version": "7.0",
  "savedAt": "2025-01-15T14:30:00Z",
  "audio": {"filename": "interview.mp3", "duration": 1847.5},
  "speakers": [
    {"id": 1, "name": "Interviewer"},
    {"id": 2, "name": "Participant"}
  ],
  "segments": [
    {"id": "seg_123", "start": 12.5, "end": 18.3, "speaker": 1, "transcription": "..."}
  ]
}

CSV

Spreadsheet-compatible format for data analysis.

Workflow Tips

Overlapping Speech

Use speaker filter radio buttons to show one speaker at a time
Adjust segment boundaries for that speaker
Switch to next speaker and repeat
Use "All" to review the complete annotation

Voice Discrimination

Apply highpass filter (~120 Hz) to hear high-pitched voices better
Apply lowpass filter (~2500 Hz) for deeper voices
Reset filters before final transcription

Efficient Transcription

Auto-segment the audio
Click first segment, type transcription
Press Enter to save and move to next
Use Page Up/Down to reassign speakers
Save regularly with timestamped files

Technical Requirements

Browser: Chrome 80+, Firefox 75+, Safari 14+, Edge 80+
Audio formats: MP3, WAV, OGG, FLAC, M4A (browser-dependent)
Recommended file size: < 100 MB for optimal performance

Architecture

opentranscriber/
├── index.html      # Main interface
├── app.js          # Application logic (~2100 lines)
├── styles.css      # Styling
└── README.md       # This file

Dependencies

WaveSurfer.js 7.x - Waveform visualization
- Regions plugin: segment management
- Spectrogram plugin: formant display
- Timeline plugin: time axis

All dependencies are loaded from CDN (unpkg.com).

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

Development Setup

Clone the repository
Serve with any HTTP server
Edit files and refresh browser

Code Style

ES6+ JavaScript
JSDoc comments for public methods
Consistent naming: camelCase for variables, PascalCase for classes

Roadmap

License

MIT License - See LICENSE file for details. This tool

Acknowledgments

Inspired by Transcriber by DGA
Built with WaveSurfer.js
Developed for the linguistics research community

Caveats

This tool is provided "as is". Be aware that this is not a full-fledged speech transcription/annotation platform, just a simple, lightweight and frictionless tool to help humans (linguists) transcribe audio recordings with multiple speakers.

Questions? Open an issue on GitHub or GitLab.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
LICENSE		LICENSE
OpenTranscriber-tutorial.mp4		OpenTranscriber-tutorial.mp4
OpenTranscriber_thumbnail.png		OpenTranscriber_thumbnail.png
README.md		README.md
app.js		app.js
index.html		index.html
styles.css		styles.css

License

abalvet/OpenTranscriber

Folders and files

Latest commit

History

Repository files navigation

OpenTranscriber

Features

Core Functionality

Editing & Navigation

Audio Processing

Import/Export

Quick Start

Option 1: Direct Use (No Server)

Option 2: Local Server (Recommended)

Keyboard Shortcuts

Playback

Segmentation

Editing

Navigation & History

Auto-Segmentation Strategies

1. Silence Detection (Simple)

2. Silence + F0 (Medium)

3. VAD + Clustering (Advanced)

Export Formats

ELAN (.eaf)

Praat TextGrid

SRT Subtitles

JSON

CSV

Workflow Tips

Overlapping Speech

Voice Discrimination

Efficient Transcription

Technical Requirements

Architecture

Dependencies

Contributing

Development Setup

Code Style

Roadmap

License

Acknowledgments

Caveats

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages