Talkie is a speech recognition application that transcribes audio input and simulates keyboard events to inject text into the active window. It runs continuously in the background with a Tk-based control interface.
The application monitors microphone input, performs voice activity detection, transcribes speech using configurable recognition engines, and types the results via the Linux uinput subsystem.
- Real-time audio transcription
- Multiple speech recognition engines (Vosk, Sherpa-ONNX, Faster-Whisper)
- Voice activity detection with configurable thresholds
- Keyboard event simulation via uinput
- Text preprocessing (punctuation commands, number conversion)
- External control via file-based IPC
- Persistent JSON configuration
- Single-instance enforcement
- Feedback logging for STT correction learning (with Claude Code hook support)
src/
├── talkie.tcl # Main application entry point
├── config.tcl # Configuration management
├── engine.tcl # Speech engine with integrated audio (worker thread)
├── audio.tcl # Result parsing and transcription state
├── worker.tcl # Reusable worker thread abstraction
├── output.tcl # Keyboard output (worker thread)
├── threshold.tcl # Confidence threshold management
├── textproc.tcl # Text preprocessing and voice commands
├── coprocess.tcl # External engine communication
├── ui-layout.tcl # Tk interface
├── feedback.tcl # Unified feedback logging for correction learning
├── display.tcl # Text display and visualization
├── vosk.tcl # Vosk engine bindings
├── gec/ # Grammar/Error Correction pipeline
│ ├── gec.tcl # GEC coordinator
│ ├── pipeline.tcl # ONNX inference pipeline
│ ├── punctcap.tcl # Punctuation and capitalization
│ ├── homophone.tcl # Homophone correction
│ └── tokens.tcl # BERT token constants
├── pa/ # PortAudio critcl bindings
├── audio/ # Audio processing critcl bindings
├── vosk/ # Vosk critcl bindings
├── uinput/ # uinput critcl bindings
└── engines/ # External engine wrappers
Audio processing runs on a dedicated worker thread, which keeps the main thread out of the critical audio path:
┌─────────────────────────────────────────────────────────────────┐
│ Main Thread │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────────────────┐ │
│ │ GUI │ │ GEC │ │ Display │ │ Result Processing │ │
│ │ (5Hz) │ │Pipeline │ │ │ │ (parse_and_display) │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
▲ ▲
│ thread::send -async │
│ (UI updates) │ thread::send -async
│ │ (recognition results)
┌───────┴─────────────────────────┐ ┌─────────────┴───────────────┐
│ Engine Worker Thread │ │ Output Worker Thread │
│ ┌──────────────────────────┐ │ │ ┌───────────────────────┐ │
│ │ PortAudio Callbacks │ │ │ │ uinput Keyboard │ │
│ │ (25ms chunks, 40Hz) │ │ │ │ Simulation │ │
│ └────────────┬─────────────┘ │ │ └───────────────────────┘ │
│ ▼ │ │ │
│ ┌──────────────────────────┐ │ └─────────────────────────────┘
│ │ Threshold Detection │ │
│ │ (adaptive noise floor) │ │
│ └────────────┬─────────────┘ │
│ ▼ │
│ ┌──────────────────────────┐ │
│ │ Vosk Recognition │ │
│ │ (or coprocess engine) │ │
│ └──────────────────────────┘ │
└─────────────────────────────────┘
Data Flow:
- PortAudio delivers 25ms audio chunks directly to engine worker thread
- Worker performs threshold detection and Vosk processing
- Recognition results sent to main thread for GEC processing
- Processed text sent to output worker for keyboard simulation
- UI updates throttled to 5Hz to reduce overhead
talkie.tcl: Application initialization, single-instance enforcement, module loading
config.tcl: JSON configuration file management (~/.talkie.conf), file watching for external state changes (~/.talkie)
engine.tcl: Integrates PortAudio stream on worker thread. Audio callbacks fire directly on worker, bypassing main thread. Handles threshold detection and speech recognition.
audio.tcl: Parses recognition results (JSON from Vosk), coordinates GEC processing, manages transcription state, device enumeration.
worker.tcl: Reusable worker thread abstraction using Tcl Thread package. Provides create, send, send_async, exists, destroy operations.
output.tcl: Keyboard simulation via uinput on dedicated worker thread. Async text output to avoid blocking main thread.
gec/: Grammar Error Correction pipeline using ONNX Runtime for BERT model inference. Adds punctuation, capitalization, and corrects homophones.
feedback.tcl: Unified feedback logging to ~/.config/talkie/feedback.jsonl. Captures GEC corrections, text injections, and user submissions for correction learning.
textproc.tcl: Punctuation command processing, text normalization
ui-layout.tcl: Tk GUI with transcription controls, real-time displays (5Hz updates), parameter adjustment
- Linux kernel with uinput support
- Tcl/Tk 8.6 or later
- PortAudio or PulseAudio
- User must be a member of the input group for uinput access
- Tk - GUI framework
- Thread - Worker thread management
- json - JSON parsing/generation
- jbr::unix - Unix utilities
- jbr::filewatch - File monitoring
- pa - PortAudio bindings (critcl)
- audio - Audio energy calculation (critcl)
- uinput - Keyboard simulation (critcl)
- vosk - Vosk speech engine (critcl)
Download and place in models/ directory:
- Vosk: vosk-model-en-us-0.22-lgraph
- Sherpa-ONNX: compatible ONNX models
- Faster-Whisper: CTranslate2 models
cd src
make build
This compiles the PortAudio, audio processing, uinput, and Vosk critcl packages.
# Load uinput kernel module
sudo modprobe uinput
# Add permanent loading (optional)
echo "uinput" | sudo tee /etc/modules-load.d/uinput.conf
# Add user to input group
sudo usermod -a -G input $USER
# Logout and login for group membership to take effect
Download the appropriate model files for your chosen engine and place them in the models/ directory.
For Vosk:
mkdir -p models/vosk
cd models/vosk
wget https://alphacephei.com/vosk/models/vosk-model-en-us-0.22-lgraph.zip
unzip vosk-model-en-us-0.22-lgraph.zip
cd src
./talkie.sh
The GUI window will appear. Only one instance can run at a time; additional launches will raise the existing window.
./talkie.sh start # Enable transcription
./talkie.sh stop # Disable transcription
./talkie.sh toggle # Toggle transcription state
./talkie.sh state # Display current state
Transcription state can be controlled by modifying ~/.talkie:
echo '{"transcribing": true}' > ~/.talkie # Start transcription
echo '{"transcribing": false}' > ~/.talkie # Stop transcription
The application monitors this file and updates state within 500ms.
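The file-based control above can be wrapped in a few shell helpers. This is a minimal sketch, not part of Talkie: the function names are hypothetical, and it assumes ~/.talkie holds only the `transcribing` key, as in the examples above. The file path is a parameter so the helpers can be tried against a scratch file first.

```shell
# Write the transcription state (true or false) to the state file.
talkie_set_transcribing() {
    local value=$1 file=${2:-$HOME/.talkie}
    case "$value" in
        true|false) ;;
        *) echo "usage: talkie_set_transcribing true|false [file]" >&2; return 2 ;;
    esac
    printf '{"transcribing": %s}\n' "$value" > "$file"
}

# Print "true" or "false" according to the file's current content.
# A missing or unreadable file is reported as "false".
talkie_get_transcribing() {
    local file=${1:-$HOME/.talkie}
    if grep -Eq '"transcribing"[[:space:]]*:[[:space:]]*true' "$file" 2>/dev/null; then
        echo true
    else
        echo false
    fi
}

# Flip the current state.
talkie_toggle() {
    local file=${1:-$HOME/.talkie}
    if [ "$(talkie_get_transcribing "$file")" = true ]; then
        talkie_set_transcribing false "$file"
    else
        talkie_set_transcribing true "$file"
    fi
}
```

Calling `talkie_toggle` with no argument acts on ~/.talkie, which is roughly what binding a desktop hotkey to the toggle would need.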
During transcription, speak these commands to insert punctuation:
- "period" → .
- "comma" → ,
- "question mark" → ?
- "exclamation mark" → !
- "colon" → :
- "semicolon" → ;
- "new line" → \n
- "new paragraph" → \n\n
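To illustrate how a command table like this can be applied to a transcript, here is a small bash sketch. It is not Talkie's implementation (that lives in textproc.tcl); the function name `apply_punctuation` is hypothetical, and the sketch treats every occurrence of a command word as a command.

```shell
# Replace spoken punctuation commands in a space-separated transcript.
apply_punctuation() {
    local -a words
    read -r -a words <<< "$1"
    local out="" i=0 n=${#words[@]} w next
    while (( i < n )); do
        w=${words[i]}
        next=${words[i+1]:-}
        # Two-word commands are checked first.
        case "$w $next" in
            "question mark")    out+="?";      i=$(( i + 2 )); continue ;;
            "exclamation mark") out+="!";      i=$(( i + 2 )); continue ;;
            "new line")         out+=$'\n';    i=$(( i + 2 )); continue ;;
            "new paragraph")    out+=$'\n\n';  i=$(( i + 2 )); continue ;;
        esac
        case "$w" in
            period)    out+="." ;;
            comma)     out+="," ;;
            colon)     out+=":" ;;
            semicolon) out+=";" ;;
            *)  # Ordinary word: add a separating space unless at a line start.
                [[ -n $out && $out != *$'\n' ]] && out+=" "
                out+="$w" ;;
        esac
        i=$(( i + 1 ))
    done
    printf '%s\n' "$out"
}

# apply_punctuation "hello comma world period"   prints: hello, world.
```

Punctuation is attached directly to the preceding word, so no space is emitted before it.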
Spoken numbers are converted to digits: "twenty five" → "25"
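A rough sketch of this kind of conversion for values from 0 to 99 is shown below. It is illustrative only: the function names are hypothetical, the approach simply sums the value of each word, and it does not check word order or handle "hundred" and larger units, which Talkie's own conversion may well do differently.

```shell
# Map a single number word to its value; fails for unknown words.
word_value() {
    case "$1" in
        zero) echo 0 ;;     one) echo 1 ;;       two) echo 2 ;;
        three) echo 3 ;;    four) echo 4 ;;      five) echo 5 ;;
        six) echo 6 ;;      seven) echo 7 ;;     eight) echo 8 ;;
        nine) echo 9 ;;     ten) echo 10 ;;      eleven) echo 11 ;;
        twelve) echo 12 ;;  thirteen) echo 13 ;; fourteen) echo 14 ;;
        fifteen) echo 15 ;; sixteen) echo 16 ;;  seventeen) echo 17 ;;
        eighteen) echo 18 ;; nineteen) echo 19 ;;
        twenty) echo 20 ;;  thirty) echo 30 ;;   forty) echo 40 ;;
        fifty) echo 50 ;;   sixty) echo 60 ;;    seventy) echo 70 ;;
        eighty) echo 80 ;;  ninety) echo 90 ;;
        *) return 1 ;;
    esac
}

# Sum the values of all words; fails if any word is not a number word.
words_to_number() {
    local total=0 word value
    for word in $1; do
        value=$(word_value "$word") || return 1
        total=$(( total + value ))
    done
    echo "$total"
}

# words_to_number "twenty five"   prints: 25
```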
Configuration file: ~/.talkie.conf (JSON format)
{
"sample_rate": 44100,
"frames_per_buffer": 4410,
"energy_threshold": 5.0,
"confidence_threshold": 200.0,
"device": "pulse",
"speech_engine": "vosk",
"silence_trailing_duration": 0.5,
"lookback_seconds": 1.0,
"vosk_max_alternatives": 0,
"vosk_beam": 20,
"vosk_lattice_beam": 8
}
sample_rate: Audio sample rate in Hz (typically 44100 or 16000)
frames_per_buffer: Audio buffer size in frames
energy_threshold: Voice activity detection threshold (0-100)
confidence_threshold: Minimum recognition confidence for output (0-400)
device: PortAudio device identifier ("pulse" or device name)
speech_engine: Recognition engine ("vosk", "sherpa", or "faster-whisper")
silence_trailing_duration: Silence duration before finalizing utterance (seconds)
lookback_seconds: Pre-speech audio buffer duration (seconds)
vosk_max_alternatives: Number of recognition alternatives (0-5)
vosk_beam: Beam search width for Vosk (5-50)
vosk_lattice_beam: Lattice beam width for Vosk (1-20)
All parameters can be adjusted via the GUI or by editing the configuration file directly.
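For scripted edits, a helper along these lines can change one key without touching the rest of the file. It is a sketch under a few assumptions: jq is installed, the function names are hypothetical, and the value is passed as a JSON literal (so strings need their own quotes). Whether a running Talkie instance picks up the change immediately is not stated here, so restarting after an edit is the safe assumption.

```shell
# Set one key in a JSON config file. The result is written to a temp file
# first, so the original is only replaced if jq succeeds.
talkie_conf_set() {
    local file=$1 key=$2 value=$3 tmp
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if jq --arg k "$key" --argjson v "$value" '.[$k] = $v' "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Print the current value of one key.
talkie_conf_get() {
    jq -r --arg k "$2" '.[$k]' "$1"
}

# Example (commented out so that loading this file has no side effects):
# talkie_conf_set ~/.talkie.conf vosk_beam 15
# talkie_conf_set ~/.talkie.conf speech_engine '"vosk"'
```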
Talkie includes a unified feedback logging system for capturing the STT pipeline and learning from user corrections.
All events are logged to ~/.config/talkie/feedback.jsonl in JSON Lines format.
The feedback log captures three event types:
| Type | Description | Fields |
|---|---|---|
| gec | GEC correction applied | input, output |
| inject | Text sent to uinput | text |
| submit | User's final submission | text, session_id |
{"ts":1705500000000,"type":"gec","input":"their going","output":"they're going"}
{"ts":1705500000050,"type":"inject","text":"they're going"}
{"ts":1705500005000,"type":"submit","text":"they're going to the store","session_id":"abc123"}
To capture user corrections in Claude Code, install the UserPromptSubmit hook:
# Install hook script
mkdir -p ~/.config/talkie/hooks
cp feedback/log-submission.sh ~/.config/talkie/hooks/
chmod +x ~/.config/talkie/hooks/log-submission.sh
Add to ~/.claude/settings.json:
{
"hooks": {
"UserPromptSubmit": [
{
"hooks": [
{
"type": "command",
"command": "~/.config/talkie/hooks/log-submission.sh"
}
]
}
]
}
}
Find user edits (where injected text differs from submitted):
jq -s '
[.[] | select(.type == "inject")] as $injects |
[.[] | select(.type == "submit")] as $submits |
[$injects[] as $i |
($submits[] | select(.ts > $i.ts and .ts < ($i.ts + 30000))) as $s |
select($i.text != $s.text) |
{injected: $i.text, submitted: $s.text, delay_ms: ($s.ts - $i.ts)}
]
' ~/.config/talkie/feedback.jsonl
Feedback logging can be disabled with:
::feedback::configure -enabled 0
- Sample Rate: 16kHz (device native rate)
- Chunk Size: 25ms (~400 frames at 16kHz)
- Callback Rate: 40Hz on engine worker thread
- Latency: ~50-100ms speech detection response
- Lookback: Configurable pre-speech audio buffering (default 1.0s)
- No Main Thread Blocking: Audio processing on dedicated worker
- Reduced Latency: Direct path from audio to recognition
- UI Responsiveness: GUI never waits for audio processing
- Throttled Updates: UI refreshes at 5Hz, not 40Hz
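The chunk size and callback rate figures above follow from simple arithmetic, sketched here with two hypothetical helper functions using integer math.

```shell
# Frames in one chunk: sample rate (Hz) * chunk length (ms) / 1000.
frames_per_chunk() {
    echo $(( $1 * $2 / 1000 ))
}

# Callbacks per second for a given chunk length in ms.
callbacks_per_second() {
    echo $(( 1000 / $1 ))
}

# frames_per_chunk 16000 25    prints: 400
# callbacks_per_second 25      prints: 40
```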
# Build all critcl packages
make build
# Build specific package
cd src/pa && make
cd src/audio && make
cd src/uinput && make
cd src/vosk && make
- Add entry to engine_registry in src/engine.tcl
- For coprocess engines: create wrapper script in src/engines/
- For critcl engines: create package directory with critcl code and Tcl interface
Run the application with console output visible:
cd src
./talkie.tcl 2>&1 | tee talkie.log
ERROR: Cannot write to /dev/uinput
Verify user is in input group and has logged out/in:
groups | grep input
Void Linux: The /dev/uinput device needs group permissions set:
# Quick fix (temporary)
make fix-uinput
# Permanent fix: install runit service
make install-uinput-service
ERROR: /dev/uinput device not found
Load the uinput kernel module:
sudo modprobe uinput
List available audio devices and update configuration:
pactl list sources short # For PulseAudio systems
Verify model path in configuration matches actual model location in models/ directory.
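Several of the checks in this section can be combined into a small preflight script. The following is a sketch only: the function names are hypothetical and not part of Talkie, and the model directory shown in the usage comment is an assumption based on the Vosk download steps earlier.

```shell
# Each check prints one line and returns nonzero on failure.

# The uinput device must exist and be writable by the current user.
check_uinput() {
    local dev=${1:-/dev/uinput}
    if [ ! -e "$dev" ]; then
        echo "missing: $dev (try: sudo modprobe uinput)"
        return 1
    elif [ ! -w "$dev" ]; then
        echo "not writable: $dev (check input group membership)"
        return 1
    fi
    echo "ok: $dev"
}

# The current user must be a member of the given group (default: input).
check_group() {
    local group=${1:-input}
    if id -nG | tr ' ' '\n' | grep -qx "$group"; then
        echo "ok: member of $group"
    else
        echo "not a member of $group (log out and back in after usermod)"
        return 1
    fi
}

# The model directory must exist.
check_model_dir() {
    local dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing model directory: $dir"
        return 1
    fi
    echo "ok: $dir"
}

# Usage (commented out so that loading this file has no side effects):
# check_uinput
# check_group input
# check_model_dir models/vosk/vosk-model-en-us-0.22-lgraph
```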
MIT
