Talkie - Chat with your Linux desktop

Real-time speech-to-text transcription with keyboard simulation for Linux.

Description

Talkie is a speech recognition application that transcribes audio input and simulates keyboard events to inject text into the active window. It runs continuously in the background with a Tk-based control interface.

The application monitors microphone input, performs voice activity detection, transcribes speech using configurable recognition engines, and types the results via the Linux uinput subsystem.

Features

Real-time audio transcription
Multiple speech recognition engines (Vosk, Sherpa-ONNX, Faster-Whisper)
Voice activity detection with configurable thresholds
Keyboard event simulation via uinput
Text preprocessing (punctuation commands, number conversion)
External control via file-based IPC
Persistent JSON configuration
Single-instance enforcement
Feedback logging for STT correction learning (with Claude Code hook support)

Architecture

src/
├── talkie.tcl          # Main application entry point
├── config.tcl          # Configuration management
├── engine.tcl          # Speech engine with integrated audio (worker thread)
├── audio.tcl           # Result parsing and transcription state
├── worker.tcl          # Reusable worker thread abstraction
├── output.tcl          # Keyboard output (worker thread)
├── threshold.tcl       # Confidence threshold management
├── textproc.tcl        # Text preprocessing and voice commands
├── coprocess.tcl       # External engine communication
├── ui-layout.tcl       # Tk interface
├── feedback.tcl        # Unified feedback logging for correction learning
├── display.tcl         # Text display and visualization
├── vosk.tcl            # Vosk engine bindings
├── gec/                # Grammar/Error Correction pipeline
│   ├── gec.tcl         # GEC coordinator
│   ├── pipeline.tcl    # ONNX inference pipeline
│   ├── punctcap.tcl    # Punctuation and capitalization
│   ├── homophone.tcl   # Homophone correction
│   └── tokens.tcl      # BERT token constants
├── pa/                 # PortAudio critcl bindings
├── audio/              # Audio processing critcl bindings
├── vosk/               # Vosk critcl bindings
├── uinput/             # uinput critcl bindings
└── engines/            # External engine wrappers

Threading Architecture

Audio processing runs on a dedicated worker thread, which keeps the main thread out of the critical audio path:

┌─────────────────────────────────────────────────────────────────┐
│                        Main Thread                               │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────────────────┐ │
│  │   GUI   │  │  GEC    │  │ Display │  │ Result Processing   │ │
│  │ (5Hz)   │  │Pipeline │  │         │  │ (parse_and_display) │ │
│  └─────────┘  └─────────┘  └─────────┘  └─────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
        ▲                                          ▲
        │ thread::send -async                      │
        │ (UI updates)                             │ thread::send -async
        │                                          │ (recognition results)
┌───────┴─────────────────────────┐  ┌─────────────┴───────────────┐
│      Engine Worker Thread       │  │     Output Worker Thread    │
│  ┌──────────────────────────┐  │  │  ┌───────────────────────┐  │
│  │   PortAudio Callbacks    │  │  │  │   uinput Keyboard     │  │
│  │   (25ms chunks, 40Hz)    │  │  │  │   Simulation          │  │
│  └────────────┬─────────────┘  │  │  └───────────────────────┘  │
│               ▼                 │  │                             │
│  ┌──────────────────────────┐  │  └─────────────────────────────┘
│  │  Threshold Detection     │  │
│  │  (adaptive noise floor)  │  │
│  └────────────┬─────────────┘  │
│               ▼                 │
│  ┌──────────────────────────┐  │
│  │  Vosk Recognition        │  │
│  │  (or coprocess engine)   │  │
│  └──────────────────────────┘  │
└─────────────────────────────────┘

Data Flow:

1. PortAudio delivers 25ms audio chunks directly to the engine worker thread
2. The worker performs threshold detection and Vosk processing
3. Recognition results are sent to the main thread for GEC processing
4. Processed text is sent to the output worker for keyboard simulation
5. UI updates are throttled to 5Hz to reduce overhead
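
The flow above can be sketched with plain threads and queues. This is a minimal Python illustration of the hand-offs, not Talkie's Tcl implementation: `recognize` and `correct` are hypothetical stand-ins for the Vosk and GEC stages, and the `typed` list stands in for uinput keystrokes.

```python
import queue
import threading

def recognize(chunk):
    # Hypothetical stand-in for threshold detection plus Vosk recognition.
    return chunk.lower()

def correct(text):
    # Hypothetical stand-in for the GEC pipeline that runs on the main thread.
    return text.capitalize() + "."

def run_pipeline(chunks):
    results = queue.Queue()   # engine worker -> main thread
    outbox = queue.Queue()    # main thread -> output worker
    typed = []                # stands in for uinput keystrokes

    def engine_worker():
        for chunk in chunks:
            results.put(recognize(chunk))
        results.put(None)     # sentinel: no more audio

    def output_worker():
        while (text := outbox.get()) is not None:
            typed.append(text)

    threads = [threading.Thread(target=engine_worker),
               threading.Thread(target=output_worker)]
    for t in threads:
        t.start()

    # Main thread: receive recognition results, correct them, forward to output.
    while (raw := results.get()) is not None:
        outbox.put(correct(raw))
    outbox.put(None)

    for t in threads:
        t.join()
    return typed
```

Neither worker ever waits on the GUI; each stage only blocks on its own queue, which is the property the threading design is after.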

Component Overview

talkie.tcl: Application initialization, single-instance enforcement, module loading

config.tcl: JSON configuration file management (~/.talkie.conf), file watching for external state changes (~/.talkie)

engine.tcl: Integrates PortAudio stream on worker thread. Audio callbacks fire directly on worker, bypassing main thread. Handles threshold detection and speech recognition.

audio.tcl: Parses recognition results (JSON from Vosk), coordinates GEC processing, manages transcription state, device enumeration.

worker.tcl: Reusable worker thread abstraction using Tcl Thread package. Provides create, send, send_async, exists, destroy operations.

output.tcl: Keyboard simulation via uinput on dedicated worker thread. Async text output to avoid blocking main thread.

gec/: Grammar Error Correction pipeline using ONNX Runtime for BERT model inference. Adds punctuation, capitalization, and corrects homophones.

feedback.tcl: Unified feedback logging to ~/.config/talkie/feedback.jsonl. Captures GEC corrections, text injections, and user submissions for correction learning.

textproc.tcl: Punctuation command processing, text normalization

ui-layout.tcl: Tk GUI with transcription controls, real-time displays (5Hz updates), parameter adjustment
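
To illustrate the kind of abstraction worker.tcl describes (create, send, send_async, exists, destroy), here is a rough Python analogue. It is only a sketch of the pattern; the class and method bodies are assumptions and do not mirror the actual Tcl code.

```python
import queue
import threading
from concurrent.futures import Future

class Worker:
    """Runs submitted callables one at a time on a dedicated thread."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        while True:
            job = self._jobs.get()
            if job is None:          # sentinel posted by destroy()
                return
            func, args, future = job
            try:
                future.set_result(func(*args))
            except Exception as exc:
                future.set_exception(exc)

    def send_async(self, func, *args):
        """Queue a call and return immediately with a Future."""
        future = Future()
        self._jobs.put((func, args, future))
        return future

    def send(self, func, *args):
        """Queue a call and block until the worker returns its result."""
        return self.send_async(func, *args).result()

    def exists(self):
        return self._thread.is_alive()

    def destroy(self):
        self._jobs.put(None)
        self._thread.join()
```

A caller that must not block (for example, the GUI handing text to the output worker) would use `send_async`; `send` suits setup calls where the result is needed right away.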

Dependencies

System Requirements

Linux kernel with uinput support
Tcl/Tk 8.6 or later
PortAudio or PulseAudio
User must be a member of the input group for uinput access

Tcl Packages

Tk - GUI framework
Thread - Worker thread management
json - JSON parsing/generation
jbr::unix - Unix utilities
jbr::filewatch - File monitoring
pa - PortAudio bindings (critcl)
audio - Audio energy calculation (critcl)
uinput - Keyboard simulation (critcl)
vosk - Vosk speech engine (critcl)

Speech Engine Models

Download the model for your chosen engine and place it in the models/ directory:

Vosk: vosk-model-en-us-0.22-lgraph
Sherpa-ONNX: compatible ONNX models
Faster-Whisper: CTranslate2 models

Installation

1. Build critcl Bindings

cd src
make build

This compiles the PortAudio, audio processing, uinput, and Vosk critcl packages.

2. Configure uinput Access

# Load uinput kernel module
sudo modprobe uinput

# Add permanent loading (optional)
echo "uinput" | sudo tee /etc/modules-load.d/uinput.conf

# Add user to input group
sudo usermod -a -G input $USER

# Log out and back in for the group membership to take effect

3. Download Speech Models

Download the appropriate model files for your chosen engine and place them in the models/ directory.

For Vosk:

mkdir -p models/vosk
cd models/vosk
wget https://alphacephei.com/vosk/models/vosk-model-en-us-0.22-lgraph.zip
unzip vosk-model-en-us-0.22-lgraph.zip

Usage

Starting the Application

cd src
./talkie.sh

The GUI window will appear. Only one instance can run at a time; additional launches will raise the existing window.

Command-Line Interface

./talkie.sh start       # Enable transcription
./talkie.sh stop        # Disable transcription
./talkie.sh toggle      # Toggle transcription state
./talkie.sh state       # Display current state

External Control

Transcription state can be controlled by modifying ~/.talkie:

echo '{"transcribing": true}' > ~/.talkie   # Start transcription
echo '{"transcribing": false}' > ~/.talkie  # Stop transcription

The application monitors this file and updates state within 500ms.
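
The same file can be driven from a script. The sketch below mirrors the `echo` commands above in Python; the helper names are hypothetical, and treating a missing or malformed file as "not transcribing" is an assumption rather than documented Talkie behavior.

```python
import json
from pathlib import Path

STATE_FILE = Path.home() / ".talkie"

def set_transcribing(enabled, path=STATE_FILE):
    """Overwrite the state file, like `echo '{"transcribing": true}' > ~/.talkie`."""
    Path(path).write_text(json.dumps({"transcribing": bool(enabled)}) + "\n")

def is_transcribing(path=STATE_FILE):
    """Read the state back, treating a missing or invalid file as off."""
    try:
        return bool(json.loads(Path(path).read_text()).get("transcribing", False))
    except (OSError, ValueError, AttributeError):
        return False
```

For example, a push-to-talk script could call `set_transcribing(True)` on key press and `set_transcribing(False)` on release.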

Voice Commands

During transcription, speak these commands to insert punctuation:

"period" → .
"comma" → ,
"question mark" → ?
"exclamation mark" → !
"colon" → :
"semicolon" → ;
"new line" → \n
"new paragraph" → \n\n

Spoken numbers are converted to digits: "twenty five" → "25"
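
As a rough illustration of this kind of preprocessing, the Python sketch below applies the command table above and converts spoken numbers from 0 to 99. It is not the logic in textproc.tcl: the function names are hypothetical, and it naively replaces every occurrence of a command word, including ordinary uses of words such as "period" or "one".

```python
import re

PUNCTUATION = {
    "period": ".", "comma": ",", "question mark": "?",
    "exclamation mark": "!", "colon": ":", "semicolon": ";",
    "new line": "\n", "new paragraph": "\n\n",
}
UNITS = {word: n for n, word in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {word: 10 * n for n, word in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}

def convert_numbers(text):
    """Replace number words such as 'twenty five' with digits (0-99 only)."""
    words, out, i = text.split(), [], 0
    while i < len(words):
        word = words[i]
        if word in TENS:
            value = TENS[word]
            # Absorb a following unit word: "twenty" + "five" -> 25.
            if i + 1 < len(words) and 0 < UNITS.get(words[i + 1], 0) < 10:
                value += UNITS[words[i + 1]]
                i += 1
            out.append(str(value))
        elif word in UNITS:
            out.append(str(UNITS[word]))
        else:
            out.append(word)
        i += 1
    return " ".join(out)

def apply_punctuation(text):
    """Replace spoken commands with symbols attached to the preceding word."""
    for phrase in sorted(PUNCTUATION, key=len, reverse=True):
        symbol = PUNCTUATION[phrase]
        # \s* swallows the space before the command word.
        pattern = r"\s*\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, lambda m, s=symbol: s, text)
    return re.sub(r"\n +", "\n", text)   # no leading spaces after a line break

def process(text):
    return apply_punctuation(convert_numbers(text))
```

With this sketch, `process("hello comma world period")` yields `hello, world.`.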

Configuration

Configuration file: ~/.talkie.conf (JSON format)

Default Settings

{
    "sample_rate": 44100,
    "frames_per_buffer": 4410,
    "energy_threshold": 5.0,
    "confidence_threshold": 200.0,
    "device": "pulse",
    "speech_engine": "vosk",
    "silence_trailing_duration": 0.5,
    "lookback_seconds": 1.0,
    "vosk_max_alternatives": 0,
    "vosk_beam": 20,
    "vosk_lattice_beam": 8
}
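
A script that needs these settings can overlay the file on the defaults. The following Python sketch assumes that keys missing from ~/.talkie.conf fall back to the values above; whether Talkie itself merges settings this way is not stated here, and both helper names are hypothetical.

```python
import json
from pathlib import Path

DEFAULTS = {
    "sample_rate": 44100,
    "frames_per_buffer": 4410,
    "energy_threshold": 5.0,
    "confidence_threshold": 200.0,
    "device": "pulse",
    "speech_engine": "vosk",
    "silence_trailing_duration": 0.5,
    "lookback_seconds": 1.0,
    "vosk_max_alternatives": 0,
    "vosk_beam": 20,
    "vosk_lattice_beam": 8,
}

def load_config(path=Path.home() / ".talkie.conf"):
    """Return DEFAULTS overlaid with any keys found in the JSON config file."""
    config = dict(DEFAULTS)
    try:
        config.update(json.loads(Path(path).read_text()))
    except FileNotFoundError:
        pass   # no file yet: use the defaults unchanged
    return config

def buffer_duration_ms(config):
    """Length of one audio buffer in milliseconds."""
    return 1000.0 * config["frames_per_buffer"] / config["sample_rate"]
```

With the defaults shown, `buffer_duration_ms(DEFAULTS)` is 100.0, since 4410 frames at 44100 Hz span a tenth of a second.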

Parameters

sample_rate: Audio sample rate in Hz (typically 44100 or 16000)

frames_per_buffer: Audio buffer size in frames

energy_threshold: Voice activity detection threshold (0-100)

confidence_threshold: Minimum recognition confidence for output (0-400)

device: PortAudio device identifier ("pulse" or device name)

speech_engine: Recognition engine ("vosk", "sherpa", or "faster-whisper")

silence_trailing_duration: Silence duration before finalizing utterance (seconds)

lookback_seconds: Pre-speech audio buffer duration (seconds)

vosk_max_alternatives: Number of recognition alternatives (0-5)

vosk_beam: Beam search width for Vosk (5-50)

vosk_lattice_beam: Lattice beam width for Vosk (1-20)

All parameters can be adjusted via the GUI or by editing the configuration file directly.
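
To show how energy_threshold, silence_trailing_duration, and lookback_seconds interact, here is a simplified Python sketch of utterance segmentation. It uses a fixed threshold rather than an adaptive noise floor, works on one precomputed energy value per chunk, and the `chunk_seconds` parameter and function name are assumptions made for illustration.

```python
from collections import deque

def segment_utterances(energies, chunk_seconds=0.025, energy_threshold=5.0,
                       silence_trailing_duration=0.5, lookback_seconds=1.0):
    """Group chunk indices into utterances using a fixed energy threshold."""
    lookback = deque(maxlen=max(1, round(lookback_seconds / chunk_seconds)))
    max_silent = round(silence_trailing_duration / chunk_seconds)
    utterances, current, silent = [], None, 0

    for index, energy in enumerate(energies):
        loud = energy >= energy_threshold
        if current is None:
            if loud:
                # Speech starts: prepend the buffered pre-speech chunks.
                current = list(lookback) + [index]
                lookback.clear()
                silent = 0
            else:
                lookback.append(index)
        else:
            current.append(index)
            silent = 0 if loud else silent + 1
            if silent >= max_silent:
                utterances.append(current)   # enough trailing silence: finalize
                current, silent = None, 0
    if current is not None:
        utterances.append(current)           # audio ended mid-utterance
    return utterances
```

The lookback buffer is what keeps the soft onset of a word from being clipped: chunks just below the threshold are held back and included once speech is detected.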

Feedback Logging

Talkie includes a unified feedback logging system that records events from the STT pipeline so that corrections can be learned from what the user finally submits.

Log Location

All events are logged to ~/.config/talkie/feedback.jsonl in JSON Lines format.

Event Types

The feedback log captures three event types:

Type	Description	Fields
`gec`	GEC correction applied	`input`, `output`
`inject`	Text sent to uinput	`text`
`submit`	User's final submission	`text`, `session_id`

Example Log Entries

{"ts":1705500000000,"type":"gec","input":"their going","output":"they're going"}
{"ts":1705500000050,"type":"inject","text":"they're going"}
{"ts":1705500005000,"type":"submit","text":"they're going to the store","session_id":"abc123"}
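
Other tools can append or read records in the same shape. The Python sketch below assumes that `ts` is milliseconds since the Unix epoch, which is what the sample values suggest; the helper names are hypothetical and not part of feedback.tcl.

```python
import json
import time

def make_event(event_type, ts_ms=None, **fields):
    """Build one feedback record; ts is assumed to be epoch milliseconds."""
    ts = int(time.time() * 1000) if ts_ms is None else ts_ms
    return {"ts": ts, "type": event_type, **fields}

def append_event(path, event):
    """Append the record as one compact JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event, separators=(",", ":")) + "\n")

def read_events(path):
    """Load every non-empty line of a JSON Lines file as a dict."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Because each record is a single line, the log can be appended to safely in small writes and inspected with line-oriented tools such as `tail` or `jq`.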

Claude Code Integration

To capture user corrections in Claude Code, install the UserPromptSubmit hook:

# Install hook script
mkdir -p ~/.config/talkie/hooks
cp feedback/log-submission.sh ~/.config/talkie/hooks/
chmod +x ~/.config/talkie/hooks/log-submission.sh

Add to ~/.claude/settings.json:

{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "~/.config/talkie/hooks/log-submission.sh"
          }
        ]
      }
    ]
  }
}

Analyzing Corrections

Find user edits (where injected text differs from submitted):

jq -s '
  [.[] | select(.type == "inject")] as $injects |
  [.[] | select(.type == "submit")] as $submits |
  [$injects[] as $i |
    ($submits[] | select(.ts > $i.ts and .ts < ($i.ts + 30000))) as $s |
    select($i.text != $s.text) |
    {injected: $i.text, submitted: $s.text, delay_ms: ($s.ts - $i.ts)}
  ]
' ~/.config/talkie/feedback.jsonl
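
For scripted analysis, the same pairing can be written in Python. This sketch follows the jq query above (a submit within 30 seconds after an inject, with differing text); the function name and the `window_ms` parameter are illustrative assumptions.

```python
def find_edits(events, window_ms=30000):
    """Pair each inject with later submits inside the window; keep mismatches."""
    injects = [e for e in events if e["type"] == "inject"]
    submits = [e for e in events if e["type"] == "submit"]
    return [
        {"injected": i["text"], "submitted": s["text"],
         "delay_ms": s["ts"] - i["ts"]}
        for i in injects
        for s in submits
        if i["ts"] < s["ts"] < i["ts"] + window_ms and i["text"] != s["text"]
    ]
```

`events` is a list of dicts, one per line of feedback.jsonl, for example built with `[json.loads(line) for line in open(path)]`.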

Disabling Feedback Logging

::feedback::configure -enabled 0

Performance

Audio Processing

Sample Rate: 16kHz (device native rate)
Chunk Size: 25ms (~400 frames at 16kHz)
Callback Rate: 40Hz on engine worker thread
Latency: ~50-100ms speech detection response
Lookback: Configurable pre-speech audio buffering (default 1.0s)
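
The figures above follow from simple arithmetic, shown here as a small Python sketch. The function names are hypothetical; the inputs are the values quoted in this section.

```python
def chunk_frames(sample_rate_hz, chunk_ms):
    """Frames in one chunk of the given duration."""
    return round(sample_rate_hz * chunk_ms / 1000)

def callback_rate_hz(chunk_ms):
    """Callbacks per second when each callback delivers one chunk."""
    return 1000 / chunk_ms

def lookback_chunks(lookback_seconds, chunk_ms):
    """Chunks that must be buffered to cover the lookback window."""
    return round(lookback_seconds * 1000 / chunk_ms)
```

At 16 kHz with 25 ms chunks this gives 400 frames per chunk, 40 callbacks per second, and 40 buffered chunks for a 1.0 s lookback.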

Threading Benefits

No Main Thread Blocking: Audio processing on dedicated worker
Reduced Latency: Direct path from audio to recognition
UI Responsiveness: GUI never waits for audio processing
Throttled Updates: UI refreshes at 5Hz, not 40Hz
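
One common way to get this kind of throttling is a small rate limiter checked on every audio callback. The Python sketch below is an assumption about the general technique, not Talkie's code; the injectable `clock` exists only so the behavior can be tested without sleeping.

```python
import time

class Throttle:
    """Lets a call through at most rate_hz times per second, dropping the rest."""

    def __init__(self, rate_hz, clock=time.monotonic):
        self._interval = 1.0 / rate_hz
        self._clock = clock
        self._next = float("-inf")   # the first call always passes

    def ready(self):
        now = self._clock()
        if now < self._next:
            return False             # too soon: skip this UI update
        self._next = now + self._interval
        return True
```

A 40Hz callback would then wrap its UI update in `if throttle.ready(): ...`, so the GUI sees about five updates per second while audio handling continues at full rate.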

Development

Building Components

# Build all critcl packages
make build

# Build a specific package (run from the repository root)
make -C src/pa
make -C src/audio
make -C src/uinput
make -C src/vosk

Adding a Speech Engine

Add an entry to engine_registry in src/engine.tcl
For coprocess engines: create a wrapper script in src/engines/
For critcl engines: create a package directory with the critcl code and a Tcl interface

Testing

Run the application with console output visible:

cd src
./talkie.tcl 2>&1 | tee talkie.log

Troubleshooting

uinput Permission Denied

ERROR: Cannot write to /dev/uinput

Verify that the user is in the input group and has logged out and back in since being added:

groups | grep input

Void Linux: The /dev/uinput device needs group permissions set:

# Quick fix (temporary)
make fix-uinput

# Permanent fix: install runit service
make install-uinput-service

uinput Device Not Found

ERROR: /dev/uinput device not found

Load the uinput kernel module:

sudo modprobe uinput
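
The two uinput checks above can be combined into one diagnostic. This Python sketch is an assumption-laden helper, not part of Talkie: the function name and messages are hypothetical, and it only inspects the device path, write access, and the current process's group list.

```python
import grp
import os

def diagnose_uinput(device="/dev/uinput", group="input"):
    """Return a short hint describing why the device might be unusable."""
    if not os.path.exists(device):
        return "missing: load the module with 'sudo modprobe uinput'"
    if os.access(device, os.W_OK):
        return "ok"
    try:
        gid = grp.getgrnam(group).gr_gid
    except KeyError:
        return f"no '{group}' group on this system"
    # os.getgroups() reflects the current login session, so a freshly added
    # membership does not show up until the user logs out and back in.
    if gid not in os.getgroups() and gid != os.getgid():
        return f"not in '{group}' group for this session: add the user, then log out and back in"
    return f"in '{group}' group but {device} is not writable: check device permissions"

if __name__ == "__main__":
    print(diagnose_uinput())
```

The last case corresponds to the Void Linux note above, where the device exists but its group permissions are not set.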

Audio Device Errors

List available audio devices and update configuration:

pactl list sources short  # For PulseAudio systems

Speech Engine Model Not Found

Verify that the model path in the configuration matches the actual location of the model in the models/ directory.

License

MIT

Author

john@rkroll.com
