@basnijholt
Owner

Summary

  • Add interactive terminal UI to the chat command with live transcription mode
  • Support pause/resume (Escape key) to mute mic for side conversations
  • Add slash commands: /tts, /mode, /tools, /clear, /help
  • Enable tool toggling at runtime via /tools disable|enable <name>
  • Support two input modes: "live" (default, VAD-based) and "direct" (Ctrl+C to end)
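The slash commands and runtime tool toggling listed above could be dispatched roughly as follows. This is a minimal sketch and not the code in agent_cli/core/chat_state.py: `ChatState`, `handle_slash_command`, the field names, and the tool name used in the usage note are all hypothetical, and treating /tts as a toggle is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChatState:
    """Hypothetical session state; every field name here is an assumption."""

    tts_enabled: bool = False
    input_mode: str = "live"  # "live" (VAD-based) or "direct" (Ctrl+C to end)
    disabled_tools: set[str] = field(default_factory=set)
    history: list[str] = field(default_factory=list)


def handle_slash_command(state: ChatState, line: str) -> Optional[str]:
    """Apply a slash command; return a status message, or None for normal input."""
    if not line.startswith("/"):
        return None  # ordinary chat text, not a command
    parts = line[1:].split()
    if not parts:
        return "Unknown command. Try /help."
    command, args = parts[0], parts[1:]

    if command == "tts":
        # Assumption: /tts flips speech output on and off.
        state.tts_enabled = not state.tts_enabled
        return f"TTS {'on' if state.tts_enabled else 'off'}"
    if command == "mode" and args and args[0] in ("live", "direct"):
        state.input_mode = args[0]
        return f"Input mode: {args[0]}"
    if command == "tools" and len(args) == 2 and args[0] in ("enable", "disable"):
        action, name = args
        if action == "disable":
            state.disabled_tools.add(name)
        else:
            state.disabled_tools.discard(name)
        return f"Tool {name} {action}d"
    if command == "clear":
        state.history.clear()
        return "History cleared"
    if command == "help":
        return "Commands: /tts, /mode live|direct, /tools enable|disable <name>, /clear, /help"
    return "Unknown command. Try /help."
```

With this shape, `handle_slash_command(state, "/tools disable web_search")` adds the (made-up) tool name to `state.disabled_tools`, and any line that does not start with `/` returns `None` so the caller can forward it to the model.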

New Files

  • agent_cli/core/voice_input.py: Shared VAD recording loop extracted for reuse
  • agent_cli/core/chat_state.py: Session state management & slash command handling

Dependencies

  • Added prompt_toolkit>=3.0.0 for async editable input with key bindings

CLI Options

  • --vad-threshold (0.0-1.0): VAD speech detection threshold
  • --silence-threshold: Seconds of silence to end a speech segment
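To illustrate how these two options might interact, here is a minimal sketch of a segmenter that works on per-frame speech probabilities. It is an assumption about the general technique, not the loop in agent_cli/core/voice_input.py: `segment_speech`, its parameters, and the default values are hypothetical.

```python
def segment_speech(
    probabilities: list[float],
    frame_seconds: float,
    vad_threshold: float = 0.5,  # assumed default, not taken from the PR
    silence_threshold: float = 1.0,  # assumed default, not taken from the PR
) -> list[tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, for each speech segment."""
    segments = []
    start = None  # index of the first speech frame in the open segment
    last_speech = 0  # index of the most recent speech frame
    silent_frames = 0  # consecutive non-speech frames inside an open segment

    for index, probability in enumerate(probabilities):
        if probability >= vad_threshold:
            if start is None:
                start = index
            last_speech = index
            silent_frames = 0
        elif start is not None:
            silent_frames += 1
            # Close the segment once the pause is long enough.
            if silent_frames * frame_seconds >= silence_threshold:
                segments.append((start, last_speech + 1))
                start = None
                silent_frames = 0

    if start is not None:
        # Audio ended while a segment was still open.
        segments.append((start, last_speech + 1))
    return segments
```

A higher `vad_threshold` makes the detector less willing to treat a frame as speech, while a larger `silence_threshold` tolerates longer pauses before a segment is closed.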

Test plan

  • All existing tests pass (476 passed, excluding VAD tests with SSL issues)
  • Pre-commit checks pass (ruff, mypy, formatting)
  • Manual testing: run agent-cli chat and verify live transcription works
  • Manual testing: verify Escape pauses/resumes recording
  • Manual testing: verify slash commands work (/help, /tts, /mode, /tools, /clear)
  • Manual testing: verify /mode direct switches to original Ctrl+C behavior

basnijholt and others added 7 commits January 4, 2026 07:26
Enhance the chat command with an interactive terminal UI that supports:
- Live transcription mode: text appears as you speak, editable before sending
- Pause/resume: Escape key to mute mic for side conversations
- Slash commands: /tts, /mode, /tools, /clear, /help
- Tool toggling: enable/disable specific tools at runtime
- Two input modes: "live" (default, VAD-based) and "direct" (Ctrl+C to end)

New files:
- agent_cli/core/voice_input.py: shared VAD recording loop
- agent_cli/core/chat_state.py: session state & slash command handling

Added prompt_toolkit dependency for async editable input with key bindings.
- Add explicit Ctrl+C key binding to properly exit live input mode
- Track accumulated text length to append new transcriptions instead of
  replacing entire buffer, preserving cursor position when editing
- Remove conflicting console.print status updates that caused flickering
  with prompt_toolkit's display management
- Return None from _get_live_input on Ctrl+C to signal exit; the main loop
  now breaks on None instead of continuing with "No input received"
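
The length-tracking idea in this commit can be sketched as below. It assumes the transcription callback reports the full accumulated transcript each time, which the PR does not state; `TranscriptAppender` and its method are hypothetical names.

```python
class TranscriptAppender:
    """Hand out only the part of the transcript that has not been seen yet."""

    def __init__(self) -> None:
        self.seen_length = 0  # characters of the transcript already delivered

    def new_text(self, accumulated: str) -> str:
        """Return the unseen tail of the accumulated transcript."""
        fresh = accumulated[self.seen_length:]
        self.seen_length = len(accumulated)
        return fresh


appender = TranscriptAppender()
buffer = appender.new_text("helo world")  # first transcription fills the buffer
buffer = buffer.replace("helo", "hello")  # the user fixes a typo by hand
buffer += appender.new_text("helo world again")  # only " again" is appended
```

Because only the new tail is added, the manual edit survives; replacing the whole buffer with the latest transcript would have undone it.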
- Add bottom_toolbar to PromptSession showing live status (🎤 Listening,
  🔴 Recording, ⏳ Processing, ⏸️ Paused, ✓ Ready)
- Insert transcribed text at cursor position instead of always appending
  at end, allowing users to position cursor before speaking
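
Inserting at the cursor instead of appending amounts to a string splice. The helper below is a hypothetical illustration on plain strings, not the PR's code; in prompt_toolkit itself, `Buffer.insert_text` performs an insertion at the current cursor position.

```python
def insert_at_cursor(text: str, cursor: int, new_text: str) -> tuple[str, int]:
    """Splice new_text into text at cursor; return the new text and cursor."""
    cursor = max(0, min(cursor, len(text)))  # clamp to a valid position
    updated = text[:cursor] + new_text + text[cursor:]
    return updated, cursor + len(new_text)
```

For example, `insert_at_cursor("Hello world", 5, ", brave")` returns `("Hello, brave world", 12)`, leaving the cursor just after the inserted words.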
- Replace emoji status icons with ASCII text to avoid rendering issues
- Use call_soon_threadsafe for all UI updates from background voice task
- Use mutable list holder for status to avoid race conditions
- Schedule buffer text updates on event loop for thread safety
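
As a sketch of the pattern this commit describes, the snippet below hands status updates from a worker thread to the event loop with `loop.call_soon_threadsafe` and keeps the status in a one-element list. The names (`run_demo`, `set_status`, `voice_worker`) are hypothetical, and it assumes the producer runs on a separate thread; a later commit in this PR removes the wrappers because the callbacks run on the same event loop.

```python
import asyncio
import threading


def run_demo() -> list[str]:
    """Collect status updates sent from a worker thread to the event loop."""

    async def main() -> list[str]:
        loop = asyncio.get_running_loop()
        status = ["Listening"]  # one-element list so callbacks can rebind the value
        seen = []
        done = asyncio.Event()

        def set_status(value: str) -> None:
            # Runs on the event loop thread, so no lock is needed here.
            status[0] = value
            seen.append(value)
            if value == "Ready":
                done.set()

        def voice_worker() -> None:
            # Runs on a separate thread; never touches UI state directly.
            for value in ("Recording", "Processing", "Ready"):
                loop.call_soon_threadsafe(set_status, value)

        thread = threading.Thread(target=voice_worker)
        thread.start()
        await done.wait()
        thread.join()
        return seen

    return asyncio.run(main())
```

Callbacks scheduled this way run on the loop in the order they were submitted, so the UI state is only ever mutated from one thread.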
…wrappers

- Remove STATUS_ICONS and status toolbar (was causing display issues)
- Remove unused _create_input_panel function
- Remove call_soon_threadsafe wrappers (callbacks run on same event loop)
- Simplify on_text_update to just set buffer text directly
- Remove unused imports (Panel, Text, VoiceInputStatus)