The terminal equivalent of agent-browser
Terminal automation CLI for AI agents
Control vim, htop, lazygit, dialog, and any TUI programmatically
Note: Built with AI, for AI. This project was built with the support of an AI agent, planned thoroughly with a tight feedback loop and reviewed at each step. While we've tested extensively, edge cases may exist. Use in production at your own discretion, and please report any issues you find!
pilotty enables AI agents to interact with terminal applications through a simple command-line interface. It manages pseudo-terminal (PTY) sessions with full VT100 terminal emulation, captures screen state, and provides keyboard/mouse input for navigating terminal user interfaces. Think of it as headless terminal automation for AI workflows.
- PTY (Pseudo-Terminal) Management: Spawn and manage terminal applications in background sessions
- Terminal Emulation: Full VT100 emulation for accurate screen capture and state tracking
- Keyboard Navigation: Interact with TUIs using Tab, Enter, arrow keys, and key combos
- AI-Friendly Output: Clean JSON responses with actionable suggestions on errors
- Multi-Session: Run multiple terminal apps simultaneously in isolated sessions
- Zero Config: Daemon auto-starts on first command, auto-stops after 5 minutes idle
agent-browser by Vercel Labs lets AI agents control web browsers. pilotty does the same for terminals.
Origin story: Built to solve a personal problem, pilotty was created to enable AI agents to interact with OpenTUI interfaces and control OpenCode programmatically. If you're building TUIs or working with terminal applications, pilotty lets AI navigate them just like a human would.
npm install -g pilotty
Or build from source:
git clone https://github.com/msmps/pilotty
cd pilotty
cargo build --release
./target/release/pilotty --help
Building from source requires Rust 1.70+.
| Platform | Architecture | Status |
|---|---|---|
| macOS | x64 (Intel) | ✅ |
| macOS | arm64 (Apple Silicon) | ✅ |
| Linux | x64 | ✅ |
| Linux | arm64 | ✅ |
| Windows | - | ❌ Not supported |
Windows is not supported due to the use of Unix domain sockets and POSIX PTY APIs.
# Spawn a TUI application
pilotty spawn htop
# Take a snapshot of the terminal
pilotty snapshot
# Type text
pilotty type "hello world"
# Send keys
pilotty key Enter
pilotty key Ctrl+C
# Click at specific coordinates (row, col)
pilotty click 10 5
# List active sessions
pilotty list-sessions
# Stop the daemon
pilotty stop
pilotty spawn <command> # Spawn a TUI app (e.g., pilotty spawn vim file.txt)
pilotty spawn --name myapp <cmd> # Spawn with a custom session name
pilotty spawn --cwd /path cmd # Spawn in a specific working directory
pilotty kill # Kill default session
pilotty kill -s myapp # Kill specific session
pilotty list-sessions # List all active sessions
pilotty stop # Stop the daemon and all sessions
pilotty daemon # Manually start daemon (usually auto-starts)
pilotty examples # Show end-to-end workflow example
pilotty snapshot # Full JSON with text
pilotty snapshot --format compact # JSON without text field
pilotty snapshot --format text # Plain text with cursor indicator
# Wait for screen to change before returning (no more manual sleep!)
pilotty snapshot --await-change $HASH # Block until hash differs
pilotty snapshot --await-change $HASH --settle 100 # Then wait for stability
pilotty type "hello" # Type text at cursor
pilotty key Enter # Send Enter key
pilotty key Ctrl+C # Send Ctrl+C
pilotty key Alt+F # Send Alt+F
pilotty key F1 # Send function key
pilotty key Tab # Send Tab
pilotty key Escape # Send Escape
# Key sequences (space-separated keys sent in order)
pilotty key "Ctrl+X m" # Emacs chord: Ctrl+X then m
pilotty key "Escape : w q Enter" # vim :wq sequence
pilotty key "a b c" --delay 50 # Send a, b, c with 50ms delay between keys
pilotty click 10 5 # Click at row 10, col 5
pilotty scroll up # Scroll up 1 line
pilotty scroll down 5 # Scroll down 5 lines
pilotty resize 120 40 # Resize terminal to 120x40
pilotty wait-for "Ready" # Wait for text to appear
pilotty wait-for "Error" --regex # Wait for regex pattern
pilotty wait-for "Done" -t 5000 # Wait with 5s timeout
# Wait for screen changes (preferred over sleep)
HASH=$(pilotty snapshot | jq '.content_hash')
pilotty key Enter
pilotty snapshot --await-change $HASH --settle 50 # Wait for change + 50ms stability
The snapshot command returns structured data about the terminal screen:
{
"snapshot_id": 42,
"size": { "cols": 80, "rows": 24 },
"cursor": { "row": 5, "col": 10, "visible": true },
"text": "Options: [x] Enable [ ] Debug\nActions: [OK] [Cancel]",
"elements": [
{ "kind": "toggle", "row": 0, "col": 9, "width": 3, "text": "[x]", "confidence": 1.0, "checked": true },
{ "kind": "toggle", "row": 0, "col": 22, "width": 3, "text": "[ ]", "confidence": 1.0, "checked": false },
{ "kind": "button", "row": 1, "col": 9, "width": 4, "text": "[OK]", "confidence": 0.8 },
{ "kind": "button", "row": 1, "col": 14, "width": 8, "text": "[Cancel]", "confidence": 0.8 }
],
"content_hash": 12345678901234567890
}
pilotty automatically detects interactive UI elements in terminal applications. Elements provide read-only context to help understand UI structure, with position data (row, col) for use with the click command.
Use keyboard navigation (pilotty key Tab, pilotty key Enter, pilotty type "text") for reliable TUI interaction rather than element-based actions, as UI element detection depends on visual patterns that may disappear after interaction.
| Kind | Detection Patterns | Confidence |
|---|---|---|
| button | Inverse video, `[OK]`, `<Cancel>` | 1.0 / 0.8 |
| input | Cursor position, `____` underscores | 1.0 / 0.6 |
| toggle | `[x]`, `[ ]`, `☑`, `☐` | 1.0 |
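To make the bracket-based patterns concrete, here is a minimal Python sketch that scans one row of screen text for `[x]`/`[ ]` toggles and `[OK]`/`<Cancel>`-style buttons, reusing the 1.0 and 0.8 confidence values from the table. It is a simplified, hypothetical heuristic (the regexes and the `detect_bracket_elements` name are illustrative assumptions), not pilotty's detector: it ignores inverse video, `☑`/`☐`, input fields, and wide characters.

```python
import re

# Simplified, hypothetical patterns (not pilotty's actual detector):
TOGGLE_RE = re.compile(r"\[[xX ]\]")                                    # [x], [X], [ ]
BUTTON_RE = re.compile(r"\[[A-Za-z][A-Za-z ]*\]|<[A-Za-z][A-Za-z ]*>")  # [OK], <Cancel>


def detect_bracket_elements(line: str, row: int) -> list[dict]:
    """Scan one screen row; return dicts shaped like snapshot "elements" entries.

    Assumes one terminal cell per character, so string index == column.
    """
    elements = []
    for m in TOGGLE_RE.finditer(line):
        elements.append({"kind": "toggle", "row": row, "col": m.start(),
                         "width": len(m.group()), "text": m.group(),
                         "confidence": 1.0, "checked": m.group() != "[ ]"})
    for m in BUTTON_RE.finditer(line):
        if TOGGLE_RE.fullmatch(m.group()):
            continue  # "[x]" is a toggle, not a one-letter button
        elements.append({"kind": "button", "row": row, "col": m.start(),
                         "width": len(m.group()), "text": m.group(),
                         "confidence": 0.8})
    return sorted(elements, key=lambda e: e["col"])
```

On the second row of the sample screen (`Actions: [OK] [Cancel]`), this yields `[OK]` at col 9 (width 4) and `[Cancel]` at col 14 (width 8), matching the entries in the snapshot JSON.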
| Field | Description |
|---|---|
| `kind` | Element type: `button`, `input`, or `toggle` |
| `row` | Row position (0-based) |
| `col` | Column position (0-based) |
| `width` | Width in terminal cells |
| `text` | Text content of the element |
| `confidence` | Detection confidence (0.0-1.0) |
| `focused` | Whether the element has focus (only present if true) |
| `checked` | Toggle state (only present for toggles) |
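Since `focused` and `checked` only appear when relevant, a consumer should treat them as optional. A minimal Python sketch of reading the `elements` array from `pilotty snapshot` output, assuming the JSON shape shown above; the `Element` dataclass, `parse_elements`, and `click_target` are illustrative names, not part of pilotty, and aiming at the middle cell is a design choice of this sketch.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    kind: str
    row: int
    col: int
    width: int
    text: str
    confidence: float
    focused: bool = False            # key is omitted unless true
    checked: Optional[bool] = None   # key is only present for toggles


def parse_elements(snapshot_json: str, min_confidence: float = 0.0) -> list[Element]:
    """Parse the "elements" array of a snapshot, dropping low-confidence hits."""
    snapshot = json.loads(snapshot_json)
    elements = [
        Element(kind=e["kind"], row=e["row"], col=e["col"], width=e["width"],
                text=e["text"], confidence=e["confidence"],
                focused=e.get("focused", False), checked=e.get("checked"))
        for e in snapshot.get("elements", [])
    ]
    return [e for e in elements if e.confidence >= min_confidence]


def click_target(element: Element) -> tuple[int, int]:
    """(row, col) of the element's middle cell, e.g. for `pilotty click ROW COL`."""
    return element.row, element.col + element.width // 2
```

Given the advice above, such coordinates work best as context or as a fallback for `pilotty click`, with keyboard navigation as the primary way to interact.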
The --await-change flag solves the fundamental problem of TUI automation: "How long should I wait after an action?"
Instead of guessing sleep durations (too short = race condition, too long = slow), wait for the screen to actually change:
# Capture baseline hash
HASH=$(pilotty snapshot | jq '.content_hash')
# Perform action
pilotty key Enter
# Wait for screen to change (blocks until hash differs)
pilotty snapshot --await-change $HASH
# Or wait for screen to stabilize (useful for apps that render progressively)
pilotty snapshot --await-change $HASH --settle 100 # Wait 100ms after last change
Flags:
- `--await-change <HASH>`: Block until `content_hash` differs from this value
- `--settle <MS>`: After a change is detected, wait until the screen has been stable for this many milliseconds
- `-t, --timeout <MS>`: Maximum wait time in milliseconds (default: 30000)
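The flag semantics can be modelled in a few lines. The Python sketch below is an illustrative re-implementation of the behavior described above, not pilotty's code: return once `content_hash` differs from the baseline, then keep polling until the hash has been unchanged for `settle_ms`, giving up after `timeout_ms`. `get_hash` stands in for taking a snapshot; the 50 ms poll interval and returning `None` on timeout are assumptions.

```python
import time
from typing import Callable, Optional


def await_change(
    get_hash: Callable[[], int],
    baseline: int,
    settle_ms: int = 0,
    timeout_ms: int = 30_000,
    poll_ms: int = 50,                                   # assumed poll interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Return the settled hash, or None if the timeout expires first."""
    deadline = clock() + timeout_ms / 1000
    current = get_hash()
    # Phase 1: wait until the screen differs from the baseline.
    while current == baseline:
        if clock() >= deadline:
            return None
        sleep(poll_ms / 1000)
        current = get_hash()
    # Phase 2: wait until the hash has stayed the same for settle_ms.
    stable_since = clock()
    while clock() - stable_since < settle_ms / 1000:
        if clock() >= deadline:
            return None
        sleep(poll_ms / 1000)
        latest = get_hash()
        if latest != current:                            # still rendering: restart the settle window
            current, stable_since = latest, clock()
    return current
```

Injecting `clock` and `sleep` keeps the logic testable without a real terminal. With `settle_ms=0`, it returns as soon as the first differing hash is seen.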
Why this matters:
- No more flaky automation due to race conditions
- No more slow scripts due to conservative sleep values
- Works regardless of how fast/slow the target app is
- The `--settle` flag handles apps that render progressively
For AI-powered TUIs that stream responses (opencode, etc.), use longer settle times:
HASH=$(pilotty snapshot -s ai | jq -r '.content_hash')
pilotty type -s ai "explain this code"
pilotty key -s ai Enter
# Wait for streaming to complete: 3s settle, 60s timeout
pilotty snapshot -s ai --await-change "$HASH" --settle 3000 -t 60000
- Use a settle time of 2000-3000 ms (`--settle 2000` to `--settle 3000`) because AI responses pause between chunks
- Extend the timeout with `-t 60000` for longer generations
- Long responses may scroll; use `pilotty scroll up` to see the full output
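An agent harness written in Python can issue the same sequence through `subprocess`. A minimal sketch, assuming `pilotty` is on `PATH` and that `snapshot` prints its JSON to stdout; the helper names are hypothetical, and only commands and flags shown in this README are used.

```python
import json
import subprocess
from typing import Optional


def pilotty_argv(subcommand: str, *args: str, session: Optional[str] = None) -> list[str]:
    """Build a pilotty command line with `-s SESSION` right after the subcommand."""
    argv = ["pilotty", subcommand]
    if session is not None:
        argv += ["-s", session]
    return argv + list(args)


def prompt_argvs(session: str, prompt: str, baseline_hash: str,
                 settle_ms: int = 3000, timeout_ms: int = 60000) -> list[list[str]]:
    """Type the prompt, press Enter, then wait for the streamed reply to settle."""
    return [
        pilotty_argv("type", prompt, session=session),
        pilotty_argv("key", "Enter", session=session),
        pilotty_argv("snapshot", "--await-change", baseline_hash,
                     "--settle", str(settle_ms), "-t", str(timeout_ms),
                     session=session),
    ]


def ask(session: str, prompt: str) -> dict:
    """Run the sequence against a live session and return the final snapshot JSON."""
    def run(argv: list[str]) -> str:
        # A list argv bypasses the shell, so the prompt reaches pilotty verbatim.
        return subprocess.run(argv, check=True, capture_output=True, text=True).stdout

    # json.loads parses content_hash as an exact Python int.
    baseline = json.loads(run(pilotty_argv("snapshot", session=session)))["content_hash"]
    output = ""
    for argv in prompt_argvs(session, prompt, str(baseline)):
        output = run(argv)
    return json.loads(output)
```

For example, `ask("ai", "explain this code")` issues the same three commands as the shell snippet above.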
For manual polling, use content_hash directly:
# Get initial snapshot
SNAP1=$(pilotty snapshot)
HASH1=$(echo "$SNAP1" | jq -r '.content_hash')
# Perform some action
pilotty key Tab
# Check if screen changed
SNAP2=$(pilotty snapshot)
HASH2=$(echo "$SNAP2" | jq -r '.content_hash')
if [ "$HASH1" != "$HASH2" ]; then
echo "Screen content changed"
fi
# 1. Spawn a TUI with dialog elements
pilotty spawn dialog --yesno "Continue?" 10 40
# 2. Wait for dialog to render
pilotty wait-for "Continue"
# 3. Get snapshot with elements (for context)
pilotty snapshot | jq '.elements'
# Shows detected buttons, helps understand UI structure
# 4. Navigate and interact with keyboard (reliable approach)
pilotty key Tab # Move to next element
pilotty key Enter # Activate selected element
Each session is an isolated terminal with its own:
- PTY (pseudo-terminal)
- Screen buffer
- Child process
# Run multiple apps (--name must come before the command)
pilotty spawn --name monitoring htop
pilotty spawn --name editor vim file.txt
# Run app in a specific directory (useful for project-specific configs)
pilotty spawn --cwd /path/to/project --name myapp bun src/index.tsx
# Target specific session
pilotty snapshot -s monitoring
pilotty key -s editor Ctrl+S
# List all sessions
pilotty list-sessions
If no `--session` is specified, pilotty uses the default session.
Note: The first session spawned without --name is automatically named default.
To run multiple sessions, give each a unique name with --name:
pilotty spawn --name monitoring htop
pilotty spawn --name editor vim
Important: The `--name` flag must come before the command. Everything after the command is passed as arguments to that command.
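Because everything after the command is forwarded to it, any wrapper that builds `spawn` command lines has to emit pilotty's own options first. A small Python sketch of that ordering rule (`spawn_argv` is an illustrative name, not part of pilotty):

```python
from typing import Optional, Sequence


def spawn_argv(command: Sequence[str], name: Optional[str] = None,
               cwd: Optional[str] = None) -> list[str]:
    """Build `pilotty spawn [--name NAME] [--cwd DIR] COMMAND [ARGS...]`."""
    if not command:
        raise ValueError("spawn needs a command to run")
    argv = ["pilotty", "spawn"]
    # pilotty's own flags must precede the command...
    if name is not None:
        argv += ["--name", name]
    if cwd is not None:
        argv += ["--cwd", cwd]
    # ...because everything from the command onward is passed to the child process.
    return argv + list(command)
```

For example, `spawn_argv(["vim", "file.txt"], name="editor")` produces `pilotty spawn --name editor vim file.txt`; a `--name` placed after `vim` would instead be handed to vim as an argument.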
pilotty uses a daemon architecture similar to agent-browser:
┌─────────────┐ Unix Socket ┌─────────────────┐
│ CLI │ ──────────────────▶ │ Daemon │
│ (pilotty) │ JSON-line │ (auto-started) │
└─────────────┘ └─────────────────┘
│
┌────────┴────────┐
▼ ▼
┌───────────┐ ┌───────────┐
│ Session │ │ Session │
│ (htop) │ │ (vim) │
└───────────┘ └───────────┘
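"JSON-line" refers to newline-delimited JSON: each message is a single JSON object followed by `\n`. The Python sketch below shows only that framing, using a connected socket pair as a stand-in for the daemon's Unix socket. The actual request and response fields are not documented here, so `cmd`, `session`, `ok`, and `echo` are hypothetical.

```python
import json
import socket


def send_json_line(sock: socket.socket, message: dict) -> None:
    """Serialize one message as compact JSON followed by a newline."""
    sock.sendall(json.dumps(message, separators=(",", ":")).encode() + b"\n")


def recv_json_line(reader) -> dict:
    """Read exactly one newline-terminated JSON message from a binary file-like reader."""
    line = reader.readline()
    if not line:
        raise ConnectionError("peer closed the connection")
    return json.loads(line)


if __name__ == "__main__":
    # socketpair() returns two connected Unix-domain sockets on POSIX systems.
    cli, daemon = socket.socketpair()
    with cli, daemon, cli.makefile("rb") as cli_in, daemon.makefile("rb") as daemon_in:
        send_json_line(cli, {"cmd": "snapshot", "session": "default"})  # hypothetical fields
        request = recv_json_line(daemon_in)
        send_json_line(daemon, {"ok": True, "echo": request["cmd"]})
        print(recv_json_line(cli_in))
```

Line framing lets either side read complete messages with a plain `readline()` and no length prefix, because compact JSON never contains a raw newline.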
- Auto-start: Daemon starts automatically on first command
- Auto-stop: Daemon shuts down after 5 minutes with no active sessions
- Session cleanup: Sessions are automatically removed when their process exits
- Background: Runs in background, survives terminal close
- Shared state: Multiple CLI invocations share sessions
- Clean shutdown: `pilotty stop` gracefully terminates all sessions
The daemon is designed for zero-maintenance operation:
- First command (e.g., `pilotty spawn vim`): starts the daemon automatically
- Session end (e.g., vim exits after `:wq`): the session is cleaned up within 500ms
- Idle timeout: after 5 minutes with no sessions, the daemon shuts down
- Next command: starts the daemon again automatically
This means you never need to manage the daemon manually: it starts when needed and stops when idle.
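These lifecycle rules amount to a small state machine. The following is a hypothetical Python model of them, not the daemon's source: each `tick` drops sessions whose process has exited, and the daemon should exit once it has had no sessions for 5 minutes. In this model, the 500ms cleanup bound would correspond to how often `tick` runs. The class and method names are illustrative.

```python
from typing import Callable, Optional

IDLE_TIMEOUT_S = 5 * 60  # "after 5 minutes with no sessions"


class DaemonLifecycle:
    """Toy model of session reaping plus idle shutdown, driven by an explicit clock."""

    def __init__(self, now: float):
        self.sessions: dict[str, Callable[[], bool]] = {}  # name -> is process alive?
        self.idle_since: Optional[float] = now             # a fresh daemon has no sessions yet

    def spawn(self, name: str, is_alive: Callable[[], bool]) -> None:
        self.sessions[name] = is_alive
        self.idle_since = None

    def tick(self, now: float) -> bool:
        """Drop sessions whose process exited; return True when it is time to shut down."""
        for name in [n for n, alive in self.sessions.items() if not alive()]:
            del self.sessions[name]
        if self.sessions:
            self.idle_since = None
            return False
        if self.idle_since is None:
            self.idle_since = now                           # idle period starts now
        return now - self.idle_since >= IDLE_TIMEOUT_S
```

The idle clock starts only when the last session disappears, so a long-running session never counts toward the timeout.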
The daemon socket is created at (in priority order):
1. `$PILOTTY_SOCKET_DIR/{session}.sock` (explicit override)
2. `$XDG_RUNTIME_DIR/pilotty/{session}.sock` (Linux standard)
3. `~/.pilotty/{session}.sock` (home directory fallback)
4. `/tmp/pilotty/{session}.sock` (last resort)
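The lookup order can be expressed as a small function. This Python sketch mirrors the documented order rather than pilotty's source. It assumes each entry applies when its environment variable is set and non-empty, that the home fallback applies whenever `HOME` is set, and that `{session}` is substituted verbatim. `socket_path` is a hypothetical name.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def socket_path(session: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the daemon socket path using the documented priority order."""
    env = os.environ if env is None else env
    if env.get("PILOTTY_SOCKET_DIR"):                  # 1. explicit override
        return Path(env["PILOTTY_SOCKET_DIR"]) / f"{session}.sock"
    if env.get("XDG_RUNTIME_DIR"):                     # 2. Linux standard
        return Path(env["XDG_RUNTIME_DIR"]) / "pilotty" / f"{session}.sock"
    if env.get("HOME"):                                # 3. home directory fallback
        return Path(env["HOME"]) / ".pilotty" / f"{session}.sock"
    return Path("/tmp/pilotty") / f"{session}.sock"    # 4. last resort
```

Passing `env` explicitly makes each tier easy to check in isolation. For example, `socket_path("default", {"HOME": "/home/me"})` gives `/home/me/.pilotty/default.sock`.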
All errors include AI-friendly suggestions:
{
"code": "SESSION_NOT_FOUND",
"message": "Session 'abc123' not found",
"suggestion": "Run 'pilotty list-sessions' to see available sessions"
}
| Variable | Description |
|---|---|
| `PILOTTY_SESSION` | Default session name |
| `PILOTTY_SOCKET_DIR` | Override socket directory |
| `RUST_LOG` | Logging level (e.g., `debug`, `info`) |
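Because every error carries a machine-readable `code`, a `message`, and a `suggestion`, a harness can branch on the code and pass the suggestion back to the model. A Python sketch, assuming the error arrives as a single JSON object like the example above. This README does not say whether errors are written to stdout or stderr, so the caller passes in whichever stream it captured. `PilottyError`, `raise_if_error`, and `default_session` are hypothetical names. The assumption that `PILOTTY_SESSION` overrides the name `default` is inferred from the table above.

```python
import json
import os


class PilottyError(Exception):
    """Raised for a pilotty error object; keeps the agent-facing suggestion."""

    def __init__(self, code: str, message: str, suggestion: str = ""):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.suggestion = suggestion


def raise_if_error(output: str) -> dict:
    """Parse pilotty JSON output, raising PilottyError for an error object."""
    data = json.loads(output)
    if isinstance(data, dict) and "code" in data and "message" in data:
        raise PilottyError(data["code"], data["message"], data.get("suggestion", ""))
    return data


def default_session() -> str:
    """Assumed default: $PILOTTY_SESSION if set, otherwise the session named "default"."""
    return os.environ.get("PILOTTY_SESSION", "default")
```

An agent loop can catch `PilottyError`, log `code`, and feed `suggestion` into its next step (for example, running `pilotty list-sessions` after `SESSION_NOT_FOUND`).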
Add the skill to your AI coding assistant for richer context:
npx skills add msmps/pilotty
This works with Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot, Goose, OpenCode, and Windsurf.
The simplest approach is to tell your agent to use it:
Use pilotty to interact with vim. Run pilotty --help to see available commands.
The --help output is comprehensive and most agents can figure it out from there.
For more consistent results, add to your project or global instructions file:
## Terminal Automation
Use `pilotty` for TUI automation. Run `pilotty --help` for all commands.
Core workflow:
1. `pilotty spawn <command>` - Start a TUI application
2. `pilotty snapshot` - Get screen state with cursor position
3. `pilotty key Tab` / `pilotty type "text"` - Navigate and interact
4. Re-snapshot after screen changes
# 1. Spawn the application
pilotty spawn vim myfile.txt
# 2. Wait for it to be ready
pilotty wait-for "myfile.txt"
# 3. Take a snapshot to understand the screen and capture hash
HASH=$(pilotty snapshot | jq '.content_hash')
# 4. Navigate using keyboard commands
pilotty key i # Enter insert mode
pilotty type "Hello, World!"
pilotty key Escape
# 5. Wait for screen to update, then save (no manual sleep needed!)
pilotty snapshot --await-change $HASH --settle 50
pilotty key "Escape : w q Enter" # vim :wq sequence
# 6. Verify vim exited
pilotty list-sessions
For AI-powered terminal apps that stream responses:
# 1. Spawn the AI app
pilotty spawn --name ai opencode
# 2. Wait for prompt
pilotty wait-for -s ai "Ask anything" -t 15000
# 3. Capture baseline hash, type prompt, submit
HASH=$(pilotty snapshot -s ai | jq -r '.content_hash')
pilotty type -s ai "write a haiku about rust"
pilotty key -s ai Enter
# 4. Wait for streaming response (3s settle, 60s timeout)
pilotty snapshot -s ai --await-change "$HASH" --settle 3000 -t 60000 --format text
# 5. Scroll up if response is long
pilotty scroll -s ai up 10
pilotty snapshot -s ai --format text
# 6. Clean up
pilotty kill -s ai
Supported key formats:
| Format | Example | Notes |
|---|---|---|
| Named keys | `Enter`, `Tab`, `Escape`, `Space`, `Backspace` | Case insensitive |
| Arrow keys | `Up`, `Down`, `Left`, `Right` | Also: `ArrowUp`, etc. |
| Navigation | `Home`, `End`, `PageUp`, `PageDown`, `Insert`, `Delete` | Also: `PgUp`, `PgDn`, `Ins`, `Del` |
| Function keys | `F1`-`F12` | |
| Ctrl combos | `Ctrl+C`, `Ctrl+X`, `Ctrl+Z` | Also: `Control+C` |
| Alt combos | `Alt+F`, `Alt+X` | Also: `Meta+F`, `Option+F` |
| Shift combos | `Shift+A` | Only uppercases letter keys |
| Combined | `Ctrl+Alt+C` | |
| Special | `Plus` | Literal `+` character |
| Aliases | `Return` = `Enter`, `Esc` = `Escape` | |
| Sequences | `"Ctrl+X m"`, `"Escape : w q Enter"` | Space-separated keys |
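Each key name ultimately has to become the bytes a terminal would send. The Python sketch below uses common xterm/VT100 conventions: `Ctrl+<letter>` maps to control codes 0x01-0x1A, `Alt`/`Meta` prefixes ESC, and arrow keys send `ESC [ A` through `ESC [ D`. These encodings are standard, but it is an assumption that pilotty uses exactly these. The sketch covers only a subset of the table (no function keys), and treating `Alt+F` as the lowercase f key is a further assumption. `encode_key` is a hypothetical name.

```python
NAMED = {
    "enter": "\r", "tab": "\t", "escape": "\x1b", "space": " ", "backspace": "\x7f",
    "up": "\x1b[A", "down": "\x1b[B", "right": "\x1b[C", "left": "\x1b[D",
    "home": "\x1b[H", "end": "\x1b[F", "insert": "\x1b[2~", "delete": "\x1b[3~",
    "pageup": "\x1b[5~", "pagedown": "\x1b[6~", "plus": "+",
}
ALIASES = {
    "return": "enter", "esc": "escape", "pgup": "pageup", "pgdn": "pagedown",
    "ins": "insert", "del": "delete", "arrowup": "up", "arrowdown": "down",
    "arrowleft": "left", "arrowright": "right",
}
MODIFIERS = {"ctrl": "ctrl", "control": "ctrl", "alt": "alt", "meta": "alt",
             "option": "alt", "shift": "shift"}


def encode_key(spec: str) -> str:
    """Encode one key such as "Enter", "Ctrl+C" or "Alt+F" (xterm-style, assumed)."""
    *mod_names, base = spec.split("+")
    mods = {MODIFIERS[m.lower()] for m in mod_names}  # KeyError on unknown modifiers
    name = ALIASES.get(base.lower(), base.lower())     # names are case insensitive
    if name in NAMED:
        seq = NAMED[name]
    elif len(base) == 1:
        if base.isalpha() and "shift" in mods:
            seq = base.upper()                         # Shift only uppercases letters
        elif base.isalpha() and mods:
            seq = base.lower()                         # assumption: "Alt+F" means the f key
        else:
            seq = base
    else:
        raise ValueError(f"unknown key: {spec}")
    if "ctrl" in mods:
        if not (len(seq) == 1 and seq.isalpha()):
            raise ValueError(f"unsupported Ctrl combination: {spec}")
        seq = chr(ord(seq.upper()) - 64)               # Ctrl+A..Z -> 0x01..0x1A
    if "alt" in mods:
        seq = "\x1b" + seq                             # Alt/Meta sends an ESC prefix
    return seq
```

This also shows why a literal `+` needs the `Plus` name: `+` is the separator between modifiers and the key.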
Send multiple keys in order with optional delay between them:
# Emacs-style chords
pilotty key "Ctrl+X Ctrl+S" # Save in Emacs
pilotty key "Ctrl+X m" # Compose mail in Emacs
# vim command sequences
pilotty key "Escape : w q Enter" # Save and quit vim
pilotty key "g g d G" # Delete entire file in vim
# With inter-key delay (useful for slow TUIs)
pilotty key "Tab Tab Enter" --delay 100 # Navigate with 100ms between keys
The `--delay` flag specifies milliseconds between keys (max 10000ms, default 0).
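Sequence handling is easy to model: split on whitespace, send each key in order, and pause between keys. A Python sketch with a pluggable `send` callback, a hypothetical stand-in for writing to the session's PTY, that enforces the documented 0-10000 ms range for `--delay`:

```python
import time
from typing import Callable, List

MAX_DELAY_MS = 10_000  # documented maximum for --delay


def send_sequence(sequence: str, send: Callable[[str], None], delay_ms: int = 0,
                  sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Send space-separated keys in order, pausing delay_ms between them."""
    if not 0 <= delay_ms <= MAX_DELAY_MS:
        raise ValueError(f"delay must be between 0 and {MAX_DELAY_MS} ms")
    keys = sequence.split()
    for i, key in enumerate(keys):
        if i > 0 and delay_ms:
            sleep(delay_ms / 1000)  # pause only between keys, not before the first
        send(key)
    return keys
```

With `"Tab Tab Enter"` and a 100 ms delay, three keys are sent with two 0.1 s pauses. Because the sequence is split on whitespace, a literal space must be written as `Space`.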
Contributions welcome! Please:
- Run `cargo fmt` before committing
- Run `cargo clippy --all --all-features` and fix warnings
- Add tests for new functionality
- Update documentation as needed
MIT
- Inspired by agent-browser by Vercel Labs
- Built with vt100 for terminal emulation
- Built with portable-pty for PTY management
