pilotty

_{The terminal equivalent of agent-browser}

Terminal automation CLI for AI agents
Control vim, htop, lazygit, dialog, and any TUI programmatically

Installation • Quick Start • Commands • AI Agents

Note

Built with AI, for AI. This project was built with the support of an AI agent, planned thoroughly with a tight feedback loop and reviewed at each step. While we've tested extensively, edge cases may exist. Use in production at your own discretion, and please report any issues you find!

pilotty enables AI agents to interact with terminal applications through a simple command-line interface. It manages pseudo-terminal (PTY) sessions with full VT100 terminal emulation, captures screen state, and provides keyboard/mouse input for navigating terminal user interfaces. Think of it as headless terminal automation for AI workflows.

Features

PTY (Pseudo-Terminal) Management: Spawn and manage terminal applications in background sessions
Terminal Emulation: Full VT100 emulation for accurate screen capture and state tracking
Keyboard Navigation: Interact with TUIs using Tab, Enter, arrow keys, and key combos
AI-Friendly Output: Clean JSON responses with actionable suggestions on errors
Multi-Session: Run multiple terminal apps simultaneously in isolated sessions
Zero Config: Daemon auto-starts on first command, auto-stops after 5 minutes idle

Why pilotty?

agent-browser by Vercel Labs lets AI agents control web browsers. pilotty does the same for terminals.

Origin story: Built to solve a personal problem, pilotty was created to enable AI agents to interact with OpenTUI interfaces and control OpenCode programmatically. If you're building TUIs or working with terminal applications, pilotty lets AI navigate them just like a human would.

Installation

npm (recommended)

npm install -g pilotty

From Source

git clone https://github.com/msmps/pilotty
cd pilotty
cargo build --release
./target/release/pilotty --help

Requires Rust 1.70+.

Platform Support

Platform	Architecture	Status
macOS	x64 (Intel)	✅
macOS	arm64 (Apple Silicon)	✅
Linux	x64	✅
Linux	arm64	✅
Windows	-	❌ Not supported

Windows is not supported due to the use of Unix domain sockets and POSIX PTY APIs.

Quick Start

# Spawn a TUI application
pilotty spawn htop

# Take a snapshot of the terminal
pilotty snapshot

# Type text
pilotty type "hello world"

# Send keys
pilotty key Enter
pilotty key Ctrl+C

# Click at specific coordinates (row, col)
pilotty click 10 5

# List active sessions
pilotty list-sessions

# Stop the daemon
pilotty stop

Commands

Session Management

pilotty spawn <command>           # Spawn a TUI app (e.g., pilotty spawn vim file.txt)
pilotty spawn --name myapp <cmd>  # Spawn with a custom session name
pilotty spawn --cwd /path cmd     # Spawn in a specific working directory
pilotty kill                      # Kill default session
pilotty kill -s myapp             # Kill specific session
pilotty list-sessions             # List all active sessions
pilotty stop                      # Stop the daemon and all sessions
pilotty daemon                    # Manually start daemon (usually auto-starts)
pilotty examples                  # Show end-to-end workflow example

Screen Capture

pilotty snapshot                  # Full JSON with text
pilotty snapshot --format compact # JSON without text field
pilotty snapshot --format text    # Plain text with cursor indicator

# Wait for screen to change before returning (no more manual sleep!)
pilotty snapshot --await-change $HASH           # Block until hash differs
pilotty snapshot --await-change $HASH --settle 100  # Then wait for stability

Input

pilotty type "hello"              # Type text at cursor
pilotty key Enter                 # Send Enter key
pilotty key Ctrl+C                # Send Ctrl+C
pilotty key Alt+F                 # Send Alt+F
pilotty key F1                    # Send function key
pilotty key Tab                   # Send Tab
pilotty key Escape                # Send Escape

# Key sequences (space-separated keys sent in order)
pilotty key "Ctrl+X m"            # Emacs chord: Ctrl+X then m
pilotty key "Escape : w q Enter"  # vim :wq sequence
pilotty key "a b c" --delay 50    # Send a, b, c with 50ms delay between

Interaction

pilotty click 10 5                # Click at row 10, col 5
pilotty scroll up                 # Scroll up 1 line
pilotty scroll down 5             # Scroll down 5 lines

Terminal Control

pilotty resize 120 40             # Resize terminal to 120x40
pilotty wait-for "Ready"          # Wait for text to appear
pilotty wait-for "Error" --regex  # Wait for regex pattern
pilotty wait-for "Done" -t 5000   # Wait with 5s timeout

# Wait for screen changes (preferred over sleep)
HASH=$(pilotty snapshot | jq '.content_hash')
pilotty key Enter
pilotty snapshot --await-change $HASH --settle 50  # Wait for change + 50ms stability

Snapshot Output

The snapshot command returns structured data about the terminal screen:

{
  "snapshot_id": 42,
  "size": { "cols": 80, "rows": 24 },
  "cursor": { "row": 5, "col": 10, "visible": true },
  "text": "Options: [x] Enable  [ ] Debug\nActions: [OK] [Cancel]",
  "elements": [
    { "kind": "toggle", "row": 0, "col": 9, "width": 3, "text": "[x]", "confidence": 1.0, "checked": true },
    { "kind": "toggle", "row": 0, "col": 22, "width": 3, "text": "[ ]", "confidence": 1.0, "checked": false },
    { "kind": "button", "row": 1, "col": 9, "width": 4, "text": "[OK]", "confidence": 0.8 },
    { "kind": "button", "row": 1, "col": 14, "width": 8, "text": "[Cancel]", "confidence": 0.8 }
  ],
  "content_hash": 12345678901234567890
}

UI Elements (Contextual)

pilotty automatically detects interactive UI elements in terminal applications. Elements provide read-only context to help understand UI structure, with position data (row, col) for use with the click command.

Use keyboard navigation (pilotty key Tab, pilotty key Enter, pilotty type "text") for reliable TUI interaction rather than element-based actions, as UI element detection depends on visual patterns that may disappear after interaction.

Element Kinds

Kind	Detection Patterns	Confidence
button	Inverse video, `[OK]`, `<Cancel>`	1.0 / 0.8
input	Cursor position, `____` underscores	1.0 / 0.6
toggle	`[x]`, `[ ]`, `☑`, `☐`	1.0

Element Fields

Field	Description
`kind`	Element type: `button`, `input`, or `toggle`
`row`	Row position (0-based)
`col`	Column position (0-based)
`width`	Width in terminal cells
`text`	Text content of the element
`confidence`	Detection confidence (0.0-1.0)
`focused`	Whether element has focus (only present if true)
`checked`	Toggle state (only present for toggles)

Wait for Screen Changes

The --await-change flag solves the fundamental problem of TUI automation: "How long should I wait after an action?"

Instead of guessing sleep durations (too short = race condition, too long = slow), wait for the screen to actually change:

# Capture baseline hash
HASH=$(pilotty snapshot | jq '.content_hash')

# Perform action
pilotty key Enter

# Wait for screen to change (blocks until hash differs)
pilotty snapshot --await-change $HASH

# Or wait for screen to stabilize (useful for apps that render progressively)
pilotty snapshot --await-change $HASH --settle 100  # Wait 100ms after last change

Flags:

--await-change <HASH>: Block until content_hash differs from this value
--settle <MS>: After change detected, wait for screen to be stable for this many ms
-t, --timeout <MS>: Maximum wait time (default: 30000)

Why this matters:

No more flaky automation due to race conditions
No more slow scripts due to conservative sleep values
Works regardless of how fast/slow the target app is
The --settle flag handles apps that render progressively

Streaming AI Responses

For AI-powered TUIs that stream responses (opencode, etc.), use longer settle times:

HASH=$(pilotty snapshot -s ai | jq -r '.content_hash')
pilotty type -s ai "explain this code"
pilotty key -s ai Enter

# Wait for streaming to complete: 3s settle, 60s timeout
pilotty snapshot -s ai --await-change "$HASH" --settle 3000 -t 60000

Use --settle 2000-3000 because AI responses pause between chunks
Extend timeout with -t 60000 for longer generations
Long responses may scroll; use pilotty scroll up to see the full output

Manual Change Detection

For manual polling, use content_hash directly:

# Get initial snapshot
SNAP1=$(pilotty snapshot)
HASH1=$(echo "$SNAP1" | jq -r '.content_hash')

# Perform some action
pilotty key Tab

# Check if screen changed
SNAP2=$(pilotty snapshot)
HASH2=$(echo "$SNAP2" | jq -r '.content_hash')

if [ "$HASH1" != "$HASH2" ]; then
  echo "Screen content changed"
fi

Workflow Example

# 1. Spawn a TUI with dialog elements
pilotty spawn dialog --yesno "Continue?" 10 40

# 2. Wait for dialog to render
pilotty wait-for "Continue"

# 3. Get snapshot with elements (for context)
pilotty snapshot | jq '.elements'
# Shows detected buttons, helps understand UI structure

# 4. Navigate and interact with keyboard (reliable approach)
pilotty key Tab      # Move to next element
pilotty key Enter    # Activate selected element

Sessions

Each session is an isolated terminal with its own:

PTY (pseudo-terminal)
Screen buffer
Child process

# Run multiple apps (--name must come before the command)
pilotty spawn --name monitoring htop
pilotty spawn --name editor vim file.txt

# Run app in a specific directory (useful for project-specific configs)
pilotty spawn --cwd /path/to/project --name myapp bun src/index.tsx

# Target specific session
pilotty snapshot -s monitoring
pilotty key -s editor Ctrl+S

# List all sessions
pilotty list-sessions

If no --session is specified, pilotty uses the default session.

Note: The first session spawned without --name is automatically named default. To run multiple sessions, give each a unique name with --name:

pilotty spawn --name monitoring htop
pilotty spawn --name editor vim

Important: The --name flag must come before the command. Everything after the command is passed as arguments to that command.

Daemon Architecture

pilotty uses a daemon architecture similar to agent-browser:

┌─────────────┐     Unix Socket      ┌─────────────────┐
│   CLI       │ ──────────────────▶  │     Daemon      │
│  (pilotty)  │     JSON-line        │  (auto-started) │
└─────────────┘                      └─────────────────┘
                                              │
                                     ┌────────┴────────┐
                                     ▼                 ▼
                              ┌───────────┐     ┌───────────┐
                              │  Session  │     │  Session  │
                              │  (htop)   │     │  (vim)    │
                              └───────────┘     └───────────┘

Auto-start: Daemon starts automatically on first command
Auto-stop: Daemon shuts down after 5 minutes with no active sessions
Session cleanup: Sessions are automatically removed when their process exits
Background: Runs in background, survives terminal close
Shared state: Multiple CLI invocations share sessions
Clean shutdown: pilotty stop gracefully terminates all sessions

Lifecycle

The daemon is designed for zero-maintenance operation:

First command (e.g., pilotty spawn vim) starts the daemon automatically
Session ends (e.g., vim exits after :wq) and the session is cleaned up within 500ms
Idle timeout: After 5 minutes with no sessions, the daemon shuts down
Next command starts the daemon again automatically

This means you never need to manually manage the daemon, it starts when needed and stops when idle.

Socket Location

The daemon socket is created at (in priority order):

$PILOTTY_SOCKET_DIR/{session}.sock (explicit override)
$XDG_RUNTIME_DIR/pilotty/{session}.sock (Linux standard)
~/.pilotty/{session}.sock (home directory fallback)
/tmp/pilotty/{session}.sock (last resort)

Error Handling

All errors include AI-friendly suggestions:

{
  "code": "SESSION_NOT_FOUND",
  "message": "Session 'abc123' not found",
  "suggestion": "Run 'pilotty list-sessions' to see available sessions"
}

Environment Variables

Variable	Description
`PILOTTY_SESSION`	Default session name
`PILOTTY_SOCKET_DIR`	Override socket directory
`RUST_LOG`	Logging level (e.g., `debug`, `info`)

Usage with AI Agents

AI Coding Assistants

Add the skill to your AI coding assistant for richer context:

npx skills add msmps/pilotty

This works with Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot, Goose, OpenCode, and Windsurf.

Just Ask the Agent

The simplest approach - just tell your agent to use it:

Use pilotty to interact with vim. Run pilotty --help to see available commands.

The --help output is comprehensive and most agents can figure it out from there.

AGENTS.md / CLAUDE.md

For more consistent results, add to your project or global instructions file:

## Terminal Automation

Use `pilotty` for TUI automation. Run `pilotty --help` for all commands.

Core workflow:
1. `pilotty spawn <command>` - Start a TUI application
2. `pilotty snapshot` - Get screen state with cursor position
3. `pilotty key Tab` / `pilotty type "text"` - Navigate and interact
4. Re-snapshot after screen changes

Example Workflow

# 1. Spawn the application
pilotty spawn vim myfile.txt

# 2. Wait for it to be ready
pilotty wait-for "myfile.txt"

# 3. Take a snapshot to understand the screen and capture hash
HASH=$(pilotty snapshot | jq '.content_hash')

# 4. Navigate using keyboard commands
pilotty key i                    # Enter insert mode
pilotty type "Hello, World!"
pilotty key Escape

# 5. Wait for screen to update, then save (no manual sleep needed!)
pilotty snapshot --await-change $HASH --settle 50
pilotty key "Escape : w q Enter"  # vim :wq sequence

# 6. Verify vim exited
pilotty list-sessions

Example: AI TUI Interaction

For AI-powered terminal apps that stream responses:

# 1. Spawn the AI app
pilotty spawn --name ai opencode

# 2. Wait for prompt
pilotty wait-for -s ai "Ask anything" -t 15000

# 3. Capture baseline hash, type prompt, submit
HASH=$(pilotty snapshot -s ai | jq -r '.content_hash')
pilotty type -s ai "write a haiku about rust"
pilotty key -s ai Enter

# 4. Wait for streaming response (3s settle, 60s timeout)
pilotty snapshot -s ai --await-change "$HASH" --settle 3000 -t 60000 --format text

# 5. Scroll up if response is long
pilotty scroll -s ai up 10
pilotty snapshot -s ai --format text

# 6. Clean up
pilotty kill -s ai

Key Combinations

Supported key formats:

Format	Example	Notes
Named keys	`Enter`, `Tab`, `Escape`, `Space`, `Backspace`	Case insensitive
Arrow keys	`Up`, `Down`, `Left`, `Right`	Also: `ArrowUp`, etc.
Navigation	`Home`, `End`, `PageUp`, `PageDown`, `Insert`, `Delete`	Also: `PgUp`, `PgDn`, `Ins`, `Del`
Function keys	`F1` - `F12`
Ctrl combos	`Ctrl+C`, `Ctrl+X`, `Ctrl+Z`	Also: `Control+C`
Alt combos	`Alt+F`, `Alt+X`	Also: `Meta+F`, `Option+F`
Shift combos	`Shift+A`	Only uppercases letter keys
Combined	`Ctrl+Alt+C`
Special	`Plus`	Literal `+` character
Aliases	`Return` = `Enter`, `Esc` = `Escape`
Sequences	`"Ctrl+X m"`, `"Escape : w q Enter"`	Space-separated keys

Key Sequences

Send multiple keys in order with optional delay between them:

# Emacs-style chords
pilotty key "Ctrl+X Ctrl+S"       # Save in Emacs
pilotty key "Ctrl+X m"            # Compose mail in Emacs

# vim command sequences
pilotty key "Escape : w q Enter"  # Save and quit vim
pilotty key "g g d G"             # Delete entire file in vim

# With inter-key delay (useful for slow TUIs)
pilotty key "Tab Tab Enter" --delay 100   # Navigate with 100ms between keys

The --delay flag specifies milliseconds between keys (max 10000ms, default 0).

Contributing

Contributions welcome! Please:

Run cargo fmt before committing
Run cargo clippy --all --all-features and fix warnings
Add tests for new functionality
Update documentation as needed

License

MIT

Acknowledgments

Inspired by agent-browser by Vercel Labs
Built with vt100 for terminal emulation
Built with portable-pty for PTY management

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
.github/workflows		.github/workflows
assets		assets
crates		crates
npm		npm
scripts		scripts
skills/pilotty		skills/pilotty
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
README.md		README.md

Uh oh!

msmps/pilotty

Folders and files

Latest commit

History

Repository files navigation

pilotty

Features

Why pilotty?

Installation

npm (recommended)

From Source

Platform Support

Quick Start

Commands

Session Management

Screen Capture

Input

Interaction

Terminal Control

Snapshot Output

UI Elements (Contextual)

Element Kinds

Element Fields

Wait for Screen Changes

Streaming AI Responses

Manual Change Detection

Workflow Example

Sessions

Daemon Architecture

Lifecycle

Socket Location

Error Handling

Environment Variables

Usage with AI Agents

AI Coding Assistants

Just Ask the Agent

AGENTS.md / CLAUDE.md

Example Workflow

Example: AI TUI Interaction

Key Combinations

Key Sequences

Contributing

License

Acknowledgments

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 7

Sponsor this project

Uh oh!

Packages 0

Languages

Packages