diff --git a/.gitignore b/.gitignore
index cc1d69b..ebcf97a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -18,6 +18,7 @@ target/
# Agent/AI tooling
.opencode/
.claude/
+.agents/
# Internal docs (not for public repo)
docs/
diff --git a/Cargo.lock b/Cargo.lock
index 971c859..e8eecf7 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -491,6 +491,7 @@ version = "0.0.4"
dependencies = [
"serde",
"serde_json",
+ "unicode-width",
]
[[package]]
diff --git a/Cargo.toml b/Cargo.toml
index 2e89f7c..7d6705d 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -34,6 +34,7 @@ anyhow = "1"
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
regex = "1"
+unicode-width = "0.2"
# System
libc = "0.2"
diff --git a/README.md b/README.md
index 18d7dc8..c3a0b6e 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,16 @@
-
+
pilotty
+
+ The terminal equivalent of agent-browser
+
+
Terminal automation CLI for AI agents
- Like agent-browser, but for TUI applications.
+ Control vim, htop, lazygit, dialog, and any TUI programmatically
@@ -21,16 +25,23 @@
> [!NOTE]
> **Built with AI, for AI.** This project was built with the support of an AI agent, planned thoroughly with a tight feedback loop and reviewed at each step. While we've tested extensively, edge cases may exist. Use in production at your own discretion, and please [report any issues](https://github.com/msmps/pilotty/issues) you find!
-pilotty enables AI agents to interact with terminal applications (vim, htop, lazygit, dialog, etc.) through a simple CLI interface. It manages PTY sessions, captures terminal output, and provides keyboard/mouse input capabilities for navigating TUI applications.
+pilotty enables AI agents to interact with terminal applications through a simple command-line interface. It manages pseudo-terminal (PTY) sessions with full VT100 terminal emulation, captures screen state, and provides keyboard/mouse input for navigating terminal user interfaces. Think of it as headless terminal automation for AI workflows.
## Features
-- **PTY Management**: Spawn and manage terminal applications in background sessions
+- **PTY (Pseudo-Terminal) Management**: Spawn and manage terminal applications in background sessions
+- **Terminal Emulation**: Full VT100 emulation for accurate screen capture and state tracking
- **Keyboard Navigation**: Interact with TUIs using Tab, Enter, arrow keys, and key combos
- **AI-Friendly Output**: Clean JSON responses with actionable suggestions on errors
- **Multi-Session**: Run multiple terminal apps simultaneously in isolated sessions
- **Zero Config**: Daemon auto-starts on first command, auto-stops after 5 minutes idle
+## Why pilotty?
+
+[agent-browser](https://github.com/vercel-labs/agent-browser) by Vercel Labs lets AI agents control web browsers. pilotty does the same for terminals.
+
+**Origin story:** pilotty was built to solve a personal problem: letting AI agents interact with [OpenTUI](https://github.com/anomalyco/opentui) interfaces and control [OpenCode](https://github.com/anomalyco/opencode) programmatically. If you're building TUIs or working with terminal applications, pilotty lets AI navigate them just like a human would.
+
## Installation
### npm (recommended)
@@ -150,11 +161,82 @@ The `snapshot` command returns structured data about the terminal screen:
"snapshot_id": 42,
"size": { "cols": 80, "rows": 24 },
"cursor": { "row": 5, "col": 10, "visible": true },
- "text": "... plain text content ..."
+ "text": "Options: [x] Enable [ ] Debug\nActions: [OK] [Cancel]",
+ "elements": [
+ { "kind": "toggle", "row": 0, "col": 9, "width": 3, "text": "[x]", "confidence": 1.0, "checked": true },
+ { "kind": "toggle", "row": 0, "col": 22, "width": 3, "text": "[ ]", "confidence": 1.0, "checked": false },
+ { "kind": "button", "row": 1, "col": 9, "width": 4, "text": "[OK]", "confidence": 0.8 },
+ { "kind": "button", "row": 1, "col": 14, "width": 8, "text": "[Cancel]", "confidence": 0.8 }
+ ],
+ "content_hash": 12345678901234567890
}
```
-Use the cursor position and text content to understand the screen state and navigate using keyboard commands (Tab, Enter, arrow keys) or click at specific coordinates.
+## UI Elements (Contextual)
+
+pilotty automatically detects interactive UI elements in terminal applications. Elements provide **read-only context** to help understand UI structure, with position data (row, col) for use with the click command.
+
+**Use keyboard navigation (`pilotty key Tab`, `pilotty key Enter`, `pilotty type "text"`) for reliable TUI interaction** rather than element-based actions, as UI element detection depends on visual patterns that may disappear after interaction.
+
+### Element Kinds
+
+| Kind | Detection Patterns | Confidence |
+|------|-------------------|------------|
+| **button** | Inverse video, `[OK]`, `<Cancel>` | 1.0 / 0.8 |
+| **input** | Cursor position, `____` underscores | 1.0 / 0.6 |
+| **toggle** | `[x]`, `[ ]`, `☑`, `☐` | 1.0 |
+
+### Element Fields
+
+| Field | Description |
+|-------|-------------|
+| `kind` | Element type: `button`, `input`, or `toggle` |
+| `row` | Row position (0-based) |
+| `col` | Column position (0-based) |
+| `width` | Width in terminal cells |
+| `text` | Text content of the element |
+| `confidence` | Detection confidence (0.0-1.0) |
+| `focused` | Whether element has focus (only present if true) |
+| `checked` | Toggle state (only present for toggles) |
+
+### Change Detection
+
+The `content_hash` field enables screen change detection between snapshots:
+
+```bash
+# Get initial snapshot
+SNAP1=$(pilotty snapshot)
+HASH1=$(echo "$SNAP1" | jq -r '.content_hash')
+
+# Perform some action
+pilotty key Tab
+
+# Check if screen changed
+SNAP2=$(pilotty snapshot)
+HASH2=$(echo "$SNAP2" | jq -r '.content_hash')
+
+if [ "$HASH1" != "$HASH2" ]; then
+ echo "Screen content changed"
+fi
+```
+
+### Workflow Example
+
+```bash
+# 1. Spawn a TUI with dialog elements
+pilotty spawn dialog --yesno "Continue?" 10 40
+
+# 2. Wait for dialog to render
+pilotty wait-for "Continue"
+
+# 3. Get snapshot with elements (for context)
+pilotty snapshot | jq '.elements'
+# Shows detected buttons, helps understand UI structure
+
+# 4. Navigate and interact with keyboard (reliable approach)
+pilotty key Tab # Move to next element
+pilotty key Enter # Activate selected element
+```
## Sessions
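Note on the README's Change Detection section: the same hash comparison can be done in Rust rather than in shell. The sketch below is illustrative only. `content_hash` and `screen_changed` are hypothetical names, and `DefaultHasher` is a stand-in, since the hash function behind pilotty's `compute_content_hash` is not shown in this diff.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash the visible screen text. `DefaultHasher` is an assumption made for
/// this sketch; pilotty's real hash function may differ.
fn content_hash(screen_text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    screen_text.hash(&mut hasher);
    hasher.finish()
}

/// Two snapshots show different screens when their hashes differ.
fn screen_changed(before: u64, after: u64) -> bool {
    before != after
}

fn main() {
    // Simulate a snapshot taken before and after toggling a checkbox.
    let before = content_hash("Options: [x] Enable [ ] Debug");
    let after = content_hash("Options: [ ] Enable [ ] Debug");
    if screen_changed(before, after) {
        println!("Screen content changed");
    }
}
```

Comparing a single `u64` avoids diffing the full screen text on every poll, which is presumably why the snapshot exposes a hash at all.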
diff --git a/crates/pilotty-cli/src/daemon/server.rs b/crates/pilotty-cli/src/daemon/server.rs
index 10493dd..c0fb344 100644
--- a/crates/pilotty-cli/src/daemon/server.rs
+++ b/crates/pilotty-cli/src/daemon/server.rs
@@ -615,15 +615,18 @@ async fn handle_snapshot(
Err(e) => return Response::error(request_id, e),
};
+ let format = format.unwrap_or(SnapshotFormat::Full);
+
+ // Full format includes UI element detection
+ let with_elements = matches!(format, SnapshotFormat::Full);
+
// Get snapshot data (drains PTY output first)
- let snapshot = match sessions.get_snapshot_data(&session_id).await {
+ let snapshot = match sessions.get_snapshot_data(&session_id, with_elements).await {
Ok(data) => data,
Err(e) => return Response::error(request_id, e),
};
let (cursor_row, cursor_col) = snapshot.cursor_pos;
- let format = format.unwrap_or(SnapshotFormat::Full);
-
match format {
SnapshotFormat::Text => {
// Format as plain text with cursor indicator
@@ -637,9 +640,10 @@ async fn handle_snapshot(
},
)
}
- SnapshotFormat::Full | SnapshotFormat::Compact => {
- // Build ScreenState JSON
+ SnapshotFormat::Full => {
+ // Full: text + elements + metadata + content_hash
let snapshot_id = sessions.next_snapshot_id();
+
let screen_state = ScreenState {
snapshot_id,
size: TerminalSize {
@@ -651,11 +655,29 @@ async fn handle_snapshot(
col: cursor_col,
visible: snapshot.cursor_visible,
},
- text: if format == SnapshotFormat::Full {
- Some(snapshot.text)
- } else {
- None
+ text: Some(snapshot.text),
+ elements: snapshot.elements,
+ content_hash: snapshot.content_hash,
+ };
+ Response::success(request_id, ResponseData::ScreenState(screen_state))
+ }
+ SnapshotFormat::Compact => {
+ // Compact: metadata only, no text, elements, or hash
+ let snapshot_id = sessions.next_snapshot_id();
+ let screen_state = ScreenState {
+ snapshot_id,
+ size: TerminalSize {
+ cols: snapshot.size.cols,
+ rows: snapshot.size.rows,
},
+ cursor: CursorState {
+ row: cursor_row,
+ col: cursor_col,
+ visible: snapshot.cursor_visible,
+ },
+ text: None,
+ elements: None,
+ content_hash: None,
};
Response::success(request_id, ResponseData::ScreenState(screen_state))
}
@@ -1016,16 +1038,20 @@ async fn handle_wait_for(
Err(e) => return Response::error(request_id, e),
};
- // Compile regex if needed
+ // Compile regex if needed.
+ // Limit compiled pattern size to prevent slow compilation.
let compiled_regex = if use_regex {
- match regex::Regex::new(&pattern) {
+ match regex::RegexBuilder::new(&pattern)
+ .size_limit(256 * 1024) // 256KB compiled size limit
+ .build()
+ {
Ok(r) => Some(r),
Err(e) => {
return Response::error(
request_id,
ApiError::invalid_input_with_suggestion(
format!("Invalid regex pattern: {}", e),
- "Check your regex syntax. Common issues: unescaped special chars, unbalanced parentheses.",
+ "Check your regex syntax. Common issues: unescaped special chars, unbalanced parentheses, or pattern too complex.",
),
);
}
@@ -1054,8 +1080,8 @@ async fn handle_wait_for(
);
}
- // Get current screen text
- let snapshot = match sessions.get_snapshot_data(&session_id).await {
+ // Get current screen text (no elements needed for wait_for)
+ let snapshot = match sessions.get_snapshot_data(&session_id, false).await {
Ok(data) => data,
Err(e) => return Response::error(request_id, e),
};
@@ -2347,4 +2373,173 @@ mod tests {
let _ = std::fs::remove_file(&socket_path);
let _ = std::fs::remove_file(&pid_path);
}
+
+ #[tokio::test]
+ async fn test_snapshot_with_elements() {
+ use pilotty_core::elements::ElementKind;
+
+ let temp_dir = std::env::temp_dir();
+ let socket_path = temp_dir.join(format!("pilotty-elem-{}.sock", std::process::id()));
+ let pid_path = socket_path.with_extension("pid");
+
+ let server = DaemonServer::bind_to(socket_path.clone(), pid_path.clone())
+ .await
+ .expect("Failed to bind server");
+
+ let server_handle = tokio::spawn(async move {
+ let _ = timeout(Duration::from_secs(5), server.run()).await;
+ });
+
+ tokio::time::sleep(Duration::from_millis(50)).await;
+
+ let stream = UnixStream::connect(&socket_path)
+ .await
+ .expect("Failed to connect");
+ let (reader, mut writer) = stream.into_split();
+ let mut reader = BufReader::new(reader);
+
+ // Spawn a session with output containing detectable elements:
+ // - [OK] and [Cancel] → Buttons (bracket pattern, confidence 0.8)
+ // - [x] and [ ] → Toggles (checkbox pattern, confidence 1.0)
+ let spawn_request = Request {
+ id: "spawn-elem".to_string(),
+ command: Command::Spawn {
+ command: vec![
+ "printf".to_string(),
+ "Options: [x] Enable [ ] Debug\nActions: [OK] [Cancel]\n".to_string(),
+ ],
+ session_name: Some("elem-test".to_string()),
+ cwd: None,
+ },
+ };
+ let request_json = serde_json::to_string(&spawn_request).unwrap();
+ writer
+ .write_all(request_json.as_bytes())
+ .await
+ .expect("write");
+ writer.write_all(b"\n").await.expect("newline");
+ writer.flush().await.expect("flush");
+
+ let mut response_line = String::new();
+ timeout(Duration::from_secs(2), reader.read_line(&mut response_line))
+ .await
+ .expect("timeout")
+ .expect("read");
+
+ // Give printf time to complete
+ tokio::time::sleep(Duration::from_millis(200)).await;
+
+ // Request snapshot with Full format (includes elements)
+ let snap_request = Request {
+ id: "snap-elem".to_string(),
+ command: Command::Snapshot {
+ session: Some("elem-test".to_string()),
+ format: Some(SnapshotFormat::Full),
+ },
+ };
+ let snap_json = serde_json::to_string(&snap_request).unwrap();
+ writer.write_all(snap_json.as_bytes()).await.expect("write");
+ writer.write_all(b"\n").await.expect("newline");
+ writer.flush().await.expect("flush");
+
+ response_line.clear();
+ timeout(Duration::from_secs(2), reader.read_line(&mut response_line))
+ .await
+ .expect("timeout")
+ .expect("read");
+
+ let snap_response: Response =
+ serde_json::from_str(&response_line).expect("parse snap response");
+ assert!(snap_response.success, "Snapshot should succeed");
+
+ // Verify ScreenState with elements
+ if let Some(ResponseData::ScreenState(screen_state)) = snap_response.data {
+ // Full format includes text
+ assert!(
+ screen_state.text.is_some(),
+ "Full format should include text"
+ );
+
+ // Full format SHOULD include elements
+ assert!(
+ screen_state.elements.is_some(),
+ "Full format should include elements"
+ );
+
+ // Full format SHOULD include content_hash
+ assert!(
+ screen_state.content_hash.is_some(),
+ "Full format should include content_hash"
+ );
+
+ let elements = screen_state.elements.unwrap();
+
+ // Should detect at least the toggles (checkboxes are high confidence)
+ // [x] -> Toggle checked=true, [ ] -> Toggle checked=false
+ let toggles: Vec<_> = elements
+ .iter()
+ .filter(|e| e.kind == ElementKind::Toggle)
+ .collect();
+ assert!(
+ toggles.len() >= 2,
+ "Should detect at least 2 toggles, found {}",
+ toggles.len()
+ );
+
+ // Verify toggle states
+ let checked_toggle = toggles.iter().find(|t| t.checked == Some(true));
+ let unchecked_toggle = toggles.iter().find(|t| t.checked == Some(false));
+ assert!(
+ checked_toggle.is_some(),
+ "Should have a checked toggle ([x])"
+ );
+ assert!(
+ unchecked_toggle.is_some(),
+ "Should have an unchecked toggle ([ ])"
+ );
+
+ // Check toggle confidence is 1.0 (checkbox pattern)
+ for toggle in &toggles {
+ assert!(
+ (toggle.confidence - 1.0).abs() < f32::EPSILON,
+ "Toggle confidence should be 1.0, got {}",
+ toggle.confidence
+ );
+ }
+
+ // May also detect [OK] and [Cancel] as buttons
+ let buttons: Vec<_> = elements
+ .iter()
+ .filter(|e| e.kind == ElementKind::Button)
+ .collect();
+ // Buttons have 0.8 confidence (bracket pattern)
+ for button in &buttons {
+ assert!(
+ (button.confidence - 0.8).abs() < f32::EPSILON,
+ "Button confidence should be 0.8, got {}",
+ button.confidence
+ );
+ }
+
+ // Verify JSON serialization is clean (check raw response)
+ // - Non-focused elements should NOT have "focused" in their JSON
+ // - Buttons should NOT have "checked" in their JSON
+ let raw_json = &response_line;
+ // Count occurrences of "focused" - should only appear for focused elements
+ let focused_count = raw_json.matches("\"focused\"").count();
+ let elements_with_focus = elements.iter().filter(|e| e.focused).count();
+ assert_eq!(
+ focused_count, elements_with_focus,
+ "JSON should only include 'focused' for focused elements"
+ );
+ } else {
+ panic!(
+ "Expected ScreenState response data, got: {:?}",
+ snap_response.data
+ );
+ }
+
+ server_handle.abort();
+ let _ = std::fs::remove_file(&socket_path);
+ }
}
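Review note on `handle_snapshot`: the change splits `Full` and `Compact` into separate arms and only runs element detection for `Full`. The sketch below restates that dispatch with simplified, hypothetical types (`Payload`, `wants_elements`, `build_payload`). It is not the daemon's actual code, and the `Text` arm is reduced to a plain payload, whereas the real handler returns a separate text response.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SnapshotFormat {
    Full,
    Compact,
    Text,
}

/// Simplified stand-in for the optional parts of `ScreenState`.
#[derive(Debug, Default, PartialEq)]
struct Payload {
    text: Option<String>,
    elements: Option<Vec<String>>,
    content_hash: Option<u64>,
}

/// Element detection is only needed for the Full format (the default).
fn wants_elements(format: Option<SnapshotFormat>) -> bool {
    matches!(format.unwrap_or(SnapshotFormat::Full), SnapshotFormat::Full)
}

/// Decide which optional fields each format carries.
fn build_payload(
    format: SnapshotFormat,
    text: &str,
    elements: Option<Vec<String>>,
    hash: Option<u64>,
) -> Payload {
    match format {
        // Full: text + elements + content hash.
        SnapshotFormat::Full => Payload {
            text: Some(text.to_string()),
            elements,
            content_hash: hash,
        },
        // Compact: metadata only, so every optional field stays None.
        SnapshotFormat::Compact => Payload::default(),
        // Text: simplified here to text only.
        SnapshotFormat::Text => Payload {
            text: Some(text.to_string()),
            ..Payload::default()
        },
    }
}

fn main() {
    for format in [SnapshotFormat::Full, SnapshotFormat::Compact, SnapshotFormat::Text] {
        let detect = wants_elements(Some(format));
        let payload = build_payload(format, "Actions: [OK]", detect.then(Vec::new), detect.then_some(1));
        println!("{:?} -> {:?}", format, payload);
    }
}
```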
diff --git a/crates/pilotty-cli/src/daemon/session.rs b/crates/pilotty-cli/src/daemon/session.rs
index 9edcdae..8039be0 100644
--- a/crates/pilotty-cli/src/daemon/session.rs
+++ b/crates/pilotty-cli/src/daemon/session.rs
@@ -9,8 +9,11 @@ use chrono::{DateTime, Utc};
use tokio::sync::{Mutex, RwLock};
use tracing::{debug, info};
+use pilotty_core::elements::classify::{detect, ClassifyContext};
+use pilotty_core::elements::Element;
use pilotty_core::error::ApiError;
use pilotty_core::protocol::SessionInfo;
+use pilotty_core::snapshot::compute_content_hash;
use crate::daemon::pty::{AsyncPtyHandle, PtySession, TermSize};
use crate::daemon::terminal::TerminalEmulator;
@@ -56,6 +59,11 @@ pub struct SnapshotData {
pub cursor_pos: (u16, u16),
pub cursor_visible: bool,
pub size: TermSize,
+ /// Detected UI elements (computed on demand).
+ pub elements: Option<Vec<Element>>,
+ /// Hash of screen content for change detection.
+ /// Present when `with_elements=true`.
+ pub content_hash: Option<u64>,
}
/// An active PTY session.
@@ -88,21 +96,6 @@ impl Session {
}
}
- /// Get the plain text content of the terminal screen.
- pub async fn get_text(&self) -> String {
- self.terminal.lock().await.get_text()
- }
-
- /// Get the cursor position (row, col) - 0-indexed.
- pub async fn cursor_position(&self) -> (u16, u16) {
- self.terminal.lock().await.cursor_position()
- }
-
- /// Check if the cursor is visible.
- pub async fn cursor_visible(&self) -> bool {
- self.terminal.lock().await.cursor_visible()
- }
-
/// Check if terminal is in application cursor mode.
pub async fn application_cursor(&self) -> bool {
self.terminal.lock().await.application_cursor()
@@ -382,7 +375,14 @@ impl SessionManager {
///
/// Uses a read lock on sessions since all operations use interior mutability,
/// avoiding potential deadlocks from holding a write lock during I/O.
- pub async fn get_snapshot_data(&self, id: &SessionId) -> Result<SnapshotData, ApiError> {
+ ///
+ /// If `with_elements` is true, element detection runs to identify
+ /// interactive UI elements (buttons, inputs, and toggles), and a
+ /// content hash of the screen text is computed.
+ pub async fn get_snapshot_data(
+ &self,
+ id: &SessionId,
+ with_elements: bool,
+ ) -> Result<SnapshotData, ApiError> {
let sessions = self.sessions.read().await;
let session = sessions
.get(id)
@@ -391,17 +391,33 @@ impl SessionManager {
// Drain pending PTY output to update terminal state
session.drain_pty_output().await;
+ // Lock terminal once for all reads
+ let terminal = session.terminal.lock().await;
+
// Get snapshot data
- let text = session.get_text().await;
- let cursor_pos = session.cursor_position().await;
- let cursor_visible = session.cursor_visible().await;
+ let text = terminal.get_text();
+ let cursor_pos = terminal.cursor_position();
+ let cursor_visible = terminal.cursor_visible();
let size = session.size;
+ // Detect UI elements and compute content hash if requested
+ let (elements, content_hash) = if with_elements {
+ let (cursor_row, cursor_col) = cursor_pos;
+ let ctx = ClassifyContext::new().with_cursor(cursor_row, cursor_col);
+ let elems = detect(&*terminal, &ctx);
+ let hash = compute_content_hash(&text);
+ (Some(elems), Some(hash))
+ } else {
+ (None, None)
+ };
+
Ok(SnapshotData {
text,
cursor_pos,
cursor_visible,
size,
+ elements,
+ content_hash,
})
}
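Review note on `get_snapshot_data`: the old code locked the terminal three times (once per accessor), so PTY output could in principle be applied between reads. Taking the lock once yields a consistent view. A minimal sketch of that pattern, using `std::sync::Mutex` in place of the tokio mutex and hypothetical `Terminal`/`Snapshot` types:

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the terminal emulator state.
struct Terminal {
    text: String,
    cursor: (u16, u16),
    cursor_visible: bool,
}

#[derive(Debug, PartialEq)]
struct Snapshot {
    text: String,
    cursor_pos: (u16, u16),
    cursor_visible: bool,
}

/// Acquire the lock once and read every field under the same guard, so the
/// text and cursor always describe the same moment in time.
fn take_snapshot(terminal: &Mutex<Terminal>) -> Snapshot {
    let term = terminal.lock().expect("terminal mutex poisoned");
    Snapshot {
        text: term.text.clone(),
        cursor_pos: term.cursor,
        cursor_visible: term.cursor_visible,
    }
    // The guard is dropped here, releasing the lock.
}

fn main() {
    let terminal = Mutex::new(Terminal {
        text: "Hello".to_string(),
        cursor: (0, 5),
        cursor_visible: true,
    });
    println!("{:?}", take_snapshot(&terminal));
}
```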
diff --git a/crates/pilotty-cli/src/daemon/terminal.rs b/crates/pilotty-cli/src/daemon/terminal.rs
index af84669..f925c02 100644
--- a/crates/pilotty-cli/src/daemon/terminal.rs
+++ b/crates/pilotty-cli/src/daemon/terminal.rs
@@ -4,6 +4,8 @@
//! that can parse ANSI escape sequences from PTY output.
use crate::daemon::pty::TermSize;
+use pilotty_core::elements::grid::{ScreenCell, ScreenGrid};
+use pilotty_core::elements::style::{CellStyle, Color};
/// Terminal emulator that parses ANSI escape sequences.
///
@@ -91,6 +93,49 @@ impl TerminalEmulator {
}
}
+/// Convert vt100 color to core Color type.
+fn convert_color(vt_color: vt100::Color) -> Color {
+ match vt_color {
+ vt100::Color::Default => Color::Default,
+ vt100::Color::Idx(idx) => Color::Indexed { index: idx },
+ vt100::Color::Rgb(r, g, b) => Color::Rgb { r, g, b },
+ }
+}
+
+/// Convert vt100 cell to core ScreenCell.
+fn convert_cell(vt_cell: &vt100::Cell) -> ScreenCell {
+ // Get the character from the cell contents
+ // vt100::Cell::contents() returns a String (may be empty for wide char continuations)
+ let contents = vt_cell.contents();
+ let ch = contents.chars().next().unwrap_or(' ');
+
+ let style = CellStyle {
+ bold: vt_cell.bold(),
+ underline: vt_cell.underline(),
+ inverse: vt_cell.inverse(),
+ fg_color: convert_color(vt_cell.fgcolor()),
+ bg_color: convert_color(vt_cell.bgcolor()),
+ };
+
+ ScreenCell::new(ch, style)
+}
+
+impl ScreenGrid for TerminalEmulator {
+ fn rows(&self) -> u16 {
+ let (rows, _cols) = self.parser.screen().size();
+ rows
+ }
+
+ fn cols(&self) -> u16 {
+ let (_rows, cols) = self.parser.screen().size();
+ cols
+ }
+
+ fn cell(&self, row: u16, col: u16) -> Option<ScreenCell> {
+ self.parser.screen().cell(row, col).map(convert_cell)
+ }
+}
+
#[cfg(test)]
mod tests {
use super::*;
@@ -490,4 +535,130 @@ mod tests {
"Should be normal mode after ESC[?1l"
);
}
+
+ // ScreenGrid implementation tests
+
+ #[test]
+ fn test_screen_grid_dimensions() {
+ let term = TerminalEmulator::new(TermSize { cols: 80, rows: 24 });
+
+ assert_eq!(ScreenGrid::rows(&term), 24);
+ assert_eq!(ScreenGrid::cols(&term), 80);
+ }
+
+ #[test]
+ fn test_screen_grid_cell_access() {
+ let mut term = TerminalEmulator::new(TermSize { cols: 80, rows: 24 });
+ term.feed(b"Hello");
+
+ // Check cells with content via ScreenGrid trait
+ let cell_h = ScreenGrid::cell(&term, 0, 0).expect("Cell should exist");
+ assert_eq!(cell_h.ch, 'H');
+
+ let cell_o = ScreenGrid::cell(&term, 0, 4).expect("Cell should exist");
+ assert_eq!(cell_o.ch, 'o');
+
+ // Check empty cell
+ let cell_empty = ScreenGrid::cell(&term, 0, 10).expect("Cell should exist");
+ assert_eq!(cell_empty.ch, ' ');
+ }
+
+ #[test]
+ fn test_screen_grid_out_of_bounds() {
+ let term = TerminalEmulator::new(TermSize { cols: 80, rows: 24 });
+
+ assert!(ScreenGrid::cell(&term, 0, 0).is_some());
+ assert!(ScreenGrid::cell(&term, 23, 79).is_some());
+ assert!(ScreenGrid::cell(&term, 24, 0).is_none()); // row out of bounds
+ assert!(ScreenGrid::cell(&term, 0, 80).is_none()); // col out of bounds
+ }
+
+ #[test]
+ fn test_screen_grid_color_mapping_default() {
+ let mut term = TerminalEmulator::new(TermSize { cols: 80, rows: 24 });
+ term.feed(b"A");
+
+ let cell = ScreenGrid::cell(&term, 0, 0).expect("Cell should exist");
+ assert_eq!(cell.style.fg_color, Color::Default);
+ assert_eq!(cell.style.bg_color, Color::Default);
+ }
+
+ #[test]
+ fn test_screen_grid_color_mapping_indexed() {
+ let mut term = TerminalEmulator::new(TermSize { cols: 80, rows: 24 });
+ // Red foreground (color 1), blue background (color 4)
+ term.feed(b"\x1b[31;44mX");
+
+ let cell = ScreenGrid::cell(&term, 0, 0).expect("Cell should exist");
+ assert_eq!(cell.style.fg_color, Color::Indexed { index: 1 });
+ assert_eq!(cell.style.bg_color, Color::Indexed { index: 4 });
+ }
+
+ #[test]
+ fn test_screen_grid_color_mapping_rgb() {
+ let mut term = TerminalEmulator::new(TermSize { cols: 80, rows: 24 });
+ // 24-bit RGB foreground: ESC[38;2;255;128;64m (background is left at default)
+ term.feed(b"\x1b[38;2;255;128;64mR");
+
+ let cell = ScreenGrid::cell(&term, 0, 0).expect("Cell should exist");
+ assert_eq!(
+ cell.style.fg_color,
+ Color::Rgb {
+ r: 255,
+ g: 128,
+ b: 64
+ }
+ );
+ }
+
+ #[test]
+ fn test_screen_grid_style_bold() {
+ let mut term = TerminalEmulator::new(TermSize { cols: 80, rows: 24 });
+ term.feed(b"N\x1b[1mB\x1b[0m");
+
+ let normal = ScreenGrid::cell(&term, 0, 0).expect("Cell should exist");
+ assert!(!normal.style.bold);
+
+ let bold = ScreenGrid::cell(&term, 0, 1).expect("Cell should exist");
+ assert!(bold.style.bold);
+ }
+
+ #[test]
+ fn test_screen_grid_style_underline() {
+ let mut term = TerminalEmulator::new(TermSize { cols: 80, rows: 24 });
+ term.feed(b"N\x1b[4mU\x1b[0m");
+
+ let normal = ScreenGrid::cell(&term, 0, 0).expect("Cell should exist");
+ assert!(!normal.style.underline);
+
+ let underlined = ScreenGrid::cell(&term, 0, 1).expect("Cell should exist");
+ assert!(underlined.style.underline);
+ }
+
+ #[test]
+ fn test_screen_grid_style_inverse() {
+ let mut term = TerminalEmulator::new(TermSize { cols: 80, rows: 24 });
+ // \x1b[7m = inverse on
+ term.feed(b"N\x1b[7mI\x1b[0m");
+
+ let normal = ScreenGrid::cell(&term, 0, 0).expect("Cell should exist");
+ assert!(!normal.style.inverse);
+
+ let inverse = ScreenGrid::cell(&term, 0, 1).expect("Cell should exist");
+ assert!(inverse.style.inverse);
+ }
+
+ #[test]
+ fn test_screen_grid_combined_styles() {
+ let mut term = TerminalEmulator::new(TermSize { cols: 80, rows: 24 });
+ // Bold + underline + inverse + red fg + blue bg
+ term.feed(b"\x1b[1;4;7;31;44mS");
+
+ let cell = ScreenGrid::cell(&term, 0, 0).expect("Cell should exist");
+ assert!(cell.style.bold);
+ assert!(cell.style.underline);
+ assert!(cell.style.inverse);
+ assert_eq!(cell.style.fg_color, Color::Indexed { index: 1 });
+ assert_eq!(cell.style.bg_color, Color::Indexed { index: 4 });
+ }
}
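Review note on the `ScreenGrid` implementation: the trait decouples element detection from the `vt100` parser, so detection can also run against a plain in-memory grid in tests. The sketch below shows the adapter idea with a hypothetical `Grid` trait that returns bare `char`s. The real `ScreenGrid` trait returns `ScreenCell` values, and its full definition is not part of this diff.

```rust
/// Hypothetical, simplified version of a screen-grid abstraction.
trait Grid {
    fn rows(&self) -> u16;
    fn cols(&self) -> u16;
    /// Returns None when (row, col) lies outside the grid.
    fn cell(&self, row: u16, col: u16) -> Option<char>;
}

/// In-memory grid backed by fixed-width rows of characters.
struct VecGrid {
    lines: Vec<Vec<char>>,
    cols: u16,
}

impl VecGrid {
    /// Build a grid from text lines, padding or truncating each to `cols`.
    fn from_lines(lines: &[&str], cols: u16) -> Self {
        let lines: Vec<Vec<char>> = lines
            .iter()
            .map(|line| {
                let mut row: Vec<char> = line.chars().take(cols as usize).collect();
                row.resize(cols as usize, ' ');
                row
            })
            .collect();
        Self { lines, cols }
    }
}

impl Grid for VecGrid {
    fn rows(&self) -> u16 {
        self.lines.len().min(u16::MAX as usize) as u16
    }

    fn cols(&self) -> u16 {
        self.cols
    }

    fn cell(&self, row: u16, col: u16) -> Option<char> {
        // `get` performs the bounds check for both axes.
        self.lines.get(row as usize)?.get(col as usize).copied()
    }
}

fn main() {
    let grid = VecGrid::from_lines(&["Hello"], 10);
    println!("{}x{} first cell: {:?}", grid.rows(), grid.cols(), grid.cell(0, 0));
}
```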
diff --git a/crates/pilotty-core/Cargo.toml b/crates/pilotty-core/Cargo.toml
index 4178ed6..5e82a1a 100644
--- a/crates/pilotty-core/Cargo.toml
+++ b/crates/pilotty-core/Cargo.toml
@@ -8,3 +8,4 @@ description = "Core types and logic for pilotty"
[dependencies]
serde = { workspace = true }
serde_json = { workspace = true }
+unicode-width = { workspace = true }
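Review note on `classify.rs` (the next file in this diff): its module docs list five priority-ordered rules. The sketch below restates that ordering as a single function. The `Candidate` type and the helper names are hypothetical, and the pattern checks are far simpler than the real classifier (no progress-bar rejection, no CJK brackets).

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Button,
    Input,
    Toggle,
}

/// Hypothetical, simplified view of a text cluster on screen.
struct Candidate<'a> {
    text: &'a str,
    at_cursor: bool,
    inverse: bool,
}

/// True for `[OK]`, `<Cancel>`, `(Submit)`: matching brackets, non-blank label.
fn is_bracketed_label(text: &str) -> bool {
    let mut chars = text.chars();
    let (Some(first), Some(last)) = (chars.next(), chars.next_back()) else {
        return false;
    };
    let paired = matches!((first, last), ('[', ']') | ('<', '>') | ('(', ')'));
    // After consuming both ends, `as_str` is the interior of the brackets.
    paired && !chars.as_str().trim().is_empty()
}

/// Apply the rules in priority order; the first match wins.
fn classify(c: &Candidate) -> Option<(Kind, f32)> {
    let t = c.text.trim();
    if c.at_cursor {
        return Some((Kind::Input, 1.0)); // 1. cursor position
    }
    if matches!(t, "[x]" | "[ ]" | "☑" | "☐") {
        return Some((Kind::Toggle, 1.0)); // 2. checkbox patterns
    }
    if c.inverse {
        return Some((Kind::Button, 1.0)); // 3. inverse video
    }
    if is_bracketed_label(t) {
        return Some((Kind::Button, 0.8)); // 4. bracket patterns
    }
    if t.len() >= 3 && t.chars().all(|ch| ch == '_') {
        return Some((Kind::Input, 0.6)); // 5. underscore field
    }
    None // non-interactive content is filtered out
}

fn main() {
    for text in ["[x]", "[OK]", "____", "plain text"] {
        let c = Candidate { text, at_cursor: false, inverse: false };
        println!("{:?} -> {:?}", text, classify(&c));
    }
}
```

Ordering matters here: `[x]` also satisfies the bracket rule, so the checkbox check has to come first for toggles to be reported with confidence 1.0.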
diff --git a/crates/pilotty-core/src/elements/classify.rs b/crates/pilotty-core/src/elements/classify.rs
new file mode 100644
index 0000000..4f938d1
--- /dev/null
+++ b/crates/pilotty-core/src/elements/classify.rs
@@ -0,0 +1,853 @@
+//! Classification: converting clusters into interactive elements.
+//!
+//! The classifier applies priority-ordered rules to determine each cluster's
+//! kind. Only interactive elements (Button, Input, Toggle) are returned;
+//! non-interactive content stays in `snapshot.text`.
+//!
+//! # Rule Priority (highest to lowest)
+//!
+//! 1. Cursor position → Input (confidence: 1.0, focused: true)
+//! 2. Checkbox patterns `[x]`, `[ ]`, `☑`, `☐` → Toggle (confidence: 1.0)
+//! 3. Inverse video → Button (confidence: 1.0, focused: true)
+//! 4. Bracket patterns `[OK]`, `<Cancel>` → Button (confidence: 0.8)
+//! 5. Underscore field `____` → Input (confidence: 0.6)
+//!
+//! Non-interactive patterns (links, progress bars, errors, status indicators,
+//! box-drawing, menu prefixes, static text) are filtered out.
+
+use unicode_width::UnicodeWidthStr;
+
+use crate::elements::segment::Cluster;
+use crate::elements::{Element, ElementKind};
+
+// ============================================================================
+// Constants
+// ============================================================================
+
+/// Maximum cluster text length to process for tokenization.
+/// Protects against memory exhaustion from malicious terminal output.
+/// Terminal lines rarely exceed this; longer text won't contain meaningful UI elements.
+const MAX_CLUSTER_TEXT_LEN: usize = 4096;
+
+// ============================================================================
+// Token Extraction
+// ============================================================================
+
+/// A token extracted from a cluster's text.
+///
+/// Tokens are sub-patterns within a cluster that match interactive elements:
+/// - Bracketed tokens: `[OK]`, `<Cancel>`, `[ ]`, `[x]`
+/// - Underscore runs: `____`, `__________`
+#[derive(Debug, Clone, PartialEq, Eq)]
+struct Token {
+ /// Text content of the token.
+ text: String,
+ /// Byte offset from start of cluster text (used to slice prefix for width calculation).
+ byte_offset: usize,
+}
+
+/// Calculate the display-width column offset for a token within cluster text.
+///
+/// This handles CJK characters correctly (width 2) by computing the display
+/// width of the text prefix before the token.
+fn token_col_offset(text: &str, byte_offset: usize) -> u16 {
+ text.get(..byte_offset)
+ .map(|prefix| prefix.width().min(u16::MAX as usize) as u16)
+ .unwrap_or(0)
+}
+
+/// Extract bracketed tokens from text.
+///
+/// Finds patterns like `[OK]`, `<Cancel>`, `(Submit)`, `[ ]`, `[x]`.
+/// Returns tokens with their byte offsets within the text (for display width calculation).
+///
+/// Returns empty if text exceeds MAX_CLUSTER_TEXT_LEN to prevent memory exhaustion.
+fn extract_bracketed_tokens(text: &str) -> Vec<Token> {
+ // Protect against memory exhaustion from extremely long input
+ if text.len() > MAX_CLUSTER_TEXT_LEN {
+ return Vec::new();
+ }
+
+ let mut tokens = Vec::new();
+
+ for (char_idx, ch) in text.char_indices() {
+ // Look for opening brackets
+ let close_bracket = match ch {
+ '[' => Some(']'),
+ '<' => Some('>'),
+ '(' => Some(')'),
+ '【' => Some('】'),
+ '「' => Some('」'),
+ _ => None,
+ };
+
+ if let Some(closer) = close_bracket {
+ // Find matching closer in the remainder of the string
+ if let Some(end_rel) = text[char_idx + ch.len_utf8()..].find(closer) {
+ let token_start = char_idx;
+ let token_end = char_idx + ch.len_utf8() + end_rel + closer.len_utf8();
+ let token_text = &text[token_start..token_end];
+
+ // Only extract if it looks interactive (not just empty or single char)
+ if token_text.chars().count() >= 3 || is_unicode_checkbox(token_text) {
+ tokens.push(Token {
+ text: token_text.to_string(),
+ byte_offset: token_start,
+ });
+ }
+ }
+ }
+ }
+
+ // Deduplicate overlapping tokens by keeping only non-overlapping ones
+ let mut result = Vec::new();
+ let mut last_end = 0;
+ for token in tokens {
+ if token.byte_offset >= last_end {
+ last_end = token.byte_offset + token.text.len();
+ result.push(token);
+ }
+ }
+
+ result
+}
+
+/// Check if text is a single unicode checkbox character.
+fn is_unicode_checkbox(text: &str) -> bool {
+ matches!(text, "☑" | "☐" | "□" | "✓" | "✔" | "☒")
+}
+
+/// Extract underscore runs from text.
+///
+/// Finds patterns like `____`, `__________` (3+ underscores).
+/// Returns tokens with their byte offsets within the text (for display width calculation).
+///
+/// Returns empty if text exceeds MAX_CLUSTER_TEXT_LEN to prevent memory exhaustion.
+fn extract_underscore_runs(text: &str) -> Vec<Token> {
+ // Protect against memory exhaustion from extremely long input
+ if text.len() > MAX_CLUSTER_TEXT_LEN {
+ return Vec::new();
+ }
+
+ let mut tokens = Vec::new();
+ let mut in_run = false;
+ let mut run_start = 0;
+
+ for (byte_idx, ch) in text.char_indices() {
+ if ch == '_' {
+ if !in_run {
+ in_run = true;
+ run_start = byte_idx;
+ }
+ } else if in_run {
+ // End of underscore run
+ let run_text = &text[run_start..byte_idx];
+ if run_text.len() >= 3 {
+ tokens.push(Token {
+ text: run_text.to_string(),
+ byte_offset: run_start,
+ });
+ }
+ in_run = false;
+ }
+ }
+
+ // Handle run at end of string
+ if in_run {
+ let run_text = &text[run_start..];
+ if run_text.len() >= 3 {
+ tokens.push(Token {
+ text: run_text.to_string(),
+ byte_offset: run_start,
+ });
+ }
+ }
+
+ tokens
+}
+
+/// Context for classification decisions that depend on screen position.
+#[derive(Debug, Clone, Copy, Default)]
+pub struct ClassifyContext {
+ /// Optional cursor row (if known). Clusters at cursor position become Input.
+ pub cursor_row: Option<u16>,
+ /// Optional cursor column (if known).
+ pub cursor_col: Option<u16>,
+}
+
+impl ClassifyContext {
+ /// Create a new context with no cursor information.
+ #[must_use]
+ pub fn new() -> Self {
+ Self::default()
+ }
+
+ /// Set cursor position.
+ #[must_use]
+ pub fn with_cursor(mut self, row: u16, col: u16) -> Self {
+ self.cursor_row = Some(row);
+ self.cursor_col = Some(col);
+ self
+ }
+}
+
+/// Internal element data during classification.
+///
+/// Used during classification to collect elements before converting
+/// to the public Element type.
+#[derive(Debug, Clone)]
+struct DetectedElement {
+ kind: ElementKind,
+ row: u16,
+ col: u16,
+ width: u16,
+ text: String,
+ confidence: f32,
+ checked: Option<bool>,
+ focused: bool,
+}
+
+impl DetectedElement {
+ /// Create a button element.
+ fn button(row: u16, col: u16, text: String, confidence: f32, focused: bool) -> Self {
+ Self {
+ kind: ElementKind::Button,
+ row,
+ col,
+ width: text.width().min(u16::MAX as usize) as u16,
+ text,
+ confidence,
+ checked: None,
+ focused,
+ }
+ }
+
+ /// Create an input element.
+ fn input(row: u16, col: u16, text: String, confidence: f32, focused: bool) -> Self {
+ Self {
+ kind: ElementKind::Input,
+ row,
+ col,
+ width: text.width().min(u16::MAX as usize) as u16,
+ text,
+ confidence,
+ checked: None,
+ focused,
+ }
+ }
+
+ /// Create a toggle element.
+ fn toggle(row: u16, col: u16, text: String, checked: bool) -> Self {
+ Self {
+ kind: ElementKind::Toggle,
+ row,
+ col,
+ width: text.width().min(u16::MAX as usize) as u16,
+ text,
+ confidence: 1.0,
+ checked: Some(checked),
+ focused: false,
+ }
+ }
+
+ /// Convert to Element.
+ fn into_element(self) -> Element {
+ let mut elem = Element::new(
+ self.kind,
+ self.row,
+ self.col,
+ self.width,
+ self.text,
+ self.confidence,
+ );
+ if let Some(checked) = self.checked {
+ elem = elem.with_checked(checked);
+ }
+ if self.focused {
+ elem = elem.with_focused(true);
+ }
+ elem
+ }
+}
+
+// ============================================================================
+// Pattern Detection Helpers
+// ============================================================================
+
+/// Check if text matches a single button bracket pattern: `[OK]`, `<Cancel>`, `(Confirm)`
+///
+/// Requires:
+/// - Exactly one pair of matching brackets
+/// - At least one non-bracket character inside
+/// - No brackets in the interior (to reject `[Yes] [No]`)
+fn is_button_pattern(text: &str) -> bool {
+ let trimmed = text.trim();
+ if trimmed.len() < 3 {
+ return false;
+ }
+
+ let chars: Vec<char> = trimmed.chars().collect();
+ let first = chars[0];
+ let last = chars[chars.len() - 1];
+
+ // Check for matching bracket pairs
+ let (opener, closer) = match (first, last) {
+ ('[', ']') => ('[', ']'),
+ ('<', '>') => ('<', '>'),
+ ('(', ')') => ('(', ')'),
+ ('【', '】') => ('【', '】'),
+ ('「', '」') => ('「', '」'),
+ _ => return false,
+ };
+
+ // Interior must have non-whitespace content (not just empty brackets)
+ let interior: String = chars[1..chars.len() - 1].iter().collect();
+
+ // Reject if interior contains more brackets (e.g., "[Yes] [No]")
+ if interior.contains(opener) || interior.contains(closer) {
+ return false;
+ }
+
+ // Reject if it looks like a checkbox pattern
+ if is_checkbox_content(&interior) {
+ return false;
+ }
+
+ // Reject if it looks like a progress bar inside brackets
+ if is_progress_bar_content(&interior) {
+ return false;
+ }
+
+ // Must have actual label content
+ !interior.trim().is_empty()
+}
+
+/// Helper to check if content inside brackets looks like progress bar content.
+fn is_progress_bar_content(content: &str) -> bool {
+ if content.is_empty() {
+ return false;
+ }
+
+ // Count progress-bar typical characters
+ let progress_chars: usize = content
+ .chars()
+ .filter(|&c| matches!(c, '=' | '>' | '-' | '#' | ' ' | '█' | '░'))
+ .count();
+
+ // If at least 80% of chars (not bytes) are progress-like, it's probably a progress bar
+ progress_chars * 10 >= content.chars().count() * 8
+}
+
+/// Check if text matches checkbox patterns.
+///
+/// Supported patterns:
+/// - `[x]`, `[X]`, `[ ]` - ASCII checkboxes
+/// - `[*]`, `[-]` - Alternative markers
+/// - `☑`, `☒`, `✓`, `✔` (checked) and `☐`, `□` (unchecked) - Unicode checkboxes
+/// - `(x)`, `( )`, `(*)` - Parenthesized variants
+fn is_checkbox_pattern(text: &str) -> Option<bool> {
+ let trimmed = text.trim();
+
+ // Single character unicode checkboxes
+ match trimmed {
+ "☑" | "✓" | "✔" | "☒" => return Some(true),
+ "☐" | "□" => return Some(false),
+ _ => {}
+ }
+
+ // Bracketed checkboxes: [x], [ ], [*], [-], [✓] (compare char count, not byte length)
+ let chars: Vec<char> = trimmed.chars().collect();
+ if chars.len() == 3 {
+ if (chars[0] == '[' && chars[2] == ']') || (chars[0] == '(' && chars[2] == ')') {
+ return match chars[1] {
+ 'x' | 'X' | '*' | '✓' | '✔' => Some(true),
+ ' ' | '.' => Some(false),
+ '-' => Some(false), // indeterminate treated as unchecked
+ _ => None,
+ };
+ }
+ }
+
+ None
+}
+
+/// Helper to check if content inside brackets looks like checkbox content.
+fn is_checkbox_content(content: &str) -> bool {
+ let trimmed = content.trim();
+ matches!(trimmed, "x" | "X" | "" | "*" | "-" | "✓" | "✔") // "" covers a blank interior after trim
+}
+
+/// Check if text looks like an input field placeholder.
+///
+/// Patterns: `____`, `[    ]` (at least two spaces inside), `Name: ____`
+fn is_input_pattern(text: &str) -> bool {
+ let trimmed = text.trim();
+
+ // Series of underscores
+ if trimmed.chars().all(|c| c == '_') && trimmed.len() >= 3 {
+ return true;
+ }
+
+ // Empty bracketed field with mostly spaces
+ if trimmed.starts_with('[') && trimmed.ends_with(']') && trimmed.len() >= 4 {
+ let inner = &trimmed[1..trimmed.len() - 1]; // safe slice: '[' and ']' are one byte each
+ if inner.trim().is_empty() && inner.len() >= 2 {
+ return true;
+ }
+ }
+
+ // Colon followed by underscores: "Name: ___"
+ if let Some(colon_pos) = trimmed.find(':') {
+ let after_colon = trimmed[colon_pos + 1..].trim_start();
+ if after_colon.chars().all(|c| c == '_') && after_colon.len() >= 3 {
+ return true;
+ }
+ }
+
+ false
+}
+
+// ============================================================================
+// Core Classification
+// ============================================================================
+
+/// Classify a text pattern into a detected element at the given position.
+///
+/// This is the low-level classifier that doesn't consider tokenization.
+/// Returns `None` for non-interactive patterns.
+///
+/// Classification priority:
+/// 1. Checkbox patterns → Toggle (state is unambiguous)
+/// 2. Inverse video → Button (focused) - TUI convention for selection
+/// 3. Bracket patterns → Button (with focus if cursor present)
+/// 4. Underscore/labeled fields → Input (with focus if cursor present)
+/// 5. Cursor on unrecognized text → Input (fallback for editable regions)
+fn classify_text(
+ text: &str,
+ row: u16,
+ col: u16,
+ is_inverse: bool,
+ cursor_in_range: bool,
+) -> Option<DetectedElement> {
+ // Rule 1: Checkbox patterns → Toggle
+ // Checkboxes have unambiguous visual state, highest confidence
+ if let Some(checked) = is_checkbox_pattern(text) {
+ return Some(DetectedElement::toggle(row, col, text.to_string(), checked));
+ }
+
+ // Rule 2: Inverse video → Button (focused)
+ // TUI convention: inverse video = selected/focused item
+ if is_inverse {
+ return Some(DetectedElement::button(
+ row,
+ col,
+ text.to_string(),
+ 1.0,
+ true,
+ ));
+ }
+
+ // Rule 3: Bracket patterns → Button
+ // Cursor on button makes it focused, not an input
+ if is_button_pattern(text) {
+ return Some(DetectedElement::button(
+ row,
+ col,
+ text.to_string(),
+ if cursor_in_range { 1.0 } else { 0.8 },
+ cursor_in_range,
+ ));
+ }
+
+ // Rule 4: Underscore field → Input
+ if is_input_pattern(text) {
+ return Some(DetectedElement::input(
+ row,
+ col,
+ text.to_string(),
+ if cursor_in_range { 1.0 } else { 0.6 },
+ cursor_in_range,
+ ));
+ }
+
+ // Rule 5: Cursor on unrecognized pattern → Input (fallback)
+ // If cursor is here and we don't know what it is, assume editable
+ if cursor_in_range {
+ return Some(DetectedElement::input(
+ row,
+ col,
+ text.to_string(),
+ 1.0,
+ true,
+ ));
+ }
+
+ None
+}
+
+/// Check if cursor is within a range.
+///
+/// Uses saturating arithmetic to prevent overflow when col + width exceeds u16::MAX.
+fn cursor_in_range(ctx: &ClassifyContext, row: u16, col: u16, width: u16) -> bool {
+ if let (Some(cursor_row), Some(cursor_col)) = (ctx.cursor_row, ctx.cursor_col) {
+ cursor_row == row && cursor_col >= col && cursor_col < col.saturating_add(width)
+ } else {
+ false
+ }
+}
+
+/// Extract elements from a cluster using tokenization.
+///
+/// If the cluster contains bracketed tokens or underscore runs, those are
+/// extracted as separate elements. The parent cluster is dropped if tokens
+/// are found (tokens win, inherit parent's focus if inverse).
+///
+/// This handles cases like:
+/// - `"Save [OK] Cancel"` → extracts `[OK]` as Button
+/// - `"Name: ____"` → extracts `____` as Input
+fn extract_elements_from_cluster(cluster: &Cluster, ctx: &ClassifyContext) -> Vec<DetectedElement> {
+ let row = cluster.row;
+ let col = cluster.col;
+ let text = &cluster.text;
+ let is_inverse = cluster.style.is_inverse();
+
+ // First, try to classify the whole cluster
+ let cursor_hit = cursor_in_range(ctx, row, col, cluster.width);
+ let whole_cluster_elem = classify_text(text, row, col, is_inverse, cursor_hit);
+
+ // Check if the whole cluster is already a "tight" interactive pattern
+ // (checkbox, bracketed button, or underscore-only input)
+ if let Some(ref elem) = whole_cluster_elem {
+ // If it's a toggle (checkbox pattern), return immediately
+ if elem.kind == ElementKind::Toggle {
+ return vec![elem.clone()];
+ }
+
+ // If it's a bracket button and the text is entirely the bracket pattern
+ if elem.kind == ElementKind::Button && is_button_pattern(text) {
+ return vec![elem.clone()];
+ }
+
+ // If it's an input and the text is entirely underscores
+ if elem.kind == ElementKind::Input && text.trim().chars().all(|c| c == '_') {
+ return vec![elem.clone()];
+ }
+ }
+
+ // Try to extract tokens from within the cluster
+ let mut elements = Vec::new();
+ let parent_focused = is_inverse; // Tokens inherit focus from inverse parent
+
+ // Extract bracketed tokens
+ for token in extract_bracketed_tokens(text) {
+ let token_col = col.saturating_add(token_col_offset(text, token.byte_offset));
+ let token_cursor_hit = cursor_in_range(ctx, row, token_col, token.text.width() as u16);
+
+ // Classify the token text
+ // Note: tokens extracted from inverse clusters inherit focus
+ if let Some(mut elem) = classify_text(&token.text, row, token_col, false, token_cursor_hit)
+ {
+ if parent_focused && !elem.focused {
+ elem.focused = true;
+ // Upgrade confidence if inheriting focus
+ if elem.confidence < 1.0 {
+ elem.confidence = 1.0;
+ }
+ }
+ elements.push(elem);
+ }
+ }
+
+ // Extract underscore runs (only if no bracketed tokens found)
+ if elements.is_empty() {
+ for token in extract_underscore_runs(text) {
+ let token_col = col.saturating_add(token_col_offset(text, token.byte_offset));
+ let token_cursor_hit = cursor_in_range(ctx, row, token_col, token.text.width() as u16);
+
+ if let Some(mut elem) =
+ classify_text(&token.text, row, token_col, false, token_cursor_hit)
+ {
+ if parent_focused && !elem.focused {
+ elem.focused = true;
+ elem.confidence = 1.0;
+ }
+ elements.push(elem);
+ }
+ }
+ }
+
+ // If tokens were found, return them (dedup rule: tokens win)
+ if !elements.is_empty() {
+ return elements;
+ }
+
+ // No tokens found, return whole cluster classification if any
+ whole_cluster_elem.into_iter().collect()
+}
+
+/// Classify clusters into interactive elements.
+///
+/// Uses tokenization to extract sub-elements from clusters. If a cluster
+/// contains bracketed tokens or underscore runs, those are extracted as
+/// separate elements and the parent cluster is dropped (dedup rule).
+///
+/// Only returns interactive elements (Button, Input, Toggle).
+/// Non-interactive clusters are filtered out.
+///
+/// Elements are sorted by position (row, then col) for consistent ordering.
+#[must_use]
+pub fn classify(clusters: Vec<Cluster>, ctx: &ClassifyContext) -> Vec<Element> {
+ let mut detected: Vec<DetectedElement> = Vec::new();
+
+ for cluster in clusters {
+ detected.extend(extract_elements_from_cluster(&cluster, ctx));
+ }
+
+ // Sort by position (row, then col) for consistent ordering
+ detected.sort_by(|a, b| (a.row, a.col).cmp(&(b.row, b.col)));
+
+ // Convert to Elements
+ detected
+ .into_iter()
+ .map(|elem| elem.into_element())
+ .collect()
+}
+
+/// Convenience function: segment a grid and classify in one step.
+///
+/// This is the main entry point for element detection.
+#[must_use]
+pub fn detect<G: crate::elements::grid::ScreenGrid>(
+ grid: &G,
+ ctx: &ClassifyContext,
+) -> Vec<Element> {
+ let clusters = crate::elements::segment::segment(grid);
+ classify(clusters, ctx)
+}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+ use crate::elements::grid::test_support::SimpleGrid;
+ use crate::elements::segment::Cluster;
+ use crate::elements::style::CellStyle;
+
+ fn cluster(text: &str) -> Cluster {
+ Cluster::new(0, 0, text.to_string(), CellStyle::default())
+ }
+
+ fn cluster_at(row: u16, col: u16, text: &str) -> Cluster {
+ Cluster::new(row, col, text.to_string(), CellStyle::default())
+ }
+
+ fn inverse_cluster(text: &str) -> Cluster {
+ Cluster::new(0, 0, text.to_string(), CellStyle::new().with_inverse(true))
+ }
+
+ fn classify_cluster(cluster: &Cluster, ctx: &ClassifyContext) -> Option<DetectedElement> {
+ extract_elements_from_cluster(cluster, ctx)
+ .into_iter()
+ .next()
+ }
+
+ #[test]
+ fn button_bracket_patterns() {
+ let ctx = ClassifyContext::new();
+
+ let result = classify_cluster(&cluster("[OK]"), &ctx).unwrap();
+ assert_eq!(result.kind, ElementKind::Button);
+ assert!((result.confidence - 0.8).abs() < f32::EPSILON);
+
+ assert_eq!(
+ classify_cluster(&cluster("<Cancel>"), &ctx).unwrap().kind,
+ ElementKind::Button
+ );
+ assert_eq!(
+ classify_cluster(&cluster("(Submit)"), &ctx).unwrap().kind,
+ ElementKind::Button
+ );
+ }
+
+ #[test]
+ fn toggle_checkbox_patterns() {
+ let ctx = ClassifyContext::new();
+
+ let checked = classify_cluster(&cluster("[x]"), &ctx).unwrap();
+ assert_eq!(checked.kind, ElementKind::Toggle);
+ assert_eq!(checked.checked, Some(true));
+
+ let unchecked = classify_cluster(&cluster("[ ]"), &ctx).unwrap();
+ assert_eq!(unchecked.kind, ElementKind::Toggle);
+ assert_eq!(unchecked.checked, Some(false));
+ }
+
+ #[test]
+ fn input_patterns() {
+ let ctx = ClassifyContext::new();
+
+ let underscore = classify_cluster(&cluster("_____"), &ctx).unwrap();
+ assert_eq!(underscore.kind, ElementKind::Input);
+ assert!((underscore.confidence - 0.6).abs() < f32::EPSILON);
+
+ // Cursor position creates focused input
+ let ctx_cursor = ClassifyContext::new().with_cursor(0, 5);
+ let cursor_input = classify_cluster(&cluster_at(0, 0, "some text"), &ctx_cursor).unwrap();
+ assert_eq!(cursor_input.kind, ElementKind::Input);
+ assert!(cursor_input.focused);
+ }
+
+ #[test]
+ fn inverse_video_creates_focused_button() {
+ let ctx = ClassifyContext::new();
+ let result = classify_cluster(&inverse_cluster("File"), &ctx).unwrap();
+ assert_eq!(result.kind, ElementKind::Button);
+ assert!(result.focused);
+ assert!((result.confidence - 1.0).abs() < f32::EPSILON);
+ }
+
+ #[test]
+ fn non_interactive_filtered() {
+ let ctx = ClassifyContext::new();
+ assert!(classify_cluster(&cluster("Hello World"), &ctx).is_none());
+ assert!(classify_cluster(&cluster("https://example.com"), &ctx).is_none());
+ }
+
+ #[test]
+ fn classify_returns_sorted_elements() {
+ let ctx = ClassifyContext::new();
+ let clusters = vec![cluster_at(1, 0, "[ ]"), cluster_at(0, 8, "[Cancel]"), cluster("[OK]")];
+ let elements = classify(clusters, &ctx);
+
+ assert_eq!(elements.len(), 3);
+ assert_eq!(elements[0].kind, ElementKind::Button);
+ assert_eq!(elements[1].kind, ElementKind::Button);
+ assert_eq!(elements[2].kind, ElementKind::Toggle);
+ }
+
+ #[test]
+ fn detect_full_pipeline() {
+ let mut grid = SimpleGrid::from_text(&["[OK] [Cancel] [ ]"], 20);
+ let inverse = CellStyle::new().with_inverse(true);
+ let bold = CellStyle::new().with_bold(true);
+
+ grid.style_range(0, 0, 4, inverse);
+ grid.style_range(0, 5, 13, bold);
+
+ let elements = detect(&grid, &ClassifyContext::new());
+ let kinds: Vec<ElementKind> = elements.iter().map(|e| e.kind).collect();
+
+ assert!(kinds.contains(&ElementKind::Button));
+ assert!(kinds.contains(&ElementKind::Toggle));
+ }
+
+ #[test]
+ fn tokenizer_extracts_from_text() {
+ let tokens = extract_bracketed_tokens("Save [OK] [Cancel]");
+ assert_eq!(tokens.len(), 2);
+ assert_eq!(tokens[0].text, "[OK]");
+ assert_eq!(tokens[1].text, "[Cancel]");
+ }
+
+ #[test]
+ fn dedup_extracts_button_from_text() {
+ let ctx = ClassifyContext::new();
+ let elements = extract_elements_from_cluster(&cluster("Save [OK] Cancel"), &ctx);
+
+ assert_eq!(elements.len(), 1);
+ assert_eq!(elements[0].text, "[OK]");
+ assert_eq!(elements[0].col, 5);
+ }
+
+ // ========================================================================
+ // Security & Edge Case Tests
+ // ========================================================================
+
+ #[test]
+ fn extract_tokens_rejects_oversized_input() {
+ // Verify that extremely long text is rejected to prevent memory exhaustion
+ let huge_text = "[".repeat(MAX_CLUSTER_TEXT_LEN + 1);
+ assert!(extract_bracketed_tokens(&huge_text).is_empty());
+
+ let huge_underscores = "_".repeat(MAX_CLUSTER_TEXT_LEN + 1);
+ assert!(extract_underscore_runs(&huge_underscores).is_empty());
+ }
+
+ #[test]
+ fn cursor_in_range_handles_overflow() {
+ // Verify saturating_add prevents overflow panic
+ let ctx = ClassifyContext::new().with_cursor(0, u16::MAX);
+
+ // Should not panic even with extreme values
+ assert!(!cursor_in_range(&ctx, 0, u16::MAX - 10, 100));
+
+ // Cursor near MAX should still work correctly
+ let ctx = ClassifyContext::new().with_cursor(0, u16::MAX - 5);
+ assert!(cursor_in_range(&ctx, 0, u16::MAX - 10, 10));
+ }
+
+ // ========================================================================
+ // Unicode Width Tests
+ // ========================================================================
+
+ #[test]
+ fn element_width_cjk() {
+ // CJK characters should have width 2 each
+ let ctx = ClassifyContext::new();
+ let elem = classify_cluster(&cluster("[确认]"), &ctx).unwrap();
+ // [=1 + 确=2 + 认=2 + ]=1 = 6
+ assert_eq!(elem.width, 6);
+ }
+
+ #[test]
+ fn element_width_ascii() {
+ // ASCII characters should have width 1 each
+ let ctx = ClassifyContext::new();
+ let elem = classify_cluster(&cluster("[OK]"), &ctx).unwrap();
+ // [=1 + O=1 + K=1 + ]=1 = 4
+ assert_eq!(elem.width, 4);
+ }
+
+ #[test]
+ fn element_width_mixed() {
+ // Mixed ASCII and CJK
+ let ctx = ClassifyContext::new();
+ let elem = classify_cluster(&cluster("[OK确认]"), &ctx).unwrap();
+ // [=1 + O=1 + K=1 + 确=2 + 认=2 + ]=1 = 8
+ assert_eq!(elem.width, 8);
+ }
+
+ #[test]
+ fn token_col_with_cjk_prefix_bracketed() {
+ // CJK characters before a bracketed token should offset by display width, not char count
+ let ctx = ClassifyContext::new();
+ // 确(width=2) + 认(width=2) = 4 columns before [OK]
+ let cluster = Cluster::new(0, 0, "确认[OK]".to_string(), CellStyle::default());
+ let elements = extract_elements_from_cluster(&cluster, &ctx);
+ assert_eq!(elements.len(), 1);
+ assert_eq!(elements[0].text, "[OK]");
+ assert_eq!(elements[0].col, 4); // Not 2 (char count)!
+ }
+
+ #[test]
+ fn token_col_with_cjk_prefix_underscore() {
+ // CJK characters before an underscore run should offset by display width
+ let ctx = ClassifyContext::new();
+ // 名(width=2) + 前(width=2) + :(width=1) = 5 columns before ____
+ let cluster = Cluster::new(0, 0, "名前:____".to_string(), CellStyle::default());
+ let elements = extract_elements_from_cluster(&cluster, &ctx);
+ assert_eq!(elements.len(), 1);
+ assert_eq!(elements[0].text, "____");
+ assert_eq!(elements[0].col, 5); // Not 3 (char count)!
+ }
+
+ #[test]
+ fn token_col_ascii_unchanged() {
+ // ASCII text should still work correctly (char count == display width)
+ let ctx = ClassifyContext::new();
+ let cluster = Cluster::new(0, 5, "Save [OK] Cancel".to_string(), CellStyle::default());
+ let elements = extract_elements_from_cluster(&cluster, &ctx);
+ assert_eq!(elements.len(), 1);
+ assert_eq!(elements[0].text, "[OK]");
+ assert_eq!(elements[0].col, 10); // 5 (cluster col) + 5 (offset of [OK])
+ }
+}
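Review note on `classify.rs`: the priority order documented on `classify_text` can be sketched as a small standalone function. This is a simplified illustration, not the PR's implementation: `Kind`, `Hit`, `checkbox_state`, `is_bracket_label`, `is_underscore_run` and `classify_sketch` are hypothetical names, the pattern checks are reduced to ASCII brackets, and progress-bar rejection and tokenization are left out.

```rust
/// Simplified stand-in for the PR's `ElementKind` (hypothetical mirror).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    Button,
    Input,
    Toggle,
}

/// Result of the sketch classifier.
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub kind: Kind,
    pub confidence: f32,
    pub focused: bool,
    pub checked: Option<bool>,
}

/// Three-character checkbox such as `[x]` or `( )`, measured in chars, not bytes.
fn checkbox_state(text: &str) -> Option<bool> {
    let chars: Vec<char> = text.trim().chars().collect();
    match chars.as_slice() {
        ['[', mark, ']'] | ['(', mark, ')'] => match *mark {
            'x' | 'X' | '*' => Some(true),
            ' ' | '.' | '-' => Some(false),
            _ => None,
        },
        _ => None,
    }
}

/// Single bracketed label such as `[OK]` or `<Cancel>` with a non-blank interior.
fn is_bracket_label(text: &str) -> bool {
    let chars: Vec<char> = text.trim().chars().collect();
    if chars.len() < 3 {
        return false;
    }
    let (open, close) = (chars[0], chars[chars.len() - 1]);
    if !matches!((open, close), ('[', ']') | ('<', '>') | ('(', ')')) {
        return false;
    }
    let inner = &chars[1..chars.len() - 1];
    !inner.contains(&open) && !inner.contains(&close) && inner.iter().any(|c| !c.is_whitespace())
}

/// Run of three or more underscores.
fn is_underscore_run(text: &str) -> bool {
    let t = text.trim();
    t.len() >= 3 && t.chars().all(|c| c == '_')
}

/// Applies the five rules in order; the first match wins.
pub fn classify_sketch(text: &str, inverse: bool, cursor_hit: bool) -> Option<Hit> {
    // Rule 1: checkbox state is unambiguous, so it is checked first.
    if let Some(checked) = checkbox_state(text) {
        return Some(Hit { kind: Kind::Toggle, confidence: 1.0, focused: false, checked: Some(checked) });
    }
    // Rule 2: inverse video marks the selected item.
    if inverse {
        return Some(Hit { kind: Kind::Button, confidence: 1.0, focused: true, checked: None });
    }
    // Rule 3: a cursor on a bracketed label focuses the button.
    if is_bracket_label(text) {
        let confidence = if cursor_hit { 1.0 } else { 0.8 };
        return Some(Hit { kind: Kind::Button, confidence, focused: cursor_hit, checked: None });
    }
    // Rule 4: underscore placeholder.
    if is_underscore_run(text) {
        let confidence = if cursor_hit { 1.0 } else { 0.6 };
        return Some(Hit { kind: Kind::Input, confidence, focused: cursor_hit, checked: None });
    }
    // Rule 5: cursor on unrecognized text is assumed to be an editable region.
    if cursor_hit {
        return Some(Hit { kind: Kind::Input, confidence: 1.0, focused: true, checked: None });
    }
    None
}
```

Because the cursor rule comes last, a cursor resting on `[OK]` produces a focused Button rather than an Input, which is the behavior the tests in this file rely on.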
diff --git a/crates/pilotty-core/src/elements/grid.rs b/crates/pilotty-core/src/elements/grid.rs
new file mode 100644
index 0000000..6ef8f09
--- /dev/null
+++ b/crates/pilotty-core/src/elements/grid.rs
@@ -0,0 +1,149 @@
+//! Screen grid abstraction for element detection segmentation.
+//!
+//! Defines the `ScreenGrid` trait for uniform access to terminal screen content.
+
+use crate::elements::style::CellStyle;
+
+/// A single terminal cell with its character and visual style.
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub struct ScreenCell {
+ /// The character in this cell (space for empty cells).
+ pub ch: char,
+ /// Visual style attributes.
+ pub style: CellStyle,
+}
+
+impl ScreenCell {
+ /// Create a new screen cell.
+ #[must_use]
+ pub fn new(ch: char, style: CellStyle) -> Self {
+ Self { ch, style }
+ }
+}
+
+/// Trait for accessing terminal screen content.
+///
+/// This abstraction allows element detection to work with any terminal backend.
+/// Uses 0-based coordinates matching the cursor API convention.
+pub trait ScreenGrid {
+ /// Number of rows in the grid.
+ fn rows(&self) -> u16;
+
+ /// Number of columns in the grid.
+ fn cols(&self) -> u16;
+
+ /// Get cell at the given position. Returns `None` if out of bounds.
+ fn cell(&self, row: u16, col: u16) -> Option<ScreenCell>;
+}
+
+#[cfg(test)]
+pub(crate) mod test_support {
+ use super::*;
+
+ /// A simple in-memory grid for testing.
+ #[derive(Debug, Clone)]
+ pub struct SimpleGrid {
+ cells: Vec<ScreenCell>,
+ rows: u16,
+ cols: u16,
+ }
+
+ impl SimpleGrid {
+ /// Create a new grid filled with empty cells.
+ #[must_use]
+ pub fn new(rows: u16, cols: u16) -> Self {
+ let cell_count = rows as usize * cols as usize;
+ Self {
+ cells: vec![ScreenCell::new(' ', CellStyle::default()); cell_count],
+ rows,
+ cols,
+ }
+ }
+
+ /// Create a grid from text lines.
+ #[must_use]
+ pub fn from_text(lines: &[&str], cols: u16) -> Self {
+ let rows = lines.len() as u16;
+ let mut grid = Self::new(rows, cols);
+
+ for (row_idx, line) in lines.iter().enumerate() {
+ for (col_idx, ch) in line.chars().enumerate() {
+ if col_idx < cols as usize {
+ if let Some(idx) = grid.index(row_idx as u16, col_idx as u16) {
+ grid.cells[idx] = ScreenCell::new(ch, CellStyle::default());
+ }
+ }
+ }
+ }
+
+ grid
+ }
+
+ /// Apply a style to a range of cells in a row.
+ pub fn style_range(&mut self, row: u16, start_col: u16, end_col: u16, style: CellStyle) {
+ for col in start_col..end_col {
+ if let Some(idx) = self.index(row, col) {
+ self.cells[idx].style = style;
+ }
+ }
+ }
+
+ fn index(&self, row: u16, col: u16) -> Option<usize> {
+ if row < self.rows && col < self.cols {
+ Some(row as usize * self.cols as usize + col as usize)
+ } else {
+ None
+ }
+ }
+ }
+
+ impl ScreenGrid for SimpleGrid {
+ fn rows(&self) -> u16 {
+ self.rows
+ }
+
+ fn cols(&self) -> u16 {
+ self.cols
+ }
+
+ fn cell(&self, row: u16, col: u16) -> Option<ScreenCell> {
+ self.index(row, col).map(|i| self.cells[i].clone())
+ }
+ }
+}
+
+// Re-export for tests in other modules
+#[cfg(test)]
+pub(crate) use test_support::SimpleGrid;
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+
+ #[test]
+ fn screen_cell_creation() {
+ let cell = ScreenCell::new('A', CellStyle::default());
+ assert_eq!(cell.ch, 'A');
+ }
+
+ #[test]
+ fn simple_grid_from_text() {
+ let grid = SimpleGrid::from_text(&["Hello", "World"], 10);
+ assert_eq!(grid.rows(), 2);
+ assert_eq!(grid.cols(), 10);
+ assert_eq!(grid.cell(0, 0).unwrap().ch, 'H');
+ assert_eq!(grid.cell(1, 0).unwrap().ch, 'W');
+ }
+
+ #[test]
+ fn simple_grid_style_range() {
+ let mut grid = SimpleGrid::from_text(&["[OK]"], 10);
+ let inverse = CellStyle::new().with_inverse(true);
+
+ grid.style_range(0, 0, 4, inverse);
+
+ assert!(grid.cell(0, 0).unwrap().style.inverse);
+ assert!(grid.cell(0, 3).unwrap().style.inverse);
+ assert!(!grid.cell(0, 4).unwrap().style.inverse);
+ }
+}
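Review note on `grid.rs`: since `ScreenGrid` is the only seam between detection and the terminal backend, it may help to see how small an alternative backend can be. The sketch below redeclares minimal stand-ins for `CellStyle`, `ScreenCell` and `ScreenGrid` so it compiles on its own (the real `CellStyle` carries more attributes). `LineGrid` and `row_text` are hypothetical names, and each `char` is assumed to occupy exactly one cell.

```rust
/// Minimal stand-in for the PR's `CellStyle` (simplified to one flag).
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct CellStyle {
    pub inverse: bool,
}

/// Mirror of the PR's `ScreenCell`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ScreenCell {
    pub ch: char,
    pub style: CellStyle,
}

/// Mirror of the PR's `ScreenGrid` trait (0-based coordinates).
pub trait ScreenGrid {
    fn rows(&self) -> u16;
    fn cols(&self) -> u16;
    fn cell(&self, row: u16, col: u16) -> Option<ScreenCell>;
}

/// Hypothetical backend: plain text lines, padded with spaces up to `cols`.
pub struct LineGrid {
    lines: Vec<Vec<char>>,
    cols: u16,
}

impl LineGrid {
    pub fn new(lines: &[&str], cols: u16) -> Self {
        Self {
            lines: lines.iter().map(|l| l.chars().collect()).collect(),
            cols,
        }
    }
}

impl ScreenGrid for LineGrid {
    fn rows(&self) -> u16 {
        // Clamp instead of truncating if there are more than u16::MAX lines.
        self.lines.len().min(u16::MAX as usize) as u16
    }

    fn cols(&self) -> u16 {
        self.cols
    }

    fn cell(&self, row: u16, col: u16) -> Option<ScreenCell> {
        if row >= self.rows() || col >= self.cols {
            return None;
        }
        // Cells past the end of a short line read as blanks.
        let ch = self.lines[row as usize].get(col as usize).copied().unwrap_or(' ');
        Some(ScreenCell { ch, style: CellStyle::default() })
    }
}

/// Reads one row back as a `String` using only the trait methods.
pub fn row_text<G: ScreenGrid>(grid: &G, row: u16) -> String {
    (0..grid.cols())
        .filter_map(|col| grid.cell(row, col))
        .map(|cell| cell.ch)
        .collect()
}
```

Anything written against the trait, such as `row_text` here, works unchanged with any backend that implements the three methods.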
diff --git a/crates/pilotty-core/src/elements/mod.rs b/crates/pilotty-core/src/elements/mod.rs
new file mode 100644
index 0000000..4602451
--- /dev/null
+++ b/crates/pilotty-core/src/elements/mod.rs
@@ -0,0 +1,170 @@
+//! UI element detection types.
+//!
+//! This module provides types for detecting and classifying terminal UI elements.
+//! It uses a heuristic pipeline that segments the terminal buffer by visual
+//! style, then classifies segments into semantic kinds.
+//!
+//! # Element Kinds
+//!
+//! We use a simplified 3-kind model instead of many roles:
+//! - **Button**: Clickable elements (bracketed text, inverse video)
+//! - **Input**: Text entry fields (cursor position, underscore runs)
+//! - **Toggle**: Checkbox/radio elements with on/off state
+//!
+//! # Detection Rules (priority order)
+//!
+//! 1. Checkbox pattern `[x]`/`[ ]`/`☑`/`☐` → Toggle (confidence: 1.0)
+//! 2. Inverse video → Button (confidence: 1.0, focused: true)
+//! 3. Bracket pattern `[OK]`/`<Cancel>` → Button (confidence: 0.8; 1.0 and focused under the cursor)
+//! 4. Underscore field `____` → Input (confidence: 0.6; 1.0 and focused under the cursor)
+//! 5. Cursor on any other text → Input (confidence: 1.0, focused: true)
+//!
+//! Non-interactive elements (links, progress bars, status text) are filtered out.
+//! They remain in `snapshot.text` for agents to read, not as elements.
+
+pub mod classify;
+pub mod grid;
+pub mod segment;
+pub mod style;
+
+use serde::{Deserialize, Serialize};
+
+/// Kind of interactive element.
+///
+/// Simplified from 11 roles to 3 kinds based on what agents actually need:
+/// - What kind is it? (button/input/toggle)
+/// - Is it focused?
+/// - What's the toggle state? (for toggles only)
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
+#[serde(rename_all = "snake_case")]
+pub enum ElementKind {
+ /// Clickable element (buttons, menu items, tabs).
+ /// Detected via: inverse video, bracket patterns `[OK]`, `<Cancel>`.
+ Button,
+ /// Text entry field.
+ /// Detected via: cursor position, underscore runs `____`.
+ Input,
+ /// Checkbox or radio button with on/off state.
+ /// Detected via: `[x]`, `[ ]`, `☑`, `☐` patterns.
+ Toggle,
+}
+
+/// A detected interactive UI element.
+///
+/// # Coordinates
+///
+/// All coordinates are 0-based (row, col) to match cursor API.
+/// Height is always 1 in v1 (single-row elements only).
+#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
+pub struct Element {
+ /// Kind of interactive element.
+ pub kind: ElementKind,
+
+ /// Row index (0-based, from top).
+ pub row: u16,
+
+ /// Column index (0-based, from left).
+ pub col: u16,
+
+ /// Width in terminal cells.
+ pub width: u16,
+
+ /// Text content of the element.
+ pub text: String,
+
+ /// Detection confidence (0.0-1.0).
+ /// - 1.0: High confidence (cursor, inverse video, checkbox pattern)
+ /// - 0.8: Medium confidence (bracket pattern)
+ /// - 0.6: Low confidence (underscore run)
+ pub confidence: f32,
+
+ /// Whether this element currently has focus.
+ /// Orthogonal to kind, applies to any element type.
+ #[serde(default, skip_serializing_if = "is_false")]
+ pub focused: bool,
+
+ /// Checked state for Toggle kind (None for non-toggles).
+ #[serde(skip_serializing_if = "Option::is_none")]
+ pub checked: Option<bool>,
+}
+
+/// Helper for serde skip_serializing_if.
+fn is_false(b: &bool) -> bool {
+ !*b
+}
+
+impl Element {
+ /// Create a new element.
+ #[must_use]
+ pub fn new(
+ kind: ElementKind,
+ row: u16,
+ col: u16,
+ width: u16,
+ text: String,
+ confidence: f32,
+ ) -> Self {
+ Self {
+ kind,
+ row,
+ col,
+ width,
+ text,
+ confidence,
+ focused: false,
+ checked: None,
+ }
+ }
+
+ /// Set checked state (for toggles).
+ #[must_use]
+ pub fn with_checked(mut self, checked: bool) -> Self {
+ self.checked = Some(checked);
+ self
+ }
+
+ /// Set focused state.
+ #[must_use]
+ pub fn with_focused(mut self, focused: bool) -> Self {
+ self.focused = focused;
+ self
+ }
+}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+
+ #[test]
+ fn element_kind_serializes_to_snake_case() {
+ assert_eq!(
+ serde_json::to_string(&ElementKind::Button).unwrap(),
+ "\"button\""
+ );
+ assert_eq!(
+ serde_json::to_string(&ElementKind::Toggle).unwrap(),
+ "\"toggle\""
+ );
+ }
+
+ #[test]
+ fn element_serialization_omits_optional_fields() {
+ let elem = Element::new(ElementKind::Button, 0, 0, 4, "OK".to_string(), 0.8);
+ let json = serde_json::to_string(&elem).unwrap();
+
+ // Buttons shouldn't have checked, unfocused elements shouldn't have focused
+ assert!(!json.contains("checked"));
+ assert!(!json.contains("focused"));
+ }
+
+ #[test]
+ fn element_serialization_includes_set_fields() {
+ let elem = Element::new(ElementKind::Toggle, 0, 0, 3, "[x]".to_string(), 1.0)
+ .with_checked(true)
+ .with_focused(true);
+ let json = serde_json::to_string(&elem).unwrap();
+
+ assert!(json.contains("\"checked\":true"));
+ assert!(json.contains("\"focused\":true"));
+ }
+}
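Review note on `mod.rs`: the confidence tiers and the `focused` flag are what a consumer would most likely filter on. The sketch below shows one way an agent-side caller might use them. It mirrors `Element` with plain std types (no serde) so it compiles alone. `sample`, `focused`, `at_least` and `click_target` are hypothetical helpers that are not part of this PR, and `sample` assumes ASCII text when computing width.

```rust
/// Mirror of the PR's `ElementKind`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ElementKind {
    Button,
    Input,
    Toggle,
}

/// Mirror of the PR's `Element`, without the serde attributes.
#[derive(Debug, Clone, PartialEq)]
pub struct Element {
    pub kind: ElementKind,
    pub row: u16,
    pub col: u16,
    pub width: u16,
    pub text: String,
    pub confidence: f32,
    pub focused: bool,
    pub checked: Option<bool>,
}

/// Hypothetical constructor for examples; width assumes one cell per char (ASCII).
pub fn sample(kind: ElementKind, row: u16, col: u16, text: &str, confidence: f32, focused: bool) -> Element {
    Element {
        kind,
        row,
        col,
        width: text.chars().count().min(u16::MAX as usize) as u16,
        text: text.to_string(),
        confidence,
        focused,
        checked: None,
    }
}

/// Returns the first focused element, if any.
pub fn focused(elements: &[Element]) -> Option<&Element> {
    elements.iter().find(|e| e.focused)
}

/// Keeps only elements at or above a confidence threshold.
pub fn at_least(elements: &[Element], min_confidence: f32) -> Vec<&Element> {
    elements.iter().filter(|e| e.confidence >= min_confidence).collect()
}

/// Middle cell of an element, as a (row, col) pair a caller might click.
pub fn click_target(e: &Element) -> (u16, u16) {
    (e.row, e.col.saturating_add(e.width / 2))
}
```

A threshold of 0.8 keeps bracket buttons and anything stronger while dropping bare underscore runs, which are the lowest tier in the documented scale.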
diff --git a/crates/pilotty-core/src/elements/segment.rs b/crates/pilotty-core/src/elements/segment.rs
new file mode 100644
index 0000000..eae531d
--- /dev/null
+++ b/crates/pilotty-core/src/elements/segment.rs
@@ -0,0 +1,208 @@
+//! Segmentation: grouping adjacent cells by visual style.
+//!
+//! Scans the terminal grid row by row, grouping adjacent cells with identical
+//! visual styles into clusters for classification.
+
+use unicode_width::UnicodeWidthStr;
+
+use crate::elements::grid::ScreenGrid;
+use crate::elements::style::CellStyle;
+
+/// A cluster of adjacent cells with identical visual style.
+///
+/// Clusters are the intermediate representation between raw cells and
+/// classified elements. Each cluster spans a contiguous horizontal region
+/// of a single row.
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub struct Cluster {
+ /// Row index (0-based, from top).
+ pub row: u16,
+ /// Column index (0-based, from left).
+ pub col: u16,
+ /// Width in terminal cells.
+ pub width: u16,
+ /// Text content of the cluster.
+ pub text: String,
+ /// Visual style shared by all cells in this cluster.
+ pub style: CellStyle,
+}
+
+impl Cluster {
+ /// Create a new cluster.
+ #[must_use]
+ pub fn new(row: u16, col: u16, text: String, style: CellStyle) -> Self {
+ // Use unicode-width for proper terminal column alignment.
+ // CJK characters are width 2, zero-width chars are width 0.
+ let width = text.width().min(u16::MAX as usize) as u16;
+ Self {
+ row,
+ col,
+ width,
+ text,
+ style,
+ }
+ }
+
+ /// Check if this cluster contains only whitespace.
+ #[must_use]
+ pub fn is_whitespace_only(&self) -> bool {
+ self.text.chars().all(|c| c.is_whitespace())
+ }
+}
+
+/// Segment a single row into clusters.
+fn segment_row<G: ScreenGrid>(grid: &G, row: u16) -> Vec<Cluster> {
+ let mut clusters = Vec::new();
+
+ if row >= grid.rows() {
+ return clusters;
+ }
+
+ let mut current_text = String::new();
+ let mut current_style: Option<CellStyle> = None;
+ let mut start_col: u16 = 0;
+
+ for col in 0..grid.cols() {
+ let Some(cell) = grid.cell(row, col) else {
+ continue;
+ };
+
+ match current_style {
+ Some(ref style) if *style == cell.style => {
+ // Same style, extend current cluster
+ current_text.push(cell.ch);
+ }
+ _ => {
+ // Style changed or first cell, finalize previous cluster
+ if let Some(style) = current_style.take() {
+ if !current_text.is_empty() {
+ clusters.push(Cluster::new(
+ row,
+ start_col,
+ std::mem::take(&mut current_text),
+ style,
+ ));
+ }
+ }
+ // Start new cluster
+ start_col = col;
+ current_style = Some(cell.style);
+ current_text.push(cell.ch);
+ }
+ }
+ }
+
+ // Don't forget the last cluster
+ if let Some(style) = current_style {
+ if !current_text.is_empty() {
+ clusters.push(Cluster::new(row, start_col, current_text, style));
+ }
+ }
+
+ clusters
+}
+
+/// Segment an entire grid into clusters.
+fn segment_grid<G: ScreenGrid>(grid: &G) -> Vec<Cluster> {
+ let mut clusters = Vec::new();
+
+ for row in 0..grid.rows() {
+ clusters.extend(segment_row(grid, row));
+ }
+
+ clusters
+}
+
+/// Filter out whitespace-only clusters.
+fn filter_whitespace(clusters: Vec<Cluster>) -> Vec<Cluster> {
+ clusters
+ .into_iter()
+ .filter(|c| !c.is_whitespace_only())
+ .collect()
+}
+
+/// Segment a grid and filter whitespace in one step.
+///
+/// Convenience function that combines `segment_grid` and `filter_whitespace`.
+#[must_use]
+pub fn segment<G: ScreenGrid>(grid: &G) -> Vec<Cluster> {
+ filter_whitespace(segment_grid(grid))
+}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+ use crate::elements::grid::test_support::SimpleGrid;
+
+ #[test]
+ fn cluster_creation() {
+ let cluster = Cluster::new(5, 10, "Hello".to_string(), CellStyle::default());
+ assert_eq!(cluster.row, 5);
+ assert_eq!(cluster.col, 10);
+ assert_eq!(cluster.width, 5);
+ assert_eq!(cluster.text, "Hello");
+ assert!(!cluster.is_whitespace_only());
+ }
+
+ #[test]
+ fn segment_splits_by_style() {
+ let mut grid = SimpleGrid::from_text(&["AABBBCC"], 7);
+ let bold = CellStyle::new().with_bold(true);
+ let inverse = CellStyle::new().with_inverse(true);
+
+ grid.style_range(0, 2, 5, bold);
+ grid.style_range(0, 5, 7, inverse);
+
+ let clusters = segment_row(&grid, 0);
+
+ assert_eq!(clusters.len(), 3);
+ assert_eq!(clusters[0].text, "AA");
+ assert_eq!(clusters[0].col, 0);
+ assert_eq!(clusters[1].text, "BBB");
+ assert!(clusters[1].style.bold);
+ assert_eq!(clusters[2].text, "CC");
+ assert!(clusters[2].style.inverse);
+ }
+
+ #[test]
+ fn segment_filters_whitespace() {
+ let mut grid = SimpleGrid::from_text(&["[OK]     [Cancel]"], 20);
+ let inverse = CellStyle::new().with_inverse(true);
+
+ grid.style_range(0, 0, 4, inverse);
+ grid.style_range(0, 9, 17, inverse);
+
+ let clusters = segment(&grid);
+
+ assert!(clusters.iter().all(|c| !c.is_whitespace_only()));
+ let texts: Vec<&str> = clusters.iter().map(|c| c.text.as_str()).collect();
+ assert!(texts.contains(&"[OK]"));
+ assert!(texts.contains(&"[Cancel]"));
+ }
+
+ // ========================================================================
+ // Unicode Width Tests
+ // ========================================================================
+
+ #[test]
+ fn cluster_width_cjk() {
+ // CJK characters should have width 2 each
+ let cluster = Cluster::new(0, 0, "你好".to_string(), CellStyle::default());
+ assert_eq!(cluster.width, 4); // 2 + 2 = 4
+ }
+
+ #[test]
+ fn cluster_width_ascii() {
+ // ASCII characters should have width 1 each
+ let cluster = Cluster::new(0, 0, "Hello".to_string(), CellStyle::default());
+ assert_eq!(cluster.width, 5);
+ }
+
+ #[test]
+ fn cluster_width_mixed() {
+ // Mixed ASCII and CJK
+ let cluster = Cluster::new(0, 0, "Hi你好".to_string(), CellStyle::default());
+ // H=1 + i=1 + 你=2 + 好=2 = 6
+ assert_eq!(cluster.width, 6);
+ }
+}
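Reviewer note: the segmentation idea in this file (group adjacent cells that share a style, then drop whitespace-only clusters) can be sketched without the grid trait. This is a minimal illustration, not the code from this PR: `Style`, `Run`, `segment_cells`, and `drop_blank` are hypothetical stand-ins, and display width is ignored.

```rust
/// Hypothetical stand-in for the PR's `CellStyle`; only two attributes are modeled.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct Style {
    bold: bool,
    inverse: bool,
}

/// Hypothetical stand-in for the PR's `Cluster` (display width is omitted).
#[derive(Debug, PartialEq)]
struct Run {
    col: usize,
    text: String,
    style: Style,
}

/// Split one row of (char, style) cells into runs of identical style.
fn segment_cells(cells: &[(char, Style)]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for (col, &(ch, style)) in cells.iter().enumerate() {
        if let Some(run) = runs.last_mut() {
            if run.style == style {
                // Same style as the previous cell: extend the current run.
                run.text.push(ch);
                continue;
            }
        }
        // First cell, or the style changed: start a new run at this column.
        runs.push(Run { col, text: ch.to_string(), style });
    }
    runs
}

/// Drop runs that contain only whitespace.
fn drop_blank(runs: Vec<Run>) -> Vec<Run> {
    runs.into_iter()
        .filter(|run| !run.text.trim().is_empty())
        .collect()
}

fn main() {
    let plain = Style::default();
    let inverse = Style { inverse: true, ..Style::default() };
    // "[OK]" in inverse video, two plain spaces, then "[No]" in inverse video.
    let mut cells: Vec<(char, Style)> = "[OK]".chars().map(|c| (c, inverse)).collect();
    cells.extend("  ".chars().map(|c| (c, plain)));
    cells.extend("[No]".chars().map(|c| (c, inverse)));

    for run in drop_blank(segment_cells(&cells)) {
        println!("col {} {:?} bold={} inverse={}", run.col, run.text, run.style.bold, run.style.inverse);
    }
}
```

Keeping the whitespace filter as a separate pass, as the PR does, means callers that need the blank runs (for layout analysis, say) can still call the unfiltered segmentation.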
diff --git a/crates/pilotty-core/src/elements/style.rs b/crates/pilotty-core/src/elements/style.rs
new file mode 100644
index 0000000..a600a5d
--- /dev/null
+++ b/crates/pilotty-core/src/elements/style.rs
@@ -0,0 +1,126 @@
+//! Visual style types used by element-detection segmentation.
+//!
+//! These types represent cell styling independent of the vt100 crate,
+//! allowing the core element detection types to remain vt100-agnostic.
+
+use serde::{Deserialize, Serialize};
+
+/// Terminal color representation.
+///
+/// Maps to standard terminal color modes:
+/// - Default: terminal's default foreground/background
+/// - Indexed: 256-color palette (0-255)
+/// - Rgb: 24-bit true color
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Default, Serialize, Deserialize)]
+#[serde(rename_all = "snake_case", tag = "type")]
+pub enum Color {
+ /// Terminal default color.
+ #[default]
+ Default,
+ /// 256-color palette index (0-255).
+ Indexed { index: u8 },
+ /// 24-bit RGB color.
+ Rgb { r: u8, g: u8, b: u8 },
+}
+
+impl Color {
+ /// Create an indexed color.
+ #[must_use]
+ pub fn indexed(index: u8) -> Self {
+ Self::Indexed { index }
+ }
+
+ /// Create an RGB color.
+ #[must_use]
+ pub fn rgb(r: u8, g: u8, b: u8) -> Self {
+ Self::Rgb { r, g, b }
+ }
+}
+
+/// Visual style attributes for a terminal cell.
+///
+/// Used for segmentation: adjacent cells with identical styles are grouped
+/// into clusters.
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Default, Serialize, Deserialize)]
+pub struct CellStyle {
+ /// Bold text attribute.
+ pub bold: bool,
+ /// Underlined text attribute.
+ pub underline: bool,
+ /// Inverse video (swapped fg/bg).
+ pub inverse: bool,
+ /// Foreground color.
+ pub fg_color: Color,
+ /// Background color.
+ pub bg_color: Color,
+}
+
+impl CellStyle {
+ /// Create a new cell style with default values.
+ #[must_use]
+ pub fn new() -> Self {
+ Self::default()
+ }
+
+ /// Set bold attribute.
+ #[must_use]
+ pub fn with_bold(mut self, bold: bool) -> Self {
+ self.bold = bold;
+ self
+ }
+
+ /// Set underline attribute.
+ #[must_use]
+ pub fn with_underline(mut self, underline: bool) -> Self {
+ self.underline = underline;
+ self
+ }
+
+ /// Set inverse attribute.
+ #[must_use]
+ pub fn with_inverse(mut self, inverse: bool) -> Self {
+ self.inverse = inverse;
+ self
+ }
+
+ /// Set foreground color.
+ #[must_use]
+ pub fn with_fg(mut self, color: Color) -> Self {
+ self.fg_color = color;
+ self
+ }
+
+ /// Set background color.
+ #[must_use]
+ pub fn with_bg(mut self, color: Color) -> Self {
+ self.bg_color = color;
+ self
+ }
+
+ /// Check if this style uses inverse video.
+ ///
+ /// Inverse video is a strong signal for selected menu items and tabs.
+ #[must_use]
+ pub fn is_inverse(&self) -> bool {
+ self.inverse
+ }
+}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+
+ #[test]
+ fn cell_style_default() {
+ let style = CellStyle::default();
+ assert!(!style.bold);
+ assert!(!style.inverse);
+ assert_eq!(style.fg_color, Color::Default);
+ }
+
+ #[test]
+ fn is_inverse_helper() {
+ assert!(!CellStyle::new().is_inverse());
+ assert!(CellStyle::new().with_inverse(true).is_inverse());
+ }
+}
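Reviewer note: because `Color` is declared with `#[serde(rename_all = "snake_case", tag = "type")]`, consumers should see serde's internally tagged JSON shape. The sketch below hand-writes that shape so it runs without serde; `to_tagged_json` is a hypothetical helper, and the local `Color` is a mirror of the enum in this diff rather than the real type.

```rust
/// Local mirror of the PR's `Color` enum, without the serde derives.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Color {
    Default,
    Indexed { index: u8 },
    Rgb { r: u8, g: u8, b: u8 },
}

/// Hand-written rendering of the JSON that serde's internally tagged
/// representation (`tag = "type"`, snake_case variant names) produces.
fn to_tagged_json(color: Color) -> String {
    match color {
        // A unit variant carries only the tag.
        Color::Default => r#"{"type":"default"}"#.to_string(),
        // Struct variants put the tag first, then their own fields.
        Color::Indexed { index } => format!(r#"{{"type":"indexed","index":{index}}}"#),
        Color::Rgb { r, g, b } => format!(r#"{{"type":"rgb","r":{r},"g":{g},"b":{b}}}"#),
    }
}

fn main() {
    for color in [
        Color::Default,
        Color::Indexed { index: 5 },
        Color::Rgb { r: 255, g: 0, b: 128 },
    ] {
        println!("{}", to_tagged_json(color));
    }
}
```

An agent reading these objects can branch on the `type` field alone before looking at `index` or `r`/`g`/`b`.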
diff --git a/crates/pilotty-core/src/lib.rs b/crates/pilotty-core/src/lib.rs
index f8b3c45..6c98556 100644
--- a/crates/pilotty-core/src/lib.rs
+++ b/crates/pilotty-core/src/lib.rs
@@ -1,8 +1,31 @@
//! Core types and logic for pilotty.
//!
-//! This crate provides the shared data structures and algorithms used by both
-//! the CLI/daemon and the MCP server.
+//! This crate provides shared data structures and algorithms for AI-driven
+//! terminal automation. It's used by both the CLI/daemon and MCP server.
+//!
+//! # Modules
+//!
+//! - [`error`]: API error types with actionable suggestions for AI consumers
+//! - [`input`]: Terminal input encoding (keys, mouse, modifiers)
+//! - [`protocol`]: JSON-line request/response protocol
+//! - [`snapshot`]: Screen state capture and change detection
+//! - [`elements`]: UI element detection
+//!
+//! # Element Detection
+//!
+//! pilotty detects interactive UI elements using a simplified 3-kind model
+//! optimized for AI agents:
+//!
+//! | Kind | Detection | Confidence |
+//! |------|-----------|------------|
+//! | **Button** | Inverse video, `[OK]`, angle-bracket labels | 1.0 / 0.8 |
+//! | **Input** | Cursor position, `____` underscores | 1.0 / 0.6 |
+//! | **Toggle** | `[x]`, `[ ]`, `☑`, `☐` | 1.0 |
+//!
+//! Elements include row/col coordinates for use with the click command.
+//! The `content_hash` field enables efficient change detection.
+pub mod elements;
pub mod error;
pub mod input;
pub mod protocol;
diff --git a/crates/pilotty-core/src/protocol.rs b/crates/pilotty-core/src/protocol.rs
index c6eac27..42154ea 100644
--- a/crates/pilotty-core/src/protocol.rs
+++ b/crates/pilotty-core/src/protocol.rs
@@ -79,12 +79,12 @@ pub enum Command {
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum SnapshotFormat {
- /// Full JSON with all metadata.
+ /// Full JSON with all metadata including text and elements.
#[default]
Full,
- /// Compact format with inline refs.
+ /// Compact format: omits text and elements, just metadata.
Compact,
- /// Plain text only.
+ /// Plain text only (no JSON structure).
Text,
}
@@ -97,7 +97,7 @@ pub enum ScrollDirection {
}
/// A response from daemon to CLI.
-#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct Response {
pub id: String,
pub success: bool,
@@ -128,7 +128,7 @@ impl Response {
}
/// Response payload variants.
-#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum ResponseData {
/// Full screen state snapshot.
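Reviewer note: the derive lists in this file drop `Eq` while keeping `PartialEq`. A plausible reason, assuming the new element type stores its confidence as a float (its definition is not shown in this part of the diff), is that floats implement only `PartialEq`. The sketch below uses a hypothetical `Element` to show the constraint.

```rust
/// Hypothetical element carrying a floating-point confidence score.
#[derive(Debug, Clone, PartialEq)] // adding `Eq` to this list does not compile
struct Element {
    text: String,
    confidence: f32,
}

/// Convenience constructor for the comparisons below.
fn element(text: &str, confidence: f32) -> Element {
    Element { text: text.to_string(), confidence }
}

fn main() {
    // Ordinary values compare as expected through the derived `PartialEq`.
    println!("equal scores: {}", element("[OK]", 0.8) == element("[OK]", 0.8));

    // NaN is never equal to itself, so `f32` cannot promise the reflexivity
    // that `Eq` requires, and neither can any struct that contains one.
    let nan = element("[OK]", f32::NAN);
    println!("NaN scores:   {}", nan == nan.clone());
}
```

Any type that embeds such a struct, like the response and screen-state types here, has to give up `Eq` as well.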
diff --git a/crates/pilotty-core/src/snapshot.rs b/crates/pilotty-core/src/snapshot.rs
index bf8c884..9c0cbc0 100644
--- a/crates/pilotty-core/src/snapshot.rs
+++ b/crates/pilotty-core/src/snapshot.rs
@@ -1,7 +1,32 @@
-//! Screen state types.
+//! Screen state capture and change detection.
+//!
+//! This module provides types for capturing terminal screen state, including
+//! text content, cursor position, and detected UI elements.
+//!
+//! # Snapshot Formats
+//!
+//! The daemon supports three snapshot formats. `Text` is plain text only; the two JSON formats are:
+//!
+//! | Format | Content | Use Case |
+//! |--------|---------|----------|
+//! | **Full** | text + elements + hash | Complete state for new screens |
+//! | **Compact** | metadata only | Quick status checks |
+//!
+//! # Change Detection
+//!
+//! The `content_hash` field provides efficient change detection. Agents can
+//! compare hashes across snapshots without parsing the full element list:
+//!
+//! ```ignore
+//! if new_snapshot.content_hash != old_snapshot.content_hash {
+//! // Screen changed, re-analyze elements
+//! }
+//! ```
use serde::{Deserialize, Serialize};
+use crate::elements::Element;
+
/// Terminal dimensions.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
pub struct TerminalSize {
@@ -18,7 +43,7 @@ pub struct CursorState {
}
/// Complete screen state snapshot.
-#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct ScreenState {
pub snapshot_id: u64,
pub size: TerminalSize,
@@ -26,6 +51,21 @@ pub struct ScreenState {
/// Plain text content of the screen.
#[serde(skip_serializing_if = "Option::is_none")]
 pub text: Option<String>,
+ /// Detected interactive UI elements.
+ ///
+ /// Elements are detected using visual style segmentation and pattern
+ /// classification. Each element includes its position (row, col) for
+ /// interaction via the click command.
+ #[serde(skip_serializing_if = "Option::is_none")]
+ pub elements: Option<Vec<Element>>,
+ /// Hash of screen content for change detection.
+ ///
+ /// Computed from the screen text using a fast non-cryptographic hash.
+ /// Present when `elements` is requested (`with_elements=true`).
+ /// Agents can compare hashes across snapshots to detect screen changes
+ /// without parsing the full element list.
+ #[serde(skip_serializing_if = "Option::is_none")]
+ pub content_hash: Option<u64>,
}
impl ScreenState {
@@ -39,6 +79,71 @@ impl ScreenState {
visible: true,
},
text: None,
+ elements: None,
+ content_hash: None,
}
}
}
+
+/// Compute a content hash from screen text.
+///
+/// Uses FNV-1a, a fast non-cryptographic hash suitable for change detection.
+#[must_use]
+pub fn compute_content_hash(text: &str) -> u64 {
+ // FNV-1a parameters for 64-bit
+ const FNV_OFFSET: u64 = 0xcbf29ce484222325;
+ const FNV_PRIME: u64 = 0x00000100000001B3;
+
+ let mut hash = FNV_OFFSET;
+ for byte in text.bytes() {
+ hash ^= u64::from(byte);
+ hash = hash.wrapping_mul(FNV_PRIME);
+ }
+ hash
+}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+
+ #[test]
+ fn content_hash_deterministic() {
+ let text = "Hello, World!";
+ let hash1 = compute_content_hash(text);
+ let hash2 = compute_content_hash(text);
+ assert_eq!(hash1, hash2);
+ }
+
+ #[test]
+ fn content_hash_differs_for_different_text() {
+ let hash1 = compute_content_hash("Hello");
+ let hash2 = compute_content_hash("World");
+ assert_ne!(hash1, hash2);
+ }
+
+ #[test]
+ fn content_hash_empty_string() {
+ // Empty string should return the FNV-1a offset basis
+ let hash = compute_content_hash("");
+ assert_eq!(hash, 0xcbf29ce484222325);
+ }
+
+ #[test]
+ fn content_hash_single_char_difference() {
+ // Even a single character difference should produce different hashes
+ let hash1 = compute_content_hash("test");
+ let hash2 = compute_content_hash("tess");
+ assert_ne!(hash1, hash2);
+ }
+
+ #[test]
+ fn content_hash_unicode() {
+ // Unicode text should hash consistently
+ let text = "日本語テスト 🚀";
+ let hash1 = compute_content_hash(text);
+ let hash2 = compute_content_hash(text);
+ assert_eq!(hash1, hash2);
+ // Should differ from ASCII
+ assert_ne!(hash1, compute_content_hash("ascii"));
+ }
+}
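Reviewer note: the change-detection flow described in the module docs can be sketched as a small tracker around the same FNV-1a loop. `fnv1a_64` repeats the constants from this diff; `ChangeTracker` and `observe` are hypothetical names introduced only for illustration.

```rust
/// FNV-1a over the UTF-8 bytes of `text`, using the same 64-bit constants as the diff.
fn fnv1a_64(text: &str) -> u64 {
    const FNV_OFFSET: u64 = 0xcbf29ce484222325;
    const FNV_PRIME: u64 = 0x0000_0100_0000_01B3;
    text.bytes().fold(FNV_OFFSET, |hash, byte| {
        // XOR the byte in first, then multiply: that order is what makes it FNV-1a.
        (hash ^ u64::from(byte)).wrapping_mul(FNV_PRIME)
    })
}

/// Hypothetical helper that remembers the last hash and reports changes.
struct ChangeTracker {
    last_hash: Option<u64>,
}

impl ChangeTracker {
    fn new() -> Self {
        Self { last_hash: None }
    }

    /// Returns true when `screen_text` hashes differently from the previous call.
    /// The first observation always counts as a change.
    fn observe(&mut self, screen_text: &str) -> bool {
        let hash = fnv1a_64(screen_text);
        let changed = self.last_hash != Some(hash);
        self.last_hash = Some(hash);
        changed
    }
}

fn main() {
    let mut tracker = ChangeTracker::new();
    for screen in ["[ ] Dark mode", "[ ] Dark mode", "[x] Dark mode"] {
        println!("{:?} changed: {}", screen, tracker.observe(screen));
    }
}
```

One caveat worth checking on the consumer side: a `u64` hash serialized as a JSON number can exceed 2^53, so clients that parse JSON numbers as IEEE-754 doubles may round it. Comparing the raw digits as a string avoids that.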
diff --git a/npm/README.md b/npm/README.md
index 32395ce..a4a28b2 100644
--- a/npm/README.md
+++ b/npm/README.md
@@ -1,25 +1,26 @@
-
+
pilotty
- Terminal automation CLI for AI agents
- Like agent-browser, but for TUI applications.
+ The terminal equivalent of agent-browser
----
+
+ Terminal automation CLI for AI agents
+ Control vim, htop, lazygit, dialog, and any TUI programmatically
+
-pilotty enables AI agents to interact with terminal applications (vim, htop, lazygit, dialog, etc.) through a simple CLI interface. It manages PTY sessions, captures terminal output, and provides keyboard/mouse input capabilities for navigating TUI applications.
+
+
+
+
-## Features
+---
-- **PTY Management**: Spawn and manage terminal applications in background sessions
-- **Keyboard Navigation**: Interact with TUIs using Tab, Enter, arrow keys, and key combos
-- **AI-Friendly Output**: Clean JSON responses with actionable suggestions on errors
-- **Multi-Session**: Run multiple terminal apps simultaneously in isolated sessions
-- **Zero Config**: Daemon auto-starts on first command, auto-stops after 5 minutes idle
+pilotty enables AI agents to interact with terminal applications through a simple command-line interface. It manages pseudo-terminal (PTY) sessions with full VT100 terminal emulation, captures screen state, and provides keyboard/mouse input for navigating terminal user interfaces.
## Installation
@@ -83,6 +84,17 @@ The `snapshot` command returns structured data about the terminal screen:
Use the cursor position and text content to understand the screen state and navigate using keyboard commands (Tab, Enter, arrow keys) or click at specific coordinates.
+## Documentation
+
+See the **[GitHub repository](https://github.com/msmps/pilotty)** for full documentation including:
+
+- All commands reference
+- Session management
+- Key combinations
+- UI element detection
+- AI agent workflow examples
+- Daemon architecture
+
## Building from Source
```bash
@@ -94,10 +106,6 @@ cargo build --release
Requires [Rust](https://rustup.rs) 1.70+.
-## Documentation
-
-See the [GitHub repository](https://github.com/msmps/pilotty) for full documentation including all commands, key combinations, and AI agent workflow examples.
-
## License
MIT
diff --git a/skills/pilotty/SKILL.md b/skills/pilotty/SKILL.md
index de357f4..c9efb74 100644
--- a/skills/pilotty/SKILL.md
+++ b/skills/pilotty/SKILL.md
@@ -30,7 +30,7 @@ This is the #1 cause of agent failures. When in doubt: **flags first, then comma
```bash
pilotty spawn vim file.txt # Start TUI app in managed session
pilotty wait-for "file.txt" # Wait for app to be ready
-pilotty snapshot # Get screen state with cursor position
+pilotty snapshot # Get screen state with UI elements
pilotty key i # Enter insert mode
pilotty type "Hello, World!" # Type text
pilotty key Escape # Exit insert mode
@@ -41,9 +41,10 @@ pilotty kill # End session
1. **Spawn**: `pilotty spawn ` starts the app in a background PTY
2. **Wait**: `pilotty wait-for ` ensures the app is ready
-3. **Snapshot**: `pilotty snapshot` returns screen state with text content and cursor position
-4. **Interact**: Use keyboard commands (`key`, `type`) or click at coordinates (`click `)
-5. **Re-snapshot**: After screen changes, snapshot again to see updated state
+3. **Snapshot**: `pilotty snapshot` returns screen state with detected UI elements
+4. **Understand**: Parse `elements[]` to identify buttons, inputs, toggles
+5. **Interact**: Use keyboard commands (`key`, `type`) to navigate and interact
+6. **Re-snapshot**: Check `content_hash` to detect screen changes
## Commands
@@ -56,14 +57,14 @@ pilotty kill # Kill default session
pilotty kill -s myapp # Kill specific session
pilotty list-sessions # List all active sessions
pilotty daemon # Manually start daemon (usually auto-starts)
-pilotty stop # Stop daemon and all sessions
+pilotty shutdown # Stop daemon and all sessions
pilotty examples # Show end-to-end workflow example
```
### Screen capture
```bash
-pilotty snapshot # Full JSON with text content
+pilotty snapshot # Full JSON with text content and elements
pilotty snapshot --format compact # JSON without text field
pilotty snapshot --format text # Plain text with cursor indicator
pilotty snapshot -s myapp # Snapshot specific session
@@ -125,16 +126,23 @@ PILOTTY_SOCKET_DIR="/tmp/pilotty" # Override socket directory
RUST_LOG="debug" # Enable debug logging
```
-## Snapshot output
+## Snapshot Output
-The `snapshot` command returns structured JSON:
+The `snapshot` command returns structured JSON with detected UI elements:
```json
{
"snapshot_id": 42,
"size": { "cols": 80, "rows": 24 },
"cursor": { "row": 5, "col": 10, "visible": true },
- "text": "... plain text content ..."
+ "text": "Settings:\n  [x] Notifications [ ] Dark mode\n  [Save]  [Cancel]",
+ "elements": [
+ { "kind": "toggle", "row": 1, "col": 2, "width": 3, "text": "[x]", "confidence": 1.0, "checked": true },
+ { "kind": "toggle", "row": 1, "col": 20, "width": 3, "text": "[ ]", "confidence": 1.0, "checked": false },
+ { "kind": "button", "row": 2, "col": 2, "width": 6, "text": "[Save]", "confidence": 0.8 },
+ { "kind": "button", "row": 2, "col": 10, "width": 8, "text": "[Cancel]", "confidence": 0.8 }
+ ],
+ "content_hash": 12345678901234567890
}
```
@@ -147,7 +155,85 @@ bash-3.2$ [_]
The `[_]` shows cursor position. Use the text content to understand screen state and navigate with keyboard commands.
-## Navigation approach
+---
+
+## Element Detection
+
+pilotty automatically detects interactive UI elements in terminal applications. Elements provide **read-only context** to help understand UI structure.
+
+### Element Kinds
+
+| Kind | Detection Patterns | Confidence | Fields |
+|------|-------------------|------------|--------|
+| **toggle** | `[x]`, `[ ]`, `[*]`, `☑`, `☐` | 1.0 | `checked: bool` |
+| **button** | Inverse video, `[OK]`, angle-bracket labels, `(Submit)` | 1.0 / 0.8 | `focused: bool` (if true) |
+| **input** | Cursor position, `____` underscores | 1.0 / 0.6 | `focused: bool` (if true) |
+
+### Element Fields
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `kind` | string | Element type: `button`, `input`, or `toggle` |
+| `row` | number | Row position (0-based from top) |
+| `col` | number | Column position (0-based from left) |
+| `width` | number | Width in terminal cells (CJK chars = 2) |
+| `text` | string | Text content of the element |
+| `confidence` | number | Detection confidence (0.0-1.0) |
+| `focused` | bool | Whether element has focus (only present if true) |
+| `checked` | bool | Toggle state (only present for toggles) |
+
+### Confidence Levels
+
+| Confidence | Meaning |
+|------------|---------|
+| **1.0** | High confidence: Cursor position, inverse video, checkbox patterns |
+| **0.8** | Medium confidence: Bracket patterns such as `[OK]` and angle-bracket labels |
+| **0.6** | Lower confidence: Underscore input fields `____` |
+
+### Change Detection
+
+The `content_hash` field enables efficient screen change detection:
+
+```bash
+# Get initial state
+SNAP1=$(pilotty snapshot)
+HASH1=$(echo "$SNAP1" | jq -r '.content_hash')
+
+# Perform action
+pilotty key Tab
+
+# Check if screen changed
+SNAP2=$(pilotty snapshot)
+HASH2=$(echo "$SNAP2" | jq -r '.content_hash')
+
+if [ "$HASH1" != "$HASH2" ]; then
+ echo "Screen changed - re-analyze elements"
+fi
+```
+
+### Using Elements Effectively
+
+Elements are **read-only context** for understanding the UI. Use **keyboard navigation** for reliable interaction:
+
+```bash
+# 1. Get snapshot to understand UI structure
+pilotty snapshot | jq '.elements'
+# Output shows toggles (checked/unchecked) and buttons with positions
+
+# 2. Navigate and interact with keyboard (reliable approach)
+pilotty key Tab # Move to next element
+pilotty key Space # Toggle checkbox
+pilotty key Enter # Activate button
+
+# 3. Verify state changed
+pilotty snapshot | jq '.elements[] | select(.kind == "toggle")'
+```
+
+**Key insight**: Use elements to understand WHAT is on screen, use keyboard to interact with it.
+
+---
+
+## Navigation Approach
pilotty uses keyboard-first navigation, just like a human would:
@@ -160,6 +246,7 @@ pilotty key Tab # Move to next element
pilotty key Enter # Activate/select
pilotty key Escape # Cancel/back
pilotty key Up # Move up in list/menu
+pilotty key Space # Toggle checkbox
# 3. Type text when needed
pilotty type "search term"
@@ -169,7 +256,9 @@ pilotty key Enter
pilotty click 5 10 # Click at row 5, col 10
```
-**Key insight**: Parse the snapshot text to understand what's on screen, then use keyboard commands to navigate. This works reliably across all TUI applications.
+**Key insight**: Parse the snapshot text and elements to understand what's on screen, then use keyboard commands to navigate. This works reliably across all TUI applications.
+
+---
## Example: Edit file with vim
@@ -197,22 +286,64 @@ pilotty key -s editor Enter
pilotty list-sessions
```
-## Example: Dialog interaction
+## Example: Dialog checklist interaction
```bash
-# 1. Spawn dialog (--name before command)
-pilotty spawn --name dialog dialog --yesno "Continue?" 10 40
+# 1. Spawn dialog checklist (--name before command)
+pilotty spawn --name opts dialog --checklist "Select features:" 12 50 4 \
+ "notifications" "Push notifications" on \
+ "darkmode" "Dark mode theme" off \
+ "autosave" "Auto-save documents" on \
+ "telemetry" "Usage analytics" off
+
+# 2. Wait for dialog to render
+sleep 0.5
-# 2. Get snapshot to see the dialog
-pilotty snapshot -s dialog --format text
-# Shows: < Yes > and < No > buttons
+# 3. Get snapshot and examine elements
+pilotty snapshot -s opts | jq '.elements[] | select(.kind == "toggle")'
+# Shows toggle elements with checked state and positions
-# 3. Navigate with keyboard
-pilotty key -s dialog Tab # Move to next button
-pilotty key -s dialog Enter # Activate selected button
+# 4. Navigate to "darkmode" and toggle it
+pilotty key -s opts Down # Move to second option
+pilotty key -s opts Space # Toggle it on
-# Or click at coordinates if you know the button position
-pilotty click -s dialog 8 15 # Click at row 8, col 15
+# 5. Verify the change
+pilotty snapshot -s opts | jq '.elements[] | select(.kind == "toggle") | {text, checked}'
+
+# 6. Confirm selection
+pilotty key -s opts Enter
+
+# 7. Clean up
+pilotty kill -s opts
+```
+
+## Example: Form filling with elements
+
+```bash
+# 1. Spawn a form application
+pilotty spawn --name form my-form-app
+
+# 2. Get snapshot to understand form structure
+pilotty snapshot -s form | jq '.elements'
+# Shows inputs, toggles, and buttons with positions for click command
+
+# 3. Type into the first input (likely already focused)
+pilotty type -s form "myusername"
+
+# 4. Tab to password field
+pilotty key -s form Tab
+pilotty type -s form "mypassword"
+
+# 5. Tab to remember me and toggle
+pilotty key -s form Tab
+pilotty key -s form Space
+
+# 6. Tab to Login and activate
+pilotty key -s form Tab
+pilotty key -s form Enter
+
+# 7. Check result
+pilotty snapshot -s form --format text
```
## Example: Monitor with htop
@@ -235,6 +366,8 @@ pilotty key -s monitor q # Quit
pilotty kill -s monitor
```
+---
+
## Sessions
Each session is isolated with its own:
@@ -262,7 +395,7 @@ The first session spawned without `--name` is automatically named `default`.
> **Important:** The `--name` flag must come **before** the command. Everything after the command is passed as arguments to that command.
-## Daemon architecture
+## Daemon Architecture
pilotty uses a background daemon for session management:
@@ -273,7 +406,7 @@ pilotty uses a background daemon for session management:
You rarely need to manage the daemon manually.
-## Error handling
+## Error Handling
Errors include actionable suggestions:
@@ -293,7 +426,9 @@ Errors include actionable suggestions:
}
```
-## Common patterns
+---
+
+## Common Patterns
### Wait then act
@@ -310,6 +445,16 @@ pilotty snapshot --format text | grep "Error" # Check for errors
pilotty key Enter # Then proceed
```
+### Check for specific element
+
+```bash
+# Show the first toggle and whether it is checked
+pilotty snapshot | jq '[.elements[] | select(.kind == "toggle")][0] | {text, checked}'
+
+# Find element at specific position
+pilotty snapshot | jq '.elements[] | select(.row == 5 and .col == 10)'
+```
+
### Retry on timeout
```bash
@@ -319,7 +464,9 @@ pilotty wait-for "Ready" -t 5000 || {
}
```
-## Deep-dive documentation
+---
+
+## Deep-dive Documentation
For detailed patterns and edge cases, see:
@@ -327,8 +474,9 @@ For detailed patterns and edge cases, see:
|-----------|-------------|
| [references/session-management.md](references/session-management.md) | Multi-session patterns, isolation, cleanup |
| [references/key-input.md](references/key-input.md) | Complete key combinations reference |
+| [references/element-detection.md](references/element-detection.md) | Detection rules, confidence, patterns |
-## Ready-to-use templates
+## Ready-to-use Templates
Executable workflow scripts:
@@ -337,10 +485,12 @@ Executable workflow scripts:
| [templates/vim-workflow.sh](templates/vim-workflow.sh) | Edit file with vim, save, exit |
| [templates/dialog-interaction.sh](templates/dialog-interaction.sh) | Handle dialog/whiptail prompts |
| [templates/multi-session.sh](templates/multi-session.sh) | Parallel TUI orchestration |
+| [templates/element-detection.sh](templates/element-detection.sh) | Element detection demo |
Usage:
```bash
./templates/vim-workflow.sh /tmp/myfile.txt "File content here"
./templates/dialog-interaction.sh
./templates/multi-session.sh
+./templates/element-detection.sh
```
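Reviewer note: for agents that consume snapshots from Rust rather than through `jq`, the element queries shown in SKILL.md can be sketched over plain structs. The `Element` and `Kind` types below are hypothetical mirrors of the documented JSON fields, not the crate's own definitions, and the sample data copies the snapshot example from the skill file.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Button,
    Input,
    Toggle,
}

/// Hypothetical mirror of the documented element fields.
#[derive(Debug, Clone, PartialEq)]
struct Element {
    kind: Kind,
    row: u16,
    col: u16,
    text: String,
    confidence: f32,
    checked: Option<bool>,
}

fn el(kind: Kind, row: u16, col: u16, text: &str, confidence: f32, checked: Option<bool>) -> Element {
    Element { kind, row, col, text: text.to_string(), confidence, checked }
}

/// The four elements from the SKILL.md snapshot example.
fn sample_elements() -> Vec<Element> {
    vec![
        el(Kind::Toggle, 1, 2, "[x]", 1.0, Some(true)),
        el(Kind::Toggle, 1, 20, "[ ]", 1.0, Some(false)),
        el(Kind::Button, 2, 2, "[Save]", 0.8, None),
        el(Kind::Button, 2, 10, "[Cancel]", 0.8, None),
    ]
}

/// Toggles whose `checked` field is present and false.
fn unchecked_toggles(elements: &[Element]) -> Vec<&Element> {
    elements
        .iter()
        .filter(|e| e.kind == Kind::Toggle && e.checked == Some(false))
        .collect()
}

/// Elements at or above a confidence threshold.
fn with_min_confidence(elements: &[Element], min: f32) -> Vec<&Element> {
    elements.iter().filter(|e| e.confidence >= min).collect()
}

fn main() {
    let elements = sample_elements();
    for e in unchecked_toggles(&elements) {
        println!("unchecked toggle {:?} at ({}, {})", e.text, e.row, e.col);
    }
    println!("confidence >= 0.9: {}", with_min_confidence(&elements, 0.9).len());
}
```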
diff --git a/skills/pilotty/references/element-detection.md b/skills/pilotty/references/element-detection.md
new file mode 100644
index 0000000..15080dc
--- /dev/null
+++ b/skills/pilotty/references/element-detection.md
@@ -0,0 +1,280 @@
+# Element Detection
+
+pilotty automatically detects interactive UI elements in terminal applications. Elements provide **read-only context** to help agents understand UI structure.
+
+## Overview
+
+pilotty analyzes terminal screen content and detects:
+- **Toggles**: Checkboxes like `[x]`, `[ ]`, `[*]`, `☑`, `☐`
+- **Buttons**: Action elements like `[OK]`, angle-bracket labels, `(Submit)`
+- **Inputs**: Text fields marked by underscores `____` or cursor position
+
+Each detected element includes:
+- Kind, position (row, col), width, text content
+- Confidence score (0.0-1.0)
+- State information (checked for toggles, focused for inputs/buttons)
+
+## Detection Rules
+
+### Priority Order (Highest to Lowest)
+
+1. **Cursor Position** - Input (confidence: 1.0, focused: true)
+2. **Checkbox Patterns** - Toggle (confidence: 1.0)
+3. **Inverse Video** - Button (confidence: 1.0, focused: true)
+4. **Bracket Patterns** - Button (confidence: 0.8)
+5. **Underscore Fields** - Input (confidence: 0.6)
+
+### Toggle Detection
+
+Toggles are detected from checkbox patterns:
+
+| Pattern | State | Notes |
+|---------|-------|-------|
+| `[x]`, `[X]` | checked: true | Standard checked |
+| `[ ]` | checked: false | Standard unchecked |
+| `[*]` | checked: true | Dialog/ncurses style |
+| `☑`, `✓`, `✔`, `☒` | checked: true | Unicode checkmarks |
+| `☐`, `□` | checked: false | Unicode unchecked |
+
+Example detection:
+```json
+{
+ "kind": "toggle",
+ "row": 5,
+ "col": 2,
+ "width": 3,
+ "text": "[x]",
+ "confidence": 1.0,
+ "checked": true
+}
+```
+
+### Button Detection
+
+Buttons are detected from:
+
+1. **Inverse video** (highest confidence)
+ - Text with reversed foreground/background colors
+ - Common in dialog, whiptail, and ncurses apps
+ - Confidence: 1.0, focused: true
+
+2. **Bracket patterns** (medium confidence)
+ - Square brackets: `[OK]`, `[Cancel]`, `[Save]`
+ - Angle brackets around a label
+ - Parentheses: `(Submit)`, `(Reset)`
+ - Confidence: 0.8
+
+Example detection:
+```json
+{
+ "kind": "button",
+ "row": 10,
+ "col": 5,
+ "width": 6,
+ "text": "[Save]",
+ "confidence": 0.8
+}
+```
+
+### Input Detection
+
+Inputs are detected from:
+
+1. **Cursor position** (highest confidence)
+ - The cell where the cursor is located
+ - Confidence: 1.0, focused: true
+
+2. **Underscore runs** (lower confidence)
+ - 3+ consecutive underscores: `___`, `__________`
+ - Common in form-style TUIs
+ - Confidence: 0.6
+
+Example detection:
+```json
+{
+ "kind": "input",
+ "row": 8,
+ "col": 12,
+ "width": 10,
+ "text": "__________",
+ "confidence": 0.6
+}
+```
+
+## Non-Interactive Patterns (Filtered)
+
+The following patterns are recognized but NOT returned as interactive elements:
+
+| Pattern | Why Filtered |
+|---------|--------------|
+| `http://`, `https://` | Links are not clickable in most TUIs |
+| `[====]`, `[####]` | Progress bars |
+| `[ERROR]`, `[WARNING]`, `[INFO]` | Status indicators |
+| `[1]`, `[2]`, `1)`, `a)` | Menu prefixes |
+| `├`, `┤`, `│`, `┌`, `┐` | Box-drawing characters |
+
+## Element Fields Reference
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `kind` | string | Yes | `button`, `input`, or `toggle` |
+| `row` | number | Yes | Row position (0-based from top) |
+| `col` | number | Yes | Column position (0-based from left) |
+| `width` | number | Yes | Width in terminal cells |
+| `text` | string | Yes | Element text content |
+| `confidence` | number | Yes | Detection confidence (0.0-1.0) |
+| `focused` | bool | No | Present and true if element has focus |
+| `checked` | bool | No | Present for toggles only |
+
+### Width Calculation
+
+Element width uses Unicode display width:
+- ASCII characters: width 1
+- CJK characters (Chinese, Japanese, Korean): width 2
+- Emoji: width 2
+- Zero-width characters: width 0
+
+This matches terminal column alignment.
+
+## Content Hash
+
+Full snapshots include a `content_hash` field for change detection:
+
+```json
+{
+ "content_hash": 12345678901234567890,
+ ...
+}
+```
+
+The hash is computed from the visible screen text content. Use it to:
+- Detect if the screen changed between snapshots
+- Avoid re-processing unchanged screens
+
+```bash
+HASH1=$(pilotty snapshot | jq -r '.content_hash')
+pilotty key Tab
+HASH2=$(pilotty snapshot | jq -r '.content_hash')
+[ "$HASH1" != "$HASH2" ] && echo "Screen changed"
+```
+
+## Best Practices
+
+### 1. Elements for Understanding, Keyboard for Interaction
+
+Elements tell you WHAT is on screen. Use keyboard to interact:
+
+```bash
+# See what's on screen
+pilotty snapshot | jq '.elements[] | {kind, text, row, col, checked}'
+
+# Navigate with keyboard
+pilotty key Tab # Move between elements
+pilotty key Space # Toggle checkboxes
+pilotty key Enter # Activate buttons
+```
+
+### 2. Check Confidence Levels
+
+Higher confidence means more reliable detection:
+
+```bash
+# Filter to high-confidence elements only
+pilotty snapshot | jq '.elements[] | select(.confidence >= 0.8)'
+```
+
+### 3. Find Elements by Content or Position
+
+```bash
+# Find element by text content
+pilotty snapshot | jq '.elements[] | select(.text | contains("Save"))'
+
+# Find element at specific position
+pilotty snapshot | jq '.elements[] | select(.row == 5 and .col == 10)'
+
+# Get first toggle
+pilotty snapshot | jq '[.elements[] | select(.kind == "toggle")][0]'
+```
+
+## Limitations
+
+### What Detection Does NOT Find
+
+1. **Menu items without markers** - Plain text menus need keyboard navigation
+2. **Custom widgets** - Non-standard UI patterns may not be recognized
+3. **Color-only highlighting** - Elements must have text patterns or inverse video
+4. **Disabled elements** - No distinction between enabled/disabled
+
+### What Detection Cannot Do
+
+1. **Click elements directly by name** - Use row/col with click command
+2. **Track elements across screens** - Elements may move; use text content to re-find
+
+## Troubleshooting
+
+### No Elements Detected
+
+1. Check if the app uses standard patterns:
+ ```bash
+ pilotty snapshot --format text # View raw screen
+ ```
+
+2. Look for confidence-1.0 elements, which include inverse-video buttons, checkbox toggles, and the cursor-position input:
+ ```bash
+ pilotty snapshot | jq '.elements[] | select(.confidence == 1.0)'
+ ```
+
+### Wrong Element Kind
+
+The classifier uses heuristics. If `[x]` is detected as a button instead of a toggle:
+1. Check for surrounding context
+2. Use `text` field to identify element purpose
+
+### Elements Missing After Action
+
+Element positions may change between snapshots. Track elements by:
+- Text content (most reliable)
+- Element kind
+- Approximate row/column position
+
+## Example: Complete Workflow
+
+```bash
+#!/bin/bash
+SESSION="form"
+
+# 1. Spawn application
+pilotty spawn --name $SESSION dialog --checklist "Options:" 15 50 4 \
+ "opt1" "Feature A" on \
+ "opt2" "Feature B" off \
+ "opt3" "Feature C" on \
+ "opt4" "Feature D" off
+
+sleep 0.5
+
+# 2. Analyze initial state
+echo "Initial state:"
+pilotty snapshot -s $SESSION | jq '.elements[] | select(.kind == "toggle") | {text, checked}'
+
+# 3. Find unchecked toggles
+UNCHECKED=$(pilotty snapshot -s $SESSION | jq '[.elements[] | select(.kind == "toggle" and .checked == false)] | length')
+echo "Unchecked toggles: $UNCHECKED"
+
+# 4. Navigate and toggle opt2
+pilotty key -s $SESSION Down # Move to opt2
+pilotty key -s $SESSION Space # Toggle it
+
+# 5. Record content_hash after the toggle (compare with an earlier snapshot's hash to confirm a change)
+HASH1=$(pilotty snapshot -s $SESSION | jq -r '.content_hash')
+echo "Hash after toggle: $HASH1"
+
+# 6. Confirm and check final state
+pilotty key -s $SESSION Enter
+sleep 0.3
+
+echo "Final state:"
+pilotty snapshot -s $SESSION | jq '.elements[] | select(.kind == "toggle") | {text, checked}'
+
+# 7. Cleanup
+pilotty kill -s $SESSION
+```
diff --git a/skills/pilotty/templates/dialog-interaction.sh b/skills/pilotty/templates/dialog-interaction.sh
index ae73233..0db4a18 100755
--- a/skills/pilotty/templates/dialog-interaction.sh
+++ b/skills/pilotty/templates/dialog-interaction.sh
@@ -1,6 +1,6 @@
#!/bin/bash
# Template: Interact with dialog/whiptail prompts
-# Demonstrates handling various dialog types
+# Demonstrates handling various dialog types with element detection
#
# Usage: ./dialog-interaction.sh
# Requires: dialog or whiptail installed
@@ -16,26 +16,32 @@ if ! command -v dialog &> /dev/null; then
exit 1
fi
+# Cleanup on exit
+cleanup() {
+ pilotty kill -s "$SESSION_NAME" 2>/dev/null || true
+}
+trap cleanup EXIT
+
echo "=== Dialog Interaction Demo ==="
# --- Yes/No Dialog ---
echo ""
echo "1. Yes/No Dialog"
-pilotty spawn --name "$SESSION_NAME" dialog --yesno "Do you want to continue?" 10 40
+pilotty spawn --name "$SESSION_NAME" dialog --yesno "Do you want to continue?" 10 40 >/dev/null
# Wait for dialog to render
-pilotty wait-for -s "$SESSION_NAME" "continue" -t 5000
+pilotty wait-for -s "$SESSION_NAME" "continue" -t 5000 >/dev/null
-# Take snapshot to see buttons
-echo "Snapshot:"
-pilotty snapshot -s "$SESSION_NAME" --format compact
+# Show detected elements
+echo "Detected elements:"
+pilotty snapshot -s "$SESSION_NAME" | jq -r '.elements[] | " \(.kind) \(.text) at (\(.row),\(.col))"'
# Select Yes using keyboard (Enter selects the default button)
-pilotty key -s "$SESSION_NAME" Enter # Select default (Yes)
+pilotty key -s "$SESSION_NAME" Enter >/dev/null
sleep 0.5
-echo "Selected: Yes"
+echo "Selected: Yes (via Enter)"
# --- Menu Dialog ---
echo ""
@@ -45,36 +51,45 @@ pilotty spawn --name "$SESSION_NAME" dialog --menu "Choose an option:" 15 50 4 \
1 "Option One" \
2 "Option Two" \
3 "Option Three" \
- 4 "Exit"
+ 4 "Exit" >/dev/null
-pilotty wait-for -s "$SESSION_NAME" "Choose" -t 5000
+pilotty wait-for -s "$SESSION_NAME" "Choose" -t 5000 >/dev/null
-# Navigate with arrow keys (pilotty auto-detects application cursor mode)
-pilotty key -s "$SESSION_NAME" Down # Move to option 2
-pilotty key -s "$SESSION_NAME" Down # Move to option 3
-pilotty key -s "$SESSION_NAME" Enter # Select
+# Navigate with arrow keys
+pilotty key -s "$SESSION_NAME" Down >/dev/null # Move to option 2
+pilotty key -s "$SESSION_NAME" Down >/dev/null # Move to option 3
+pilotty key -s "$SESSION_NAME" Enter >/dev/null # Select
sleep 0.5
-echo "Selected: Option Three"
+echo "Selected: Option Three (via arrow keys + Enter)"
-# --- Checklist Dialog ---
+# --- Checklist Dialog with Element Detection ---
echo ""
-echo "3. Checklist Dialog"
+echo "3. Checklist Dialog (with element detection)"
pilotty spawn --name "$SESSION_NAME" dialog --checklist "Select items:" 15 50 4 \
1 "Item A" off \
2 "Item B" off \
3 "Item C" off \
- 4 "Item D" off
+ 4 "Item D" off >/dev/null
+
+pilotty wait-for -s "$SESSION_NAME" "Select" -t 5000 >/dev/null
-pilotty wait-for -s "$SESSION_NAME" "Select" -t 5000
+# Show initial toggle states
+echo "Initial toggle states:"
+pilotty snapshot -s "$SESSION_NAME" | jq -r '.elements[] | select(.kind == "toggle") | " \(.text) at (\(.row),\(.col)) checked=\(.checked)"'
# Toggle items with Space
-pilotty key -s "$SESSION_NAME" Space # Toggle Item A
-pilotty key -s "$SESSION_NAME" Down
-pilotty key -s "$SESSION_NAME" Down
-pilotty key -s "$SESSION_NAME" Space # Toggle Item C
-pilotty key -s "$SESSION_NAME" Enter # Confirm
+pilotty key -s "$SESSION_NAME" Space >/dev/null # Toggle Item A
+pilotty key -s "$SESSION_NAME" Down >/dev/null
+pilotty key -s "$SESSION_NAME" Down >/dev/null
+pilotty key -s "$SESSION_NAME" Space >/dev/null # Toggle Item C
+
+# Show updated toggle states
+echo "After toggling:"
+pilotty snapshot -s "$SESSION_NAME" | jq -r '.elements[] | select(.kind == "toggle") | " \(.text) at (\(.row),\(.col)) checked=\(.checked)"'
+
+pilotty key -s "$SESSION_NAME" Enter >/dev/null # Confirm
sleep 0.5
echo "Selected: Item A, Item C"
@@ -83,13 +98,17 @@ echo "Selected: Item A, Item C"
echo ""
echo "4. Input Dialog"
-pilotty spawn --name "$SESSION_NAME" dialog --inputbox "Enter your name:" 10 40
+pilotty spawn --name "$SESSION_NAME" dialog --inputbox "Enter your name:" 10 40 >/dev/null
-pilotty wait-for -s "$SESSION_NAME" "name" -t 5000
+pilotty wait-for -s "$SESSION_NAME" "name" -t 5000 >/dev/null
+
+# Show detected input element
+echo "Detected input element:"
+pilotty snapshot -s "$SESSION_NAME" | jq -r '.elements[] | select(.kind == "input") | " \(.kind) at (\(.row),\(.col)) width=\(.width)"'
# Type input
pilotty type -s "$SESSION_NAME" "Agent Smith"
-pilotty key -s "$SESSION_NAME" Enter
+pilotty key -s "$SESSION_NAME" Enter >/dev/null
sleep 0.5
echo "Entered: Agent Smith"
@@ -98,22 +117,24 @@ echo "Entered: Agent Smith"
echo ""
echo "5. Message Box"
-pilotty spawn --name "$SESSION_NAME" dialog --msgbox "Demo complete!" 10 40
+pilotty spawn --name "$SESSION_NAME" dialog --msgbox "Demo complete!" 10 40 >/dev/null
-pilotty wait-for -s "$SESSION_NAME" "complete" -t 5000
+pilotty wait-for -s "$SESSION_NAME" "complete" -t 5000 >/dev/null
-# Take final snapshot to see the OK button
-pilotty snapshot -s "$SESSION_NAME"
+# Show button element
+echo "Detected button:"
+pilotty snapshot -s "$SESSION_NAME" | jq -r '.elements[] | select(.kind == "button" or .kind == "input") | " \(.kind) \(.text) at (\(.row),\(.col))"'
# Dismiss with Enter
-pilotty key -s "$SESSION_NAME" Enter
+pilotty key -s "$SESSION_NAME" Enter >/dev/null
sleep 0.5
-# Cleanup
-if pilotty list-sessions 2>/dev/null | grep -q "$SESSION_NAME"; then
- pilotty kill -s "$SESSION_NAME"
-fi
-
echo ""
echo "=== Demo Complete ==="
+echo ""
+echo "Key takeaways:"
+echo " - Use snapshot | jq '.elements' to see detected UI elements"
+echo " - Toggles have 'checked' field for state tracking"
+echo " - Use keyboard (Tab, Space, Enter, arrows) for reliable navigation"
+echo " - content_hash can detect screen changes between snapshots"
diff --git a/skills/pilotty/templates/element-detection.sh b/skills/pilotty/templates/element-detection.sh
new file mode 100755
index 0000000..6b2ccb8
--- /dev/null
+++ b/skills/pilotty/templates/element-detection.sh
@@ -0,0 +1,145 @@
+#!/bin/bash
+# Element Detection Template
+# Demonstrates pilotty's element detection and interaction
+#
+# Usage: ./element-detection.sh
+
+set -e
+
+# Configuration
+PILOTTY="${PILOTTY:-pilotty}"
+SESSION="element-demo"
+
+# Colors
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+BLUE='\033[0;34m'
+YELLOW='\033[1;33m'
+NC='\033[0m'
+
+# Cleanup on exit
+cleanup() {
+ $PILOTTY kill -s "$SESSION" 2>/dev/null || true
+}
+trap cleanup EXIT
+
+echo -e "${BLUE}=== Element Detection Demo ===${NC}"
+echo ""
+
+# -----------------------------------------------------------------------------
+# Step 1: Spawn a TUI with UI elements
+# -----------------------------------------------------------------------------
+echo -e "${YELLOW}Step 1: Spawning dialog checklist...${NC}"
+
+$PILOTTY spawn --name "$SESSION" -- dialog --checklist "Select features to enable:" 15 60 5 \
+ "notifications" "Push notifications" on \
+ "darkmode" "Dark mode theme" off \
+ "autosave" "Auto-save documents" on \
+ "analytics" "Usage analytics" off \
+ "updates" "Auto-updates" on >/dev/null
+
+sleep 0.5
+
+# -----------------------------------------------------------------------------
+# Step 2: Get snapshot with elements
+# -----------------------------------------------------------------------------
+echo -e "${YELLOW}Step 2: Getting snapshot with detected elements...${NC}"
+echo ""
+
+SNAPSHOT=$($PILOTTY snapshot -s "$SESSION")
+
+# Show element summary
+echo -e "${GREEN}Detected elements:${NC}"
+echo "$SNAPSHOT" | jq -r '.elements[] | " \(.kind) \(.text) at (\(.row),\(.col)) conf=\(.confidence)"'
+echo ""
+
+# -----------------------------------------------------------------------------
+# Step 3: Analyze toggles
+# -----------------------------------------------------------------------------
+echo -e "${YELLOW}Step 3: Analyzing toggle states...${NC}"
+echo ""
+
+TOGGLES=$(echo "$SNAPSHOT" | jq '[.elements[] | select(.kind == "toggle")]')
+CHECKED=$(echo "$TOGGLES" | jq '[.[] | select(.checked == true)] | length')
+UNCHECKED=$(echo "$TOGGLES" | jq '[.[] | select(.checked == false)] | length')
+
+echo -e " Checked toggles: ${GREEN}$CHECKED${NC}"
+echo -e " Unchecked toggles: ${RED}$UNCHECKED${NC}"
+echo ""
+
+# Show each toggle
+echo -e "${GREEN}Toggle details:${NC}"
+echo "$TOGGLES" | jq -r '.[] | " \(.text) at (\(.row),\(.col)) checked=\(.checked)"'
+echo ""
+
+# -----------------------------------------------------------------------------
+# Step 4: Toggle an unchecked option
+# -----------------------------------------------------------------------------
+echo -e "${YELLOW}Step 4: Toggling 'darkmode' (currently off)...${NC}"
+
+# Get initial hash for change detection
+HASH1=$(echo "$SNAPSHOT" | jq -r '.content_hash')
+
+# Navigate to darkmode (second option) and toggle
+$PILOTTY key -s "$SESSION" Down >/dev/null # Move to darkmode
+$PILOTTY key -s "$SESSION" Space >/dev/null # Toggle it
+
+sleep 0.2
+
+# Get new snapshot and hash
+SNAPSHOT2=$($PILOTTY snapshot -s "$SESSION")
+HASH2=$(echo "$SNAPSHOT2" | jq -r '.content_hash')
+
+# Verify change
+if [ "$HASH1" != "$HASH2" ]; then
+ echo -e " ${GREEN}Screen changed! (hash: $HASH1 -> $HASH2)${NC}"
+else
+ echo -e " ${RED}No change detected${NC}"
+fi
+echo ""
+
+# Show updated toggle states
+echo -e "${GREEN}Updated toggle states:${NC}"
+echo "$SNAPSHOT2" | jq -r '.elements[] | select(.kind == "toggle") | " \(.text) at (\(.row),\(.col)) checked=\(.checked)"'
+echo ""
+
+# -----------------------------------------------------------------------------
+# Step 5: Find and interact with button
+# -----------------------------------------------------------------------------
+echo -e "${YELLOW}Step 5: Looking for action button...${NC}"
+
+BUTTON=$(echo "$SNAPSHOT2" | jq -r '.elements[] | select(.kind == "button" or .kind == "input") | "\(.text) at (\(.row),\(.col))"' | head -1)
+if [ -n "$BUTTON" ]; then
+ echo -e " Found button: ${GREEN}$BUTTON${NC}"
+else
+ echo -e " ${YELLOW}No button element detected, using keyboard to confirm${NC}"
+fi
+echo ""
+
+# -----------------------------------------------------------------------------
+# Step 6: Confirm selection
+# -----------------------------------------------------------------------------
+echo -e "${YELLOW}Step 6: Confirming selection with Enter...${NC}"
+
+$PILOTTY key -s "$SESSION" Enter >/dev/null
+
+sleep 0.3
+
+# Check final state
+echo -e "${GREEN}Final screen state:${NC}"
+{ $PILOTTY snapshot -s "$SESSION" --format text 2>/dev/null || echo " (dialog closed)"; } | head -5
+echo ""
+
+# -----------------------------------------------------------------------------
+# Summary
+# -----------------------------------------------------------------------------
+echo -e "${BLUE}=== Summary ===${NC}"
+echo ""
+echo "This demo showed how to:"
+echo " 1. Spawn a TUI application"
+echo " 2. Get snapshot with detected elements"
+echo " 3. Analyze element states (toggles, buttons)"
+echo " 4. Use content_hash for change detection"
+echo " 5. Navigate with keyboard based on element context"
+echo ""
+echo -e "${GREEN}Demo complete!${NC}"