diff --git a/.gitignore b/.gitignore
index 246d7df9..600c9ba0 100644
--- a/.gitignore
+++ b/.gitignore
@@ -166,4 +166,6 @@ reports/
/chat
/askui_chat.db
.cache/
+.askui_cache/
+playground/*
diff --git a/docs/caching.md b/docs/caching.md
index d4da680c..df7fae7b 100644
--- a/docs/caching.md
+++ b/docs/caching.md
@@ -1,6 +1,6 @@
# Caching (Experimental)
-**CAUTION: The Caching feature is still in alpha state and subject to change! Use it at your own risk. In case you run into issues, you can disable caching by removing the caching_settings parameter or by explicitly setting the caching_strategy to `no`.**
+**CAUTION: The Caching feature is still in alpha state and subject to change! Use it at your own risk. In case you run into issues, you can disable caching by removing the caching_settings parameter or by explicitly setting the strategy to `None`.**
The caching mechanism allows you to record and replay agent action sequences (trajectories) for faster and more robust test execution. This feature is particularly useful for regression testing, where you want to replay known-good interaction sequences to verify that your application still behaves correctly.
@@ -8,48 +8,93 @@ The caching mechanism allows you to record and replay agent action sequences (tr
The caching system works by recording all tool use actions (mouse movements, clicks, typing, etc.) performed by the agent during an `act()` execution. These recorded sequences can then be replayed in subsequent executions, allowing the agent to skip the decision-making process and execute the actions directly.
+**New in v0.1:** The caching system now includes advanced features like parameter support for dynamic values, smart handling of non-cacheable tools that require agent intervention, comprehensive message history tracking, and automatic failure detection with recovery capabilities.
+
+**New in v0.2:** Visual validation using perceptual hashing ensures cached trajectories execute only when the UI state matches expectations. The settings structure has been refactored for better clarity, separating writing settings from execution settings.
+
## Caching Strategies
+**Updated in v0.2:** Strategy names have been renamed for clarity.
+
The caching mechanism supports four strategies, configured via the `caching_settings` parameter in the `act()` method:
-- **`"no"`** (default): No caching is used. The agent executes normally without recording or replaying actions.
-- **`"write"`**: Records all agent actions to a cache file for future replay.
-- **`"read"`**: Provides tools to the agent to list and execute previously cached trajectories.
-- **`"both"`**: Combines read and write modes - the agent can use existing cached trajectories and will also record new ones.
+- **`None`** (default): No caching is used. The agent executes normally without recording or replaying actions.
+- **`"record"`**: Records all agent actions to a cache file for future replay.
+- **`"execute"`**: Provides tools to the agent to list and execute previously cached trajectories.
+- **`"both"`**: Combines execute and record modes - the agent can use existing cached trajectories and will also record new ones.
## Configuration
+**Updated in v0.2:** Caching settings have been refactored for better clarity, separating writing-related settings from execution-related settings.
+
Caching is configured using the `CachingSettings` class:
```python
-from askui.models.shared.settings import CachingSettings, CachedExecutionToolSettings
+from askui.models.shared.settings import (
+ CachingSettings,
+ CacheWritingSettings,
+ CacheExecutionSettings,
+)
caching_settings = CachingSettings(
- strategy="write", # One of: "read", "write", "both", "no"
+ strategy="both", # One of: "execute", "record", "both", or None
cache_dir=".cache", # Directory to store cache files
- filename="my_test.json", # Filename for the cache file (optional for write mode)
- execute_cached_trajectory_tool_settings=CachedExecutionToolSettings(
- delay_time_between_action=0.5 # Delay in seconds between each cached action
- )
+ writing_settings=CacheWritingSettings(
+ filename="my_test.json", # Cache file name
+ parameter_identification_strategy="llm", # Auto-detect dynamic values
+ visual_verification_method="phash", # Visual validation method
+ visual_validation_region_size=100, # Size of validation region (pixels)
+ visual_validation_threshold=10, # Hamming distance threshold
+ ),
+ execution_settings=CacheExecutionSettings(
+ delay_time_between_action=0.5 # Delay in seconds between each action
+ ),
)
```
### Parameters
-- **`strategy`**: The caching strategy to use (`"read"`, `"write"`, `"both"`, or `"no"`).
+- **`strategy`**: The caching strategy to use (`"execute"`, `"record"`, `"both"`, or `None`). **Updated in v0.2:** Renamed from "read"/"write"/"no" to "execute"/"record"/None for clarity.
- **`cache_dir`**: Directory where cache files are stored. Defaults to `".cache"`.
-- **`filename`**: Name of the cache file to write to or read from. If not specified in write mode, a timestamped filename will be generated automatically (format: `cached_trajectory_YYYYMMDDHHMMSSffffff.json`).
-- **`execute_cached_trajectory_tool_settings`**: Configuration for the trajectory execution tool (optional). See [Execution Settings](#execution-settings) below.
+- **`writing_settings`**: **New in v0.2!** Configuration for cache recording. See [Writing Settings](#writing-settings) below. Can be `None` if only executing caches.
+- **`execution_settings`**: **New in v0.2!** Configuration for cache execution. See [Execution Settings](#execution-settings) below. Can be `None` if only recording caches.
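
For code written against the old strategy names, the rename above can be captured in a small lookup. This is a minimal sketch; `migrate_strategy` is a hypothetical helper for illustration and is not part of `askui`.

```python
from typing import Optional

# Old-to-new strategy names, taken from the rename described above.
_LEGACY_STRATEGY_NAMES: dict[str, Optional[str]] = {
    "read": "execute",
    "write": "record",
    "both": "both",
    "no": None,
}


def migrate_strategy(old_name: str) -> Optional[str]:
    """Translate a pre-v0.2 strategy name (hypothetical helper, not an askui API)."""
    try:
        return _LEGACY_STRATEGY_NAMES[old_name]
    except KeyError:
        raise ValueError(f"Unknown legacy caching strategy: {old_name!r}") from None


if __name__ == "__main__":
    for name in ("read", "write", "both", "no"):
        print(name, "->", migrate_strategy(name))
```

The returned value can then be passed as `strategy=` when constructing `CachingSettings`.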
+
+### Writing Settings
+
+**New in v0.2!** The `CacheWritingSettings` class configures how cache files are recorded:
+
+```python
+from askui.models.shared.settings import CacheWritingSettings
+
+writing_settings = CacheWritingSettings(
+    filename="my_test.json",  # Name of cache file to create
+    parameter_identification_strategy="llm",  # "llm" or "preset"
+    visual_verification_method="phash",  # "phash", "ahash", or "none"
+    visual_validation_region_size=100,  # Size of region to validate (pixels)
+    visual_validation_threshold=10,  # Hamming distance threshold (0-64)
+)
+```
+
+#### Parameters
+
+- **`filename`**: Name of the cache file to write. Defaults to `""` (auto-generates timestamped filename: `cached_trajectory_YYYYMMDDHHMMSSffffff.json`).
+- **`parameter_identification_strategy`**: When `"llm"` (default), uses AI to automatically identify and parameterize dynamic values like dates, usernames, and IDs. When `"preset"`, only manually specified parameters using `{{...}}` syntax are detected. See [Automatic Cache Parameter Identification](#automatic-cache-parameter-identification).
+- **`visual_verification_method`**: **New in v0.2!** Visual validation method to use:
+  - `"phash"` (default): Perceptual hash using DCT - robust to minor changes like compression and lighting
+  - `"ahash"`: Average hash - simpler and faster, less robust to transformations
+  - `"none"`: Disable visual validation
+- **`visual_validation_region_size`**: **New in v0.2!** Size of the square region (in pixels) to extract around interaction coordinates for visual validation. Defaults to `100` (100×100 pixel region).
+- **`visual_validation_threshold`**: **New in v0.2!** Maximum Hamming distance (0-64) between stored and current visual hashes to consider a match. Lower values require closer matches. Defaults to `10`.
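
To make the threshold concrete, here is a minimal sketch of how a 64-bit hash comparison with a Hamming distance cutoff can work. It implements a plain average hash over an already-downscaled 8×8 grayscale grid. This illustrates the general technique only and is not askui's implementation (pHash additionally applies a DCT). All function names are hypothetical.

```python
from typing import Sequence


def average_hash(pixels: Sequence[Sequence[int]]) -> int:
    """Compute a 64-bit average hash from an 8x8 grid of grayscale values."""
    flat = [value for row in pixels for value in row]
    if len(flat) != 64:
        raise ValueError("expected an 8x8 grid (64 values)")
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        # One bit per pixel: 1 if brighter than the mean, else 0.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two hashes (0-64 for 64-bit hashes)."""
    return bin(a ^ b).count("1")


def hashes_match(stored: int, current: int, threshold: int = 10) -> bool:
    """Treat the region as unchanged if at most `threshold` bits differ."""
    return hamming_distance(stored, current) <= threshold


# Recorded region: dark left half, bright right half.
recorded = [[0] * 4 + [255] * 4 for _ in range(8)]
# Current region: same layout with three pixels changed in the first row.
current = [row[:] for row in recorded]
current[0][0:3] = [255, 255, 255]
# Inverted region: every pixel flipped, so every hash bit differs.
inverted = [[255 - value for value in row] for row in recorded]

if __name__ == "__main__":
    stored_hash = average_hash(recorded)
    print(hamming_distance(stored_hash, average_hash(current)))   # small distance
    print(hashes_match(stored_hash, average_hash(current)))       # within threshold
    print(hashes_match(stored_hash, average_hash(inverted)))      # far beyond threshold
```

A lower threshold rejects more of the small differences, which matches the "lower values require closer matches" behavior described above.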
### Execution Settings
-The `CachedExecutionToolSettings` class allows you to configure how cached trajectories are executed:
+The `CacheExecutionSettings` class configures how cached trajectories are executed:
```python
-from askui.models.shared.settings import CachedExecutionToolSettings
+from askui.models.shared.settings import CacheExecutionSettings
-execution_settings = CachedExecutionToolSettings(
- delay_time_between_action=0.5 # Delay in seconds between each action (default: 0.5)
+execution_settings = CacheExecutionSettings(
+ delay_time_between_action=0.5 # Delay in seconds between each action
)
```
@@ -63,28 +108,35 @@ You can adjust this value based on your application's responsiveness:
## Usage Examples
-### Writing a Cache (Recording)
+### Recording a Cache
Record agent actions to a cache file for later replay:
```python
from askui import VisionAgent
-from askui.models.shared.settings import CachingSettings
+from askui.models.shared.settings import CachingSettings, CacheWritingSettings
with VisionAgent() as agent:
agent.act(
goal="Fill out the login form with username 'admin' and password 'secret123'",
caching_settings=CachingSettings(
- strategy="write",
+ strategy="record",
cache_dir=".cache",
- filename="login_test.json"
+ writing_settings=CacheWritingSettings(
+ filename="login_test.json",
+ visual_verification_method="phash", # Enable visual validation
+ ),
)
)
```
-After execution, a cache file will be created at `.cache/login_test.json` containing all the tool use actions performed by the agent.
+After execution, a cache file will be created at `.cache/login_test.json` containing:
+- All tool use actions performed by the agent
+- Metadata about the execution
+- **New in v0.2:** Visual validation hashes for click and type actions
+- Automatically detected cache parameters (if any)
-### Reading from Cache (Replaying)
+### Executing from Cache
Provide the agent with access to previously recorded trajectories:
@@ -96,22 +148,103 @@ with VisionAgent() as agent:
agent.act(
goal="Fill out the login form",
caching_settings=CachingSettings(
- strategy="read",
+ strategy="execute",
+ cache_dir=".cache"
+ )
+ )
+```
+
+When using `strategy="execute"`, the agent receives two tools:
+
+1. **`RetrieveCachedTestExecutions`**: Lists all available cache files in the cache directory
+2. **`ExecuteCachedTrajectory`**: Executes a cached trajectory. Can start from the beginning (default) or continue from a specific step index using the optional `start_from_step_index` parameter (useful after handling non-cacheable steps)
+
+The agent will automatically check if a relevant cached trajectory exists and use it if appropriate. During execution, the agent can see all screenshots and results in the message history. After executing a cached trajectory, the agent will verify the results and make corrections if needed.
+
+### Using Cache Parameters for Dynamic Values
+
+**New in v0.1:** Trajectories can contain cache_parameters for dynamic values that change between executions:
+
+```python
+from askui import VisionAgent
+from askui.models.shared.settings import CachingSettings, CacheWritingSettings
+
+# When recording, use dynamic values as normal
+# The system automatically detects patterns like dates and user-specific data
+with VisionAgent() as agent:
+    agent.act(
+        goal="Create a new task for today with the title 'Review PR'",
+        caching_settings=CachingSettings(
+            strategy="record",
+            cache_dir=".cache",
+            writing_settings=CacheWritingSettings(
+                filename="create_task.json"
+            )
+        )
+    )
+
+# Later, when executing, the agent can provide parameter values
+# If the cache file contains {{current_date}} or {{task_title}}, provide them:
+with VisionAgent() as agent:
+    agent.act(
+        goal="Create a task using the cached flow",
+        caching_settings=CachingSettings(
+            strategy="execute",
+            cache_dir=".cache"
+        )
+    )
+    # The agent will automatically detect required cache_parameters and can provide them
+    # via the parameter_values parameter when calling ExecuteCachedTrajectory
+```
+
+Cache Parameters use the syntax `{{variable_name}}` and are automatically detected during cache file creation. When executing a trajectory with cache_parameters, the agent must provide values for all required cache_parameters.
+
+### Handling Non-Cacheable Steps
+
+**New in v0.1:** Some tools cannot be cached and require the agent to execute them live. Examples include debugging tools, contextual decisions, or tools that depend on runtime state.
+
+```python
+from askui import VisionAgent
+from askui.models.shared.settings import CachingSettings
+
+with VisionAgent() as agent:
+ agent.act(
+ goal="Debug the login form by checking element states",
+ caching_settings=CachingSettings(
+ strategy="execute",
cache_dir=".cache"
)
)
+ # If the cached trajectory contains non-cacheable steps:
+ # 1. Execution pauses when reaching the non-cacheable step
+ # 2. Agent receives NEEDS_AGENT status with current step index
+ # 3. Agent executes the non-cacheable step manually
+ # 4. Agent uses ExecuteCachedTrajectory with start_from_step_index to resume
```
-When using `strategy="read"`, the agent receives two additional tools:
+Tools can be marked as non-cacheable by setting `is_cacheable=False` in their definition. When trajectory execution reaches a non-cacheable tool, it pauses and returns control to the agent for manual execution.
-1. **`retrieve_available_trajectories_tool`**: Lists all available cache files in the cache directory
-2. **`execute_cached_executions_tool`**: Executes a specific cached trajectory
+### Continuing from a Specific Step
-The agent will automatically check if a relevant cached trajectory exists and use it if appropriate. After executing a cached trajectory, the agent will verify the results and make corrections if needed.
+**New in v0.1:** After handling a non-cacheable step or recovering from a failure, the agent can continue execution from a specific step index using the `start_from_step_index` parameter:
+
+```python
+# The agent uses ExecuteCachedTrajectory with start_from_step_index like this:
+result = execute_cached_trajectory_tool(
+    trajectory_file=".cache/my_test.json",
+    start_from_step_index=5,  # Continue from step 5
+    parameter_values={"date": "2025-12-11"}  # Provide any required cache_parameters
+)
+```
+
+This is particularly useful for:
+- Resuming after manual execution of non-cacheable steps
+- Recovering from partial failures
+- Skipping steps that are no longer needed
### Referencing Cache Files in Goal Prompts
-When using `strategy="read"` or `strategy="both"`, you need to inform the agent about which cache files are available and when to use them. This is done by including cache file information directly in your goal prompt.
+When using `strategy="execute"` or `strategy="both"`, you need to inform the agent about which cache files are available and when to use them. This is done by including cache file information directly in your goal prompt.
#### Explicit Cache File References
@@ -124,11 +257,11 @@ from askui.models.shared.settings import CachingSettings
with VisionAgent() as agent:
agent.act(
goal="""Open the website in Google Chrome.
-
- If the cache file "open_website_in_chrome.json" is available, please use it
+
+ If the cache file "open_website_in_chrome.json" is available, please use it
for this execution. It will open a new window in Chrome and navigate to the website.""",
caching_settings=CachingSettings(
- strategy="read",
+ strategy="execute",
cache_dir=".cache"
)
)
@@ -147,11 +280,11 @@ test_id = "TEST_001"
with VisionAgent() as agent:
agent.act(
goal=f"""Execute test {test_id} according to the test definition.
-
- Check if a cache file named "{test_id}.json" exists. If it does, use it to
+
+ Check if a cache file named "{test_id}.json" exists. If it does, use it to
replay the test actions, then verify the results.""",
caching_settings=CachingSettings(
- strategy="read",
+ strategy="execute",
cache_dir="test_cache"
)
)
@@ -168,12 +301,12 @@ from askui.models.shared.settings import CachingSettings
with VisionAgent() as agent:
agent.act(
goal="""Fill out the user registration form.
-
- Look for cache files that match the pattern "user_registration_*.json".
- Choose the most recent one if multiple are available, as it likely contains
+
+ Look for cache files that match the pattern "user_registration_*.json".
+ Choose the most recent one if multiple are available, as it likely contains
the most up-to-date interaction sequence.""",
caching_settings=CachingSettings(
- strategy="read",
+ strategy="execute",
cache_dir=".cache"
)
)
@@ -190,14 +323,14 @@ from askui.models.shared.settings import CachingSettings
with VisionAgent() as agent:
agent.act(
goal="""Complete the full checkout process:
-
+
1. If "login.json" exists, use it to log in
2. If "add_to_cart.json" exists, use it to add items to cart
3. If "checkout.json" exists, use it to complete the checkout
-
+
After each cached execution, verify the step completed successfully before proceeding.""",
caching_settings=CachingSettings(
- strategy="read",
+ strategy="execute",
cache_dir=".cache"
)
)
@@ -215,15 +348,15 @@ You can customize the delay between cached actions to match your application's r
```python
from askui import VisionAgent
-from askui.models.shared.settings import CachingSettings, CachedExecutionToolSettings
+from askui.models.shared.settings import CachingSettings, CacheExecutionSettings
with VisionAgent() as agent:
agent.act(
goal="Fill out the login form",
caching_settings=CachingSettings(
- strategy="read",
+ strategy="execute",
cache_dir=".cache",
- execute_cached_trajectory_tool_settings=CachedExecutionToolSettings(
+ execution_settings=CacheExecutionSettings(
delay_time_between_action=1.0 # Wait 1 second between each action
)
)
@@ -261,108 +394,1018 @@ In this mode:
## Cache File Format
-Cache files are JSON files containing an array of tool use blocks. Each block represents a single tool invocation with the following structure:
+**New in v0.1:** Cache files now use an enhanced format with metadata tracking, parameter support, and execution history.
+
+**New in v0.2:** Cache files include visual validation metadata and enhanced trajectory steps with visual hashes.
+
+### v0.2 Format (Current)
+
+Cache files are JSON objects with the following structure:
```json
-[
+{
+ "metadata": {
+ "version": "0.2",
+ "created_at": "2025-12-30T10:30:00Z",
+ "goal": "Greet user {{user_name}} and log them in",
+ "last_executed_at": "2025-12-30T15:45:00Z",
+ "token_usage": {
+ "input_tokens": 1250,
+ "output_tokens": 380,
+ "cache_creation_input_tokens": 0,
+ "cache_read_input_tokens": 0
+ },
+ "execution_attempts": 3,
+ "failures": [
+ {
+ "timestamp": "2025-12-30T14:20:00Z",
+ "step_index": 5,
+ "error_message": "Visual validation failed: UI region changed",
+ "failure_count_at_step": 1
+ }
+ ],
+ "is_valid": true,
+ "invalidation_reason": null,
+ "visual_verification_method": "phash",
+ "visual_validation_region_size": 100,
+ "visual_validation_threshold": 10
+ },
+ "trajectory": [
{
- "type": "tool_use",
- "id": "toolu_01AbCdEfGhIjKlMnOpQrStUv",
- "name": "computer",
- "input": {
- "action": "mouse_move",
- "coordinate": [150, 200]
- }
+ "type": "tool_use",
+ "id": "toolu_01AbCdEfGhIjKlMnOpQrStUv",
+ "name": "computer",
+ "input": {
+ "action": "left_click",
+ "coordinate": [450, 320]
+ },
+ "visual_representation": "80c0e3f3e3e7e381c7c78f1f3f3f7f7e"
},
{
- "type": "tool_use",
- "id": "toolu_02AbCdEfGhIjKlMnOpQrStUv",
- "name": "computer",
- "input": {
- "action": "left_click"
- }
+ "type": "tool_use",
+ "id": "toolu_02XyZaBcDeFgHiJkLmNoPqRs",
+ "name": "computer",
+ "input": {
+ "action": "type",
+ "text": "Hello {{user_name}}!"
+ },
+ "visual_representation": "91d1f4e4d4c6c282c6c79e2e4e4e6e6d"
},
{
- "type": "tool_use",
- "id": "toolu_03AbCdEfGhIjKlMnOpQrStUv",
- "name": "computer",
- "input": {
- "action": "type",
- "text": "admin"
- }
+ "type": "tool_use",
+ "id": "toolu_03StUvWxYzAbCdEfGhIjKlMn",
+ "name": "print_debug_info",
+ "input": {},
+ "visual_representation": null
}
-]
+ ],
+ "cache_parameters": {
+ "user_name": "Name of the user to greet"
+ }
+}
```
-Note: Screenshot actions are excluded from cached trajectories as they don't modify the UI state.
+**Note:** In the example above, `print_debug_info` is marked as non-cacheable (`is_cacheable=False`), so its `input` field is left empty (`{}`). This saves space and protects privacy, since non-cacheable tools aren't executed from cache anyway.
+
+#### Metadata Fields
+
+- **`version`**: Cache file format version (currently "0.2")
+- **`created_at`**: ISO 8601 timestamp when the cache was created
+- **`goal`**: The original goal/instruction given to the agent when recording this trajectory. Cache Parameters are applied to the goal text just like in the trajectory, making it easy to understand what the cache was designed to accomplish.
+- **`last_executed_at`**: ISO 8601 timestamp of the last execution (null if never executed)
+- **`token_usage`**: **New in v0.1!** Token usage statistics from the recording execution
+- **`execution_attempts`**: Number of times this trajectory has been executed
+- **`failures`**: List of failures encountered during execution (see [Failure Tracking](#failure-tracking))
+- **`is_valid`**: Boolean indicating if the cache is still considered valid
+- **`invalidation_reason`**: Optional string explaining why the cache was invalidated
+- **`visual_verification_method`**: **New in v0.2!** Visual validation method used when recording (`"phash"`, `"ahash"`, or `null`)
+- **`visual_validation_region_size`**: **New in v0.2!** Size of the validation region in pixels (e.g., `100` for 100×100 pixels)
+- **`visual_validation_threshold`**: **New in v0.2!** Hamming distance threshold for visual validation (0-64)
+
+#### Cache Parameters
+
+The `cache_parameters` object maps parameter names to their descriptions. Cache Parameters in the trajectory use the syntax `{{parameter_name}}` and must be substituted with actual values during execution.
+
+#### Failure Tracking
+
+Each failure record contains:
+- **`timestamp`**: When the failure occurred
+- **`step_index`**: Which step failed (0-indexed)
+- **`error_message`**: The error that occurred
+- **`failure_count_at_step`**: How many times this specific step has failed
+
+This information helps with cache invalidation decisions and debugging.
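
As an illustration of how these fields fit together, the sketch below appends a failure record to a metadata dictionary and derives `failure_count_at_step` from earlier records. The invalidation rule (`MAX_FAILURES_PER_STEP`) and the function name are assumptions made for this example, not documented askui behavior.

```python
from datetime import datetime, timezone
from typing import Any

MAX_FAILURES_PER_STEP = 3  # Assumed threshold, for illustration only


def record_failure(
    metadata: dict[str, Any], step_index: int, error_message: str
) -> dict[str, Any]:
    """Append a failure record and invalidate the cache after repeated failures."""
    failures = metadata.setdefault("failures", [])
    # Count earlier failures at the same step to fill failure_count_at_step.
    previous = sum(1 for failure in failures if failure["step_index"] == step_index)
    record = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "step_index": step_index,
        "error_message": error_message,
        "failure_count_at_step": previous + 1,
    }
    failures.append(record)
    if record["failure_count_at_step"] >= MAX_FAILURES_PER_STEP:
        metadata["is_valid"] = False
        metadata["invalidation_reason"] = (
            f"Step {step_index} failed {record['failure_count_at_step']} times"
        )
    return record


if __name__ == "__main__":
    metadata: dict[str, Any] = {"is_valid": True, "invalidation_reason": None}
    for _ in range(3):
        record_failure(metadata, 5, "Visual validation failed: UI region changed")
    print(metadata["is_valid"], metadata["invalidation_reason"])
```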
## How It Works
-### Write Mode
+### Internal Architecture
-In write mode, the `CacheWriter` class:
+The caching system consists of several key components:
+
+- **`CacheWriter`**: Handles recording trajectories in record mode
+- **`CacheExecutionManager`**: Manages cache execution state, flow control, and metadata updates during trajectory replay
+- **`TrajectoryExecutor`**: Executes individual steps from cached trajectories
+- **Agent**: Orchestrates the conversation flow and delegates cache execution to `CacheExecutionManager`
+
+When executing a cached trajectory, the `Agent` class delegates all cache-related logic to `CacheExecutionManager`, which handles:
+- State management (execution mode, verification pending, etc.)
+- Execution flow control (success, failure, needs agent, completed)
+- Message history building and injection
+- Metadata updates (execution attempts, failures, invalidation)
+
+This separation of concerns keeps the Agent focused on conversation orchestration while CacheExecutionManager handles all caching complexity.
+
+### Record Mode
+
+In record mode, the `CacheWriter` class:
1. Intercepts all assistant messages via a callback function
2. Extracts tool use blocks from the messages
-3. Stores them in memory during execution
-4. Writes them to a JSON file when the agent finishes (on `stop_reason="end_turn"`)
-5. Automatically skips writing if a cached execution was used (to avoid recording replays)
+3. **Enhances with visual validation** (New in v0.2):
+   - For click and type actions, captures a screenshot before execution
+   - Extracts the region around the interaction coordinate
+   - Computes a perceptual hash using the selected method (pHash/aHash)
+   - Attaches the hash and validation settings to the tool block
+4. Stores enhanced tool blocks in memory during execution
+5. When the agent finishes (on `stop_reason="end_turn"`):
+   - **Automatically identifies cache_parameters** using AI (if `parameter_identification_strategy="llm"`)
+     - Analyzes the trajectory to find dynamic values (dates, usernames, IDs, etc.)
+     - Generates descriptive parameter definitions
+     - Replaces identified values with `{{parameter_name}}` syntax in the trajectory
+     - Applies the same replacements to the goal text
+   - **Blanks non-cacheable tool inputs** by setting `input: {}` for tools with `is_cacheable=False` (saves space and protects privacy)
+   - **Writes to JSON file** with:
+     - v0.2 metadata (version, timestamps, goal, token usage, visual validation settings)
+     - Trajectory of tool use blocks (with cache_parameters, visual hashes, and blanked inputs)
+     - Parameter definitions with descriptions
+6. Automatically skips writing if a cached execution was used (to avoid recording replays)
+
+### Execute Mode
+
+In execute mode:
+
+1. Two caching tools are added to the agent's toolbox:
+   - `RetrieveCachedTestExecutions`: Lists available trajectories
+   - `ExecuteCachedTrajectory`: Executes from the beginning or continues from a specific step using `start_from_step_index`
+2. A special system prompt (`CACHE_USE_PROMPT`) instructs the agent on:
+   - How to use trajectories
+   - Parameter handling
+   - Non-cacheable step management
+   - Failure recovery strategies
+3. The agent can list available cache files and choose appropriate ones
+4. During execution via `TrajectoryExecutor`:
+   - **Visual validation** (New in v0.2): Before each validated step, captures current UI and compares hash to stored hash
+   - Each step is executed sequentially with configurable delays
+   - All tools in the trajectory are executed, including screenshots and retrieval tools
+   - Non-cacheable tools trigger a pause with `NEEDS_AGENT` status
+   - Cache Parameters are validated and substituted before execution
+   - Message history is built with assistant (tool use) and user (tool result) messages
+   - Agent sees all screenshots and results in the message history
+5. Execution can pause for agent intervention:
+   - When visual validation fails (New in v0.2)
+   - When reaching non-cacheable tools
+   - When errors occur (with failure details)
+6. Agent can resume execution:
+   - Using `ExecuteCachedTrajectory` with `start_from_step_index` from the pause point
+   - Skipping failed or irrelevant steps
+7. Results are verified by the agent, with corrections made as needed
+
+### Message History
+
+**New in v0.1:** During cached trajectory execution, a complete message history is built and returned to the agent. This includes:
+
+- **Assistant messages**: Containing `ToolUseBlockParam` for each action
+- **User messages**: Containing `ToolResultBlockParam` with:
+  - Text results from tool execution
+  - Screenshots (when available)
+  - Error messages (on failure)
+
+This visibility allows the agent to:
+- See the current UI state via screenshots
+- Understand what actions were taken
+- Detect when execution has diverged from expectations
+- Make informed decisions about corrections or retries
+
+### Non-Cacheable Tools
+
+Tools can be marked as non-cacheable by setting `is_cacheable=False` in their definition:
+
+```python
+from askui.models.shared.tools import Tool
+
+class DebugPrintTool(Tool):
+    name = "print_debug"
+    description = "Print debug information about current state"
+    is_cacheable = False  # This tool requires agent context
+
+    def __call__(self, message: str) -> str:
+        # Tool implementation...
+        pass
+```
+
+During trajectory execution, when a non-cacheable tool is encountered:
+
+1. `TrajectoryExecutor` pauses execution
+2. Returns `ExecutionResult` with status `NEEDS_AGENT`
+3. Includes current step index and message history
+4. Agent receives control to execute the step manually
+5. Agent uses `ExecuteCachedTrajectory` with `start_from_step_index` to resume from the next step
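
The pause-and-resume flow in the steps above can be sketched as a small loop. This is a simplified model under stated assumptions: `Step`, `ExecutionResult`, `execute_trajectory`, and the `"COMPLETED"` status string are hypothetical stand-ins for this example and do not reproduce the real `TrajectoryExecutor`.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    name: str
    is_cacheable: bool = True


@dataclass
class ExecutionResult:
    status: str  # "COMPLETED" or "NEEDS_AGENT" (assumed status strings)
    step_index: int  # Step that needs the agent, or len(steps) when done
    executed: list[str] = field(default_factory=list)


def execute_trajectory(
    steps: list[Step],
    run_step: Callable[[Step], None],
    start_from_step_index: int = 0,
) -> ExecutionResult:
    """Replay steps in order, pausing at the first non-cacheable one."""
    executed: list[str] = []
    for index in range(start_from_step_index, len(steps)):
        step = steps[index]
        if not step.is_cacheable:
            # Hand control back to the agent, reporting where we stopped.
            return ExecutionResult("NEEDS_AGENT", index, executed)
        run_step(step)
        executed.append(step.name)
    return ExecutionResult("COMPLETED", len(steps), executed)


if __name__ == "__main__":
    trajectory = [
        Step("left_click"),
        Step("print_debug", is_cacheable=False),
        Step("type"),
    ]
    first = execute_trajectory(trajectory, run_step=lambda step: None)
    print(first.status, first.step_index)
    # The agent handles the paused step itself, then resumes after it.
    second = execute_trajectory(
        trajectory,
        run_step=lambda step: None,
        start_from_step_index=first.step_index + 1,
    )
    print(second.status, second.executed)
```

Resuming from `step_index + 1` mirrors the last step of the list: the agent has already handled the paused step, so replay continues with the step after it.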
+
+This mechanism is essential for tools that:
+- Require runtime context (debugging, inspection)
+- Make decisions based on current state
+- Have side effects that shouldn't be blindly replayed
+- Depend on external systems that may have changed
+
+## Failure Handling
+
+**New in v0.1:** Enhanced failure handling provides the agent with detailed information about what went wrong and where.
+
+### When Execution Fails
+
+If a step fails during trajectory execution:
+
+1. Execution stops at the failed step
+2. `ExecutionResult` includes:
+   - Status: `FAILED`
+   - `step_index`: Which step failed
+   - `error_message`: The specific error
+   - `message_history`: All actions and results up to the failure
+3. Failure is recorded in cache metadata for tracking
+4. Agent receives the failure information and can decide:
+   - **Retry**: Execute remaining steps manually
+   - **Resume**: Fix the issue and use `ExecuteCachedTrajectory` with `start_from_step_index` from the next step
+   - **Abort**: Report that the cache needs re-recording
+
+### Failure Tracking
+
+Cache metadata tracks all failures:
+```json
+"failures": [
+ {
+ "timestamp": "2025-12-11T14:20:00Z",
+ "step_index": 5,
+ "error_message": "Element not found: login button",
+ "failure_count_at_step": 2
+ }
+]
+```
+
+This information enables:
+- Smart cache invalidation (too many failures → invalid cache)
+- Debugging (which steps are problematic)
+- Metrics (cache reliability over time)
+- Auto-recovery strategies (skip commonly failing steps)
+
+### Agent Recovery Options
+
+The agent has several recovery strategies:
+
+1. **Manual Execution**: Execute remaining steps without cache
+2. **Partial Resume**: Fix the issue (e.g., wait for element) then continue from next step
+3. **Skip and Continue**: Skip the failed step and continue from a later step
+4. **Report Invalid**: Mark the cache as outdated and request re-recording
+
+Example agent decision flow:
+```
+Trajectory fails at step 5: "Element not found: submit button"
+↓
+Agent takes screenshot to assess current state
+↓
+Agent sees submit button is present but has different text
+↓
+Agent clicks the button manually
+↓
+Agent calls ExecuteCachedTrajectory(start_from_step_index=6)
+↓
+Execution continues successfully
+```
+
+## Cache Parameters
+
+**New in v0.1:** Cache Parameters enable dynamic value substitution in cached trajectories.
+
+### Parameter Syntax
+
+Cache Parameters use double curly braces: `{{parameter_name}}`
+
+Valid parameter names:
+- Must start with a letter or underscore
+- Can contain letters, numbers, and underscores
+- Examples: `{{date}}`, `{{user_name}}`, `{{order_id_123}}`
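
The naming rules above map directly onto a regular expression. The following sketch finds `{{...}}` placeholders in a nested tool input and substitutes values, raising an error if a required value is missing. The function names are hypothetical and the code illustrates the syntax only. It is not askui's own substitution logic.

```python
import re
from typing import Any

# A name starts with a letter or underscore, then letters, digits, or underscores.
_PLACEHOLDER = re.compile(r"\{\{([A-Za-z_][A-Za-z0-9_]*)\}\}")


def find_parameters(value: Any) -> set[str]:
    """Collect placeholder names from strings nested in dicts and lists."""
    if isinstance(value, str):
        return set(_PLACEHOLDER.findall(value))
    if isinstance(value, dict):
        return set().union(*(find_parameters(v) for v in value.values()))
    if isinstance(value, list):
        return set().union(*(find_parameters(v) for v in value))
    return set()


def substitute(value: Any, parameter_values: dict[str, str]) -> Any:
    """Return a copy of `value` with every placeholder replaced."""
    missing = find_parameters(value) - set(parameter_values)
    if missing:
        raise ValueError(f"Missing values for cache parameters: {sorted(missing)}")
    if isinstance(value, str):
        return _PLACEHOLDER.sub(lambda m: str(parameter_values[m.group(1)]), value)
    if isinstance(value, dict):
        return {k: substitute(v, parameter_values) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute(v, parameter_values) for v in value]
    return value


if __name__ == "__main__":
    step_input = {"action": "type", "text": "Hello {{user_name}}!"}
    print(find_parameters(step_input))
    print(substitute(step_input, {"user_name": "Ada"}))
```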
+
+### Automatic Cache Parameter Identification
+
+**New in v0.1!** The caching system uses AI to automatically identify and parameterize dynamic values when recording trajectories.
+
+#### How It Works
+
+When `parameter_identification_strategy="llm"` (the default), the system:
+
+1. **Records the trajectory** as normal during agent execution
+2. **Analyzes the trajectory** using an LLM to identify dynamic values such as:
+   - Dates and timestamps (e.g., "2025-12-11", "10:30 AM")
+   - Usernames, emails, names (e.g., "john.doe", "test@example.com")
+   - Session IDs, tokens, UUIDs, API keys
+   - Dynamic text referencing current state or time
+   - File paths with user-specific or time-specific components
+   - Temporary or generated identifiers
+3. **Generates parameter definitions** with descriptive names and documentation:
+   ```json
+   {
+     "name": "current_date",
+     "value": "2025-12-11",
+     "description": "Current date in YYYY-MM-DD format"
+   }
+   ```
+4. **Replaces values with cache_parameters** in both the trajectory AND the goal:
+   - Original: `"text": "Login as john.doe"`
+   - Result: `"text": "Login as {{username}}"`
+5. **Saves the templated trajectory** to the cache file
+
+#### Benefits
+
+- ✅ **No manual work** - Automatically identifies dynamic values
+- ✅ **Smart detection** - LLM understands semantic meaning (dates vs coordinates)
+- ✅ **Descriptive** - Generates helpful descriptions for each parameter
+- ✅ **Applies to goal** - Goal text also gets parameter replacement
+
+#### What Gets Detected
+
+The AI identifies values that are likely to change between executions:
+
+**Will be detected as cache_parameters:**
+- Dates: "2025-12-11", "Dec 11, 2025", "12/11/2025"
+- Times: "10:30 AM", "14:45:00", "2025-12-11T10:30:00Z"
+- Usernames: "john.doe", "admin_user", "test_account"
+- Emails: "user@example.com", "test@domain.org"
+- IDs: "uuid-1234-5678", "session_abc123", "order_9876"
+- Names: "John Smith", "Jane Doe"
+- Dynamic text: "Today is 2025-12-11", "Logged in as john.doe"
+
+**Will NOT be detected as cache_parameters:**
+- UI coordinates: `{"x": 100, "y": 200}`
+- Fixed button labels: "Submit", "Cancel", "OK"
+- Configuration values: `{"timeout": 30, "retries": 3}`
+- Generic actions: "click", "type", "scroll"
+- Boolean values: `true`, `false`
+
+#### Disabling Auto-Identification
+
+If you prefer manual parameter control:
+
+```python
+caching_settings = CachingSettings(
+ strategy="record",
+ writing_settings=CacheWritingSettings(
+ parameter_identification_strategy="preset" # Only detect {{...}} syntax
+ )
+)
+```
+
+With `parameter_identification_strategy="preset"`, only manually specified cache_parameters using the `{{...}}` syntax will be detected.
+
+#### Logging
+
+To see what cache_parameters are being identified, enable INFO-level logging:
+
+```python
+import logging
+logging.basicConfig(level=logging.INFO)
+```
+
+You'll see output like:
+```
+INFO: Using LLM to identify cache_parameters in trajectory
+INFO: Identified 3 cache_parameters in trajectory
+DEBUG: - current_date: 2025-12-11 (Current date in YYYY-MM-DD format)
+DEBUG: - username: john.doe (Username for login)
+DEBUG: - session_id: abc123 (Session identifier)
+INFO: Replaced 3 parameter values in trajectory
+INFO: Applied parameter replacement to goal: Login as john.doe -> Login as {{username}}
+```
+
+### Manual Cache Parameters
+
+You can also create cache_parameters manually when recording by using the `{{...}}` syntax in your goal description. The system preserves `{{...}}` patterns in tool inputs.
+
+### Providing Parameter Values
+
+When executing a trajectory with cache_parameters, the agent must provide values:
+
+```python
+# Via ExecuteCachedTrajectory
+result = execute_cached_trajectory_tool(
+ trajectory_file=".cache/my_test.json",
+ parameter_values={
+ "current_date": "2025-12-11",
+ "user_email": "test@example.com"
+ }
+)
+
+# Via ExecuteCachedTrajectory with start_from_step_index
+result = execute_cached_trajectory_tool(
+ trajectory_file=".cache/my_test.json",
+ start_from_step_index=3, # Continue from step 3
+ parameter_values={
+ "current_date": "2025-12-11",
+ "user_email": "test@example.com"
+ }
+)
+```
+
+### Parameter Validation
+
+Before execution, the system validates that every required cache_parameter has a value provided.
-### Read Mode
+If validation fails, execution is aborted with a clear error message listing missing cache_parameters.
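+
+As a rough illustration of this check, the sketch below collects every `{{...}}` placeholder in a trajectory and reports the ones without a value. The helper `find_missing_parameters` and the exact error wording are assumptions, not the library's API.
+
+```python
+import json
+import re
+
+PLACEHOLDER = re.compile(r"\{\{([A-Za-z_][A-Za-z0-9_]*)\}\}")
+
+
+def find_missing_parameters(trajectory: list, parameter_values: dict) -> list:
+    """Return the sorted names of placeholders that have no provided value."""
+    required = set(PLACEHOLDER.findall(json.dumps(trajectory)))
+    return sorted(required - set(parameter_values))
+
+
+trajectory = [
+    {"name": "computer", "input": {"action": "type", "text": "{{user_email}}"}},
+    {"name": "computer", "input": {"action": "type", "text": "{{current_date}}"}},
+]
+missing = find_missing_parameters(trajectory, {"current_date": "2025-12-11"})
+if missing:
+    print(f"Missing required cache_parameters: {', '.join(missing)}")
+```
+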
-In read mode:
+### Use Cases
-1. Two caching tools are added to the agent's toolbox
-2. A special system prompt (`CACHE_USE_PROMPT`) is appended to instruct the agent on how to use trajectories
-3. The agent can call `retrieve_available_trajectories_tool` to see available cache files
-4. The agent can call `execute_cached_executions_tool` with a trajectory file path to replay it
-5. During replay, each tool use block is executed sequentially with a configurable delay between actions (default: 0.5 seconds)
-6. Screenshot and trajectory retrieval tools are skipped during replay
-7. The agent is instructed to verify results after replay and make corrections if needed
+Cache Parameters are particularly useful for:
+- **Date-dependent workflows**: Testing with current/future dates
+- **User-specific actions**: Different users, emails, names
+- **Order/transaction IDs**: Testing with different identifiers
+- **Environment-specific values**: API endpoints, credentials
+- **Parameterized testing**: Running same flow with different data
-The delay between actions can be customized using `CachedExecutionToolSettings` to accommodate different application response times.
+Example:
+```json
+{
+ "name": "computer",
+ "input": {
+ "action": "type",
+ "text": "Schedule meeting for {{meeting_date}} with {{attendee_email}}"
+ }
+}
+```
+
+## Visual Validation
+
+**New in v0.2!** Visual validation ensures cached trajectories execute only when the UI state matches the recorded state, preventing actions from being executed on incorrect UI elements.
+
+### How It Works
+
+During cache recording (record mode), the system:
+1. **Captures screenshots** before each interaction (clicks, typing, key presses)
+2. **Extracts a region** (e.g., 100×100 pixels) around the interaction coordinate
+3. **Computes a perceptual hash** of that region using the selected method
+4. **Stores the hash** in the trajectory step along with validation settings
+
+During cache execution (execute mode), the system:
+1. **Captures the current UI state** before each step
+2. **Extracts the same region** around the interaction coordinate
+3. **Computes the hash** of the current region
+4. **Compares hashes** using Hamming distance
+5. **Validates the match** against the threshold
+6. **Executes the step** only if validation passes, otherwise returns control to the agent
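+
+To make the hash-and-compare steps concrete, here is a dependency-free sketch of an average hash (aHash) and a Hamming distance check. It works on a plain grid of grayscale values instead of real screenshots, and the function names are illustrative assumptions rather than the functions used by the caching system.
+
+```python
+def average_hash(pixels: list[list[int]], hash_size: int = 8) -> int:
+    """Compute a simple average hash (aHash) of a grayscale pixel grid.
+
+    `pixels` is a list of rows of 0-255 values whose width and height are
+    multiples of `hash_size` (an assumption that keeps the sketch short).
+    """
+    block_h = len(pixels) // hash_size
+    block_w = len(pixels[0]) // hash_size
+    means = []
+    for by in range(hash_size):
+        for bx in range(hash_size):
+            block = [
+                pixels[y][x]
+                for y in range(by * block_h, (by + 1) * block_h)
+                for x in range(bx * block_w, (bx + 1) * block_w)
+            ]
+            means.append(sum(block) / len(block))
+    average = sum(means) / len(means)
+    bits = 0
+    for value in means:
+        # One bit per block: 1 if the block is brighter than the overall mean.
+        bits = (bits << 1) | (1 if value > average else 0)
+    return bits
+
+
+def hamming_distance(hash_a: int, hash_b: int) -> int:
+    """Count the bits in which the two hashes differ."""
+    return bin(hash_a ^ hash_b).count("1")
+
+
+# A 16x16 region: dark left half, bright right half.
+recorded = [[0] * 8 + [255] * 8 for _ in range(16)]
+# Same structure with slightly different brightness.
+current = [[10] * 8 + [240] * 8 for _ in range(16)]
+# Mirrored layout: every block flips relative to the mean.
+mirrored = [[255] * 8 + [0] * 8 for _ in range(16)]
+
+threshold = 10
+print(hamming_distance(average_hash(recorded), average_hash(current)) <= threshold)  # True
+print(hamming_distance(average_hash(recorded), average_hash(mirrored)) <= threshold)  # False
+```
+
+The brightness shift leaves the hash unchanged (distance 0), while the mirrored layout flips all 64 bits, which is the behavior the threshold comparison relies on.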
+
+### Visual Validation Methods
+
+#### pHash (Perceptual Hash)
+
+Default method using Discrete Cosine Transform (DCT):
+
+```python
+writing_settings=CacheWritingSettings(
+ visual_verification_method="phash", # Default
+ visual_validation_region_size=100,
+ visual_validation_threshold=10,
+)
+```
+
+**Characteristics:**
+- ✅ Robust to minor changes (compression, scaling, lighting adjustments)
+- ✅ Sensitive to structural changes (moved buttons, different layouts)
+- ✅ Best for most use cases
+- ⚠️ Slightly slower than aHash
+
+**When to use:**
+- Production environments where UI may have subtle variations
+- Cross-platform testing (different rendering engines)
+- Long-lived caches that may encounter minor UI updates
+
+#### aHash (Average Hash)
+
+Simpler method using mean pixel values:
+
+```python
+writing_settings=CacheWritingSettings(
+ visual_verification_method="ahash",
+ visual_validation_region_size=100,
+ visual_validation_threshold=10,
+)
+```
+
+**Characteristics:**
+- ✅ Fast computation
+- ✅ Simple and predictable
+- ⚠️ Less robust to transformations
+- ⚠️ More sensitive to color/brightness changes
+
+**When to use:**
+- Development/testing environments with controlled conditions
+- Performance-critical scenarios
+- UI that rarely changes
+
+#### Disabled
+
+Disable visual validation entirely:
+
+```python
+writing_settings=CacheWritingSettings(
+ visual_verification_method="none",
+)
+```
+
+**When to use:**
+- UI that never changes
+- Testing the caching system itself
+- Debugging trajectory execution
+
+### Configuration Options
+
+#### Region Size
+
+The `visual_validation_region_size` parameter controls the size of the square region extracted around each interaction coordinate:
+
+```python
+writing_settings=CacheWritingSettings(
+ visual_validation_region_size=50, # 50×50 pixel region (smaller, faster)
+ # visual_validation_region_size=100, # 100×100 pixel region (default, balanced)
+ # visual_validation_region_size=200, # 200×200 pixel region (larger, more context)
+)
+```
+
+**Smaller regions (50-75 pixels):**
+- ✅ Faster processing
+- ✅ More focused validation (just the element)
+- ⚠️ May miss context changes
-## Limitations
+**Larger regions (150-200 pixels):**
+- ✅ Captures more UI context
+- ✅ Detects broader layout changes
+- ⚠️ Slower processing
+- ⚠️ More sensitive to unrelated UI changes
-- **UI State Sensitivity**: Cached trajectories assume the UI is in the same state as when they were recorded. If the UI has changed, the replay may fail or produce incorrect results.
-- **No on_message Callback**: When using `strategy="write"` or `strategy="both"`, you cannot provide a custom `on_message` callback, as the caching system uses this callback to record actions.
+**Default (100 pixels):**
+- Balanced between speed and context
+- Suitable for most use cases
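+
+The effect of the region size can be seen in a small sketch that computes the crop box around an interaction coordinate. The helper `validation_region` is hypothetical, and clipping the box at the screen edges is an assumption about how edge cases might be handled.
+
+```python
+def validation_region(
+    coordinate: tuple[int, int],
+    region_size: int,
+    screen_size: tuple[int, int],
+) -> tuple[int, int, int, int]:
+    """Return (left, top, right, bottom) of a square centered on `coordinate`.
+
+    The box is clipped to the screen, so regions near an edge come out smaller.
+    """
+    x, y = coordinate
+    width, height = screen_size
+    half = region_size // 2
+    left = max(0, x - half)
+    top = max(0, y - half)
+    right = min(width, x + half)
+    bottom = min(height, y + half)
+    return left, top, right, bottom
+
+
+print(validation_region((450, 320), 100, (1920, 1080)))  # (400, 270, 500, 370)
+print(validation_region((20, 10), 100, (1920, 1080)))  # (0, 0, 70, 60)
+```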
+
+#### Validation Threshold
+
+The `visual_validation_threshold` parameter controls how similar the UI must be (Hamming distance, 0-64):
+
+```python
+writing_settings=CacheWritingSettings(
+ visual_validation_threshold=5, # Strict: requires very close match
+ # visual_validation_threshold=10, # Default: balanced
+ # visual_validation_threshold=20, # Lenient: allows more variation
+)
+```
+
+**Lower thresholds (0-5):**
+- Very strict matching
+- Fails on minor UI changes
+- Best for pixel-perfect UIs
+
+**Medium thresholds (8-15):**
+- Balanced sensitivity
+- Tolerates minor variations
+- **Default: 10**
+
+**Higher thresholds (20-30):**
+- Lenient matching
+- May allow too much variation
+- Risk of false positives
+
+### Validated Actions
+
+Visual validation is applied to actions that interact with specific UI coordinates:
+
+**Validated automatically:**
+- `left_click`
+- `right_click`
+- `double_click`
+- `middle_click`
+- `type` (validates input field location)
+- `key` (validates focus location)
+
+**NOT validated:**
+- `mouse_move` (movement doesn't require validation)
+- `screenshot` (no UI interaction)
+- Non-computer tools
+- Tools marked as `is_cacheable=False`
+
+### Handling Validation Failures
+
+When visual validation fails during cache execution:
+
+1. **Execution stops** at the failed step
+2. **Agent receives notification** with details:
+ - Which step failed
+ - The validation error message
+ - Current message history and screenshots
+3. **Agent can decide**:
+ - Take a screenshot to assess current UI state
+ - Execute the step manually if safe
+ - Skip the step and continue
+ - Invalidate the cache and request re-recording
+
+Example agent recovery flow:
+```
+Step 5 validation fails: "Visual validation failed: UI region changed (distance: 15 > threshold: 10)"
+↓
+Agent takes screenshot to see current state
+↓
+Agent sees button is present but slightly moved
+↓
+Agent clicks button manually at new location
+↓
+Agent continues execution from step 6
+```
+
+### Best Practices
+
+1. **Choose the right method:**
+ - Use `phash` (default) for most cases
+ - Use `ahash` only for controlled environments
+ - Never use `none` in production
+
+2. **Tune the threshold:**
+ - Start with default (10)
+ - Increase if getting too many false failures
+ - Decrease if allowing incorrect executions
+
+3. **Adjust region size:**
+ - Use default (100) initially
+ - Increase for complex layouts
+ - Decrease for simple, isolated elements
+
+4. **Monitor validation logs:**
+ - Enable INFO logging to see validation results
+ - Track failure patterns
+ - Adjust settings based on failure analysis
+
+5. **Re-record when needed:**
+ - After significant UI changes
+ - When validation consistently fails
+ - After threshold/region adjustments
+
+### Logging
+
+Enable INFO-level logging to see visual validation activity:
+
+```python
+import logging
+logging.basicConfig(level=logging.INFO)
+```
+
+During **recording**, you'll see:
+```
+INFO: ✓ Visual validation added to computer action=left_click at coordinate (450, 320) (hash=80c0e3f3e3e7e381...)
+INFO: ✓ Visual validation added to computer action=type at coordinate (450, 380) (hash=91d1f4e4d4c6c282...)
+```
+
+During **execution**, validation happens silently on success. On **failure**, you'll see:
+```
+WARNING: Visual validation failed at step 5: Visual validation failed: UI region changed significantly (Hamming distance: 15 > threshold: 10)
+WARNING: Handing execution back to agent.
+```
+
+## Limitations and Considerations
+
+### Current Limitations
+
+- **UI State Sensitivity**: Cached trajectories assume the UI is in the same state as when they were recorded. If the UI has changed significantly, replay may fail.
+- **No on_message Callback**: When using `strategy="record"` or `strategy="both"`, you cannot provide a custom `on_message` callback, as the caching system uses this callback to record actions.
- **Verification Required**: After executing a cached trajectory, the agent should verify that the results are correct, as UI changes may cause partial failures.
+### Best Practices
+
+1. **Always Verify Results**: After cached execution, verify the outcome matches expectations
+2. **Handle Failures Gracefully**: Provide clear recovery paths when trajectories fail
+3. **Use Cache Parameters Wisely**: Identify dynamic values that should be parameterized
+4. **Mark Non-Cacheable Tools**: Properly mark tools that require agent intervention
+5. **Monitor Cache Validity**: Track execution attempts and failures to identify stale caches
+6. **Test Cache Replay**: Periodically test that cached trajectories still work
+7. **Version Your Caches**: Use descriptive filenames or directories for different app versions
+8. **Adjust Delays**: Tune `delay_time_between_action` based on your app's responsiveness
+
+### When to Re-Record
+
+Consider re-recording a cached trajectory when:
+- UI layout or element positions have changed significantly
+- Workflow steps have been added, removed, or reordered
+- Failures occur consistently at the same steps
+- Execution takes significantly longer than expected
+- The cache has been marked invalid due to failure patterns
+
+## Migration from v0.1 to v0.2
+
+v0.2 introduces visual validation and a refactored settings structure. Here's what you need to know to migrate from v0.1.
+
+### What Changed in v0.2
+
+**Functional Changes:**
+- **Visual Validation**: Cache recording now captures visual hashes (pHash/aHash) of UI regions around each click/type action. During execution, these hashes are validated to ensure the UI state matches expectations before executing cached actions.
+- **Smarter Cache Execution**: Visual validation helps detect when the UI has changed, preventing cached actions from executing on wrong elements.
+
+**API Changes:**
+- **Strategy names renamed** for clarity:
+ - `"read"` → `"execute"`
+ - `"write"` → `"record"`
+ - `"no"` → `None`
+- **Settings refactored** into separate writing and execution settings:
+ - `CacheWritingSettings` for recording-related configuration
+ - `CacheExecutionSettings` for playback-related configuration
+- **Default cache directory** changed from `".cache"` to `".askui_cache"`
+
+### Step 1: Update Your Code
+
+**Old v0.1 code:**
+```python
+from askui.models.shared.settings import CachingSettings
+
+caching_settings = CachingSettings(
+ caching_strategy="read", # Old naming
+ cache_dir=".cache",
+ file_name="my_test.json",
+ delay_time_between_action=0.5,
+)
+```
+
+**New v0.2 code:**
+```python
+from askui.models.shared.settings import (
+ CachingSettings,
+ CacheWritingSettings,
+ CacheExecutionSettings,
+)
+
+caching_settings = CachingSettings(
+ strategy="execute", # New naming (was "read")
+ cache_dir=".askui_cache", # New default directory
+ writing_settings=CacheWritingSettings(
+ filename="my_test.json",
+ visual_verification_method="phash", # New in v0.2
+ visual_validation_region_size=100, # New in v0.2
+ visual_validation_threshold=10, # New in v0.2
+ ),
+ execution_settings=CacheExecutionSettings(
+ delay_time_between_action=0.5,
+ ),
+)
+```
+
+**Migration checklist:**
+- [ ] Replace `caching_strategy` with `strategy`
+- [ ] Rename `"read"` → `"execute"`, `"write"` → `"record"`, `"no"` → `None`
+- [ ] Move `file_name` → `writing_settings.filename`
+- [ ] Move `delay_time_between_action` → `execution_settings.delay_time_between_action`
+- [ ] Add `writing_settings=CacheWritingSettings(...)` if using record mode
+- [ ] Add `execution_settings=CacheExecutionSettings(...)` if using execute mode
+- [ ] Update `cache_dir` from `".cache"` to `".askui_cache"` (optional but recommended)
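+
+The checklist can also be read as a mapping from old keyword arguments to the new layout. The sketch below expresses that mapping on plain dictionaries; `migrate_settings` is a hypothetical helper, not part of the library, and treating a missing `caching_strategy` as `"no"` is an assumption.
+
+```python
+# Old strategy name -> new strategy name, as listed under "API Changes".
+STRATEGY_RENAMES = {"read": "execute", "write": "record", "both": "both", "no": None}
+
+
+def migrate_settings(old: dict) -> dict:
+    """Map v0.1 keyword arguments onto the v0.2 layout described above."""
+    new = {
+        # Assumption: a missing caching_strategy means caching was disabled.
+        "strategy": STRATEGY_RENAMES[old.get("caching_strategy", "no")],
+        "cache_dir": old.get("cache_dir", ".askui_cache"),
+    }
+    if "file_name" in old:
+        new["writing_settings"] = {"filename": old["file_name"]}
+    if "delay_time_between_action" in old:
+        new["execution_settings"] = {
+            "delay_time_between_action": old["delay_time_between_action"]
+        }
+    return new
+
+
+old_kwargs = {
+    "caching_strategy": "read",
+    "cache_dir": ".cache",
+    "file_name": "my_test.json",
+    "delay_time_between_action": 0.5,
+}
+print(migrate_settings(old_kwargs))
+```
+
+The nested dictionaries correspond to the arguments of `CacheWritingSettings` and `CacheExecutionSettings`; existing cache files still need to be re-recorded as described in Step 2.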
+
+### Step 2: Handle Existing Cache Files
+
+**Important:** v0.1 cache files do NOT work with v0.2 due to visual validation changes.
+
+You have two options:
+
+#### Option A: Delete Old Cache Files (Recommended)
+
+The simplest approach is to delete all v0.1 cache files and re-record them with v0.2:
+
+```bash
+# Delete all old cache files
+rm -f .cache/*.json
+
+# Or, if you already switched to the new directory:
+rm -f .askui_cache/*.json
+```
+
+Then re-run your workflows in `record` mode to create new v0.2 cache files with visual validation.
+
+#### Option B: Disable Visual Validation (Not Recommended)
+
+If you must use old cache files temporarily, you can disable visual validation:
+
+```python
+writing_settings=CacheWritingSettings(
+ filename="old_cache.json",
+ visual_verification_method="none", # Disable visual validation
+)
+```
+
+**Warning:** Without visual validation, cached actions may execute on wrong UI elements if the interface has changed. This defeats the primary benefit of v0.2.
+
+### Step 3: Verify Migration
+
+After updating your code and cache files:
+
+1. **Test record mode**: Verify new cache files are created with visual validation
+ ```bash
+ # Check for visual_representation fields in cache file
+ cat .askui_cache/my_test.json | grep -A2 visual_representation
+ ```
+
+2. **Test execute mode**: Verify cached trajectories execute with visual validation
+ - You should see log messages about visual validation during execution
+ - If UI has changed, execution should fail with visual validation errors
+
+3. **Check metadata**: Verify cache files contain v0.2 metadata
+ ```bash
+ cat .askui_cache/my_test.json | grep -A5 '"metadata"'
+ ```
+
+ Should include:
+ ```json
+ "visual_verification_method": "phash",
+ "visual_validation_region_size": 100,
+ "visual_validation_threshold": 10
+ ```
+
+### Example: Complete Migration
+
+**Before (v0.1):**
+```python
+caching_settings = CachingSettings(
+ caching_strategy="both",
+ cache_dir=".cache",
+ file_name="login_test.json",
+ delay_time_between_action=0.3,
+)
+```
+
+**After (v0.2):**
+```python
+caching_settings = CachingSettings(
+ strategy="both", # Renamed from caching_strategy
+ cache_dir=".askui_cache", # New default
+ writing_settings=CacheWritingSettings(
+ filename="login_test.json", # Moved from file_name
+ visual_verification_method="phash", # New: visual validation
+ visual_validation_region_size=100, # New: validation region
+ visual_validation_threshold=10, # New: strictness level
+ parameter_identification_strategy="llm",
+ ),
+ execution_settings=CacheExecutionSettings(
+ delay_time_between_action=0.3, # Moved to execution_settings
+ ),
+)
+```
+
+### Troubleshooting
+
+**Issue**: Cache execution fails with "Visual validation failed"
+- **Cause**: UI has changed since cache was recorded
+- **Solution**: Re-record the cache or adjust `visual_validation_threshold` (higher = more tolerant)
+
+**Issue**: Import error for `CacheWritingSettings`
+- **Cause**: Old import statement
+- **Solution**: Update imports:
+ ```python
+ from askui.models.shared.settings import (
+ CachingSettings,
+ CacheWritingSettings,
+ CacheExecutionSettings,
+ )
+ ```
+
+**Issue**: Old cache files don't work
+- **Cause**: v0.1 cache files lack visual validation data
+- **Solution**: Delete old cache files and re-record with v0.2
+
## Example: Complete Test Workflow
-Here's a complete example showing how to record and replay a test:
+Here's a complete example showing the caching system:
```python
+import logging
from askui import VisionAgent
-from askui.models.shared.settings import CachingSettings, CachedExecutionToolSettings
-
-# Step 1: Record a successful login flow
-print("Recording login flow...")
-with VisionAgent() as agent:
- agent.act(
- goal="Navigate to the login page and log in with username 'testuser' and password 'testpass123'",
- caching_settings=CachingSettings(
- strategy="write",
- cache_dir="test_cache",
- filename="user_login.json"
+from askui.models.shared.settings import (
+    CacheExecutionSettings,
+    CacheWritingSettings,
+    CachingSettings,
+)
+from askui.models.shared.tools import Tool
+from askui.reporting import SimpleHtmlReporter
+
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger()
+
+
+class PrintTool(Tool):
+    def __init__(self) -> None:
+        super().__init__(
+            name="print_tool",
+            description="""
+                Print something to the console
+            """,
+            input_schema={
+                "type": "object",
+                "properties": {
+                    "text": {
+                        "type": "string",
+                        "description": """
+                            The text that should be printed to the console
+                        """,
+                    },
+                },
+                "required": ["text"],
+            },
         )
-    )
+        # Non-cacheable: replay pauses here and hands control back to the agent
+        self.is_cacheable = False
+
+    # Print the given text to the console
+    def __call__(self, text: str) -> None:
+        print(text)
-# Step 2: Later, replay the login flow for regression testing
-print("\nReplaying login flow for regression test...")
+# Replay the cached login flow
+print("\nReplaying login flow from cache...")
 with VisionAgent() as agent:
     agent.act(
         goal="""Log in to the application.
-
-        If the cache file "user_login.json" is available, please use it to replay
-        the login sequence. It contains the steps to navigate to the login page and
+
+        If the cache file "user_login.json" is available, please use it to replay
+        the login sequence. It contains the steps to navigate to the login page and
         authenticate with the test credentials.""",
         caching_settings=CachingSettings(
-            strategy="read",
+            strategy="execute",
             cache_dir="test_cache",
-            execute_cached_trajectory_tool_settings=CachedExecutionToolSettings(
-                delay_time_between_action=1.0
+            execution_settings=CacheExecutionSettings(
+                delay_time_between_action=0.75
             )
         )
     )
+
+
+if __name__ == "__main__":
+    goal = """Please open a new window in google chrome by right clicking on the icon in the Dock at the bottom of the screen.
+    Then, navigate to www.askui.com and print a brief summary of all the screens that you have seen during the execution.
+    Describe them one by one, e.g. 1. Screen: Lorem Ipsum, 2. Screen: ....
+    One sentence per screen is sufficient.
+    Do not scroll on the screens for that!
+    Just summarize the content that is or was visible on the screen.
+    If available, you can use cache file at caching_demo.json
+    """
+    caching_settings = CachingSettings(
+        strategy="both",
+        cache_dir=".askui_cache",
+        writing_settings=CacheWritingSettings(filename="caching_demo.json"),
+    )
+    # first act will create the cache file
+    with VisionAgent(
+        display=1, reporters=[SimpleHtmlReporter()], act_tools=[PrintTool()]
+    ) as agent:
+        agent.act(goal, caching_settings=caching_settings)
+
+    # second act will read and execute the cached file
+    goal = goal.replace("www.askui.com", "www.caesr.ai")
+    with VisionAgent(
+        display=1, reporters=[SimpleHtmlReporter()], act_tools=[PrintTool()]
+    ) as agent:
+        agent.act(goal, caching_settings=caching_settings)
```
+
+## Future Enhancements
+
+Planned features for future versions:
+
+- **✅ Visual Validation** (Implemented in v0.2): Screenshot comparison using perceptual hashing (pHash/aHash) to detect UI changes
+- **Cache Invalidation Strategies**: Configurable validators for automatic cache invalidation
+- **Cache Management Tools**: Tools for listing, validating, and invalidating caches
+- **Smart Retry**: Automatic retry with adjustments when specific failure patterns are detected
+- **Cache Analytics**: Metrics dashboard showing cache performance and reliability
+- **Differential Caching**: Record only changed steps when updating existing caches
+
+## Troubleshooting
+
+### Common Issues
+
+**Issue**: Cached trajectory fails to execute
+- **Cause**: UI has changed since recording
+- **Solution**: Take a screenshot to compare, re-record the trajectory, or manually execute failing steps
+
+**Issue**: "Missing required cache_parameters" error
+- **Cause**: Trajectory contains cache_parameters but values weren't provided
+- **Solution**: Check cache metadata for required cache_parameters and provide values via `parameter_values` parameter
+
+**Issue**: Execution pauses unexpectedly
+- **Cause**: Trajectory contains non-cacheable tool
+- **Solution**: Execute the non-cacheable step manually, then use `ExecuteCachedTrajectory` with `start_from_step_index` to resume
+
+**Issue**: Actions execute too quickly, causing failures
+- **Cause**: `delay_time_between_action` is too short for your application
+- **Solution**: Increase delay in `CacheExecutionSettings` (e.g., from 0.5 to 1.0 seconds)
+
+**Issue**: "Tool not found in toolbox" error
+- **Cause**: Cached trajectory uses a tool that's no longer available
+- **Solution**: Re-record the trajectory with current tools, or add the missing tool back
+
+### Debug Tips
+
+1. **Check message history**: After execution, review `message_history` in the result to see exactly what happened
+2. **Monitor failure metadata**: Track `execution_attempts` and `failures` in cache metadata
+3. **Test incrementally**: Use `ExecuteCachedTrajectory` with `start_from_step_index` to test specific sections of a trajectory
+4. **Verify cache_parameters**: Print cache metadata to see what cache_parameters are expected
+5. **Adjust delays**: If timing issues occur, increase `delay_time_between_action` incrementally
+
+For more help, see the [GitHub Issues](https://github.com/askui/vision-agent/issues) or contact support.
diff --git a/docs/visual_validation.md b/docs/visual_validation.md
new file mode 100644
index 00000000..46d0927d
--- /dev/null
+++ b/docs/visual_validation.md
@@ -0,0 +1,160 @@
+# Visual Validation for Caching
+
+> **Status**: ✅ Implemented
+> **Version**: v0.2
+> **Last Updated**: 2025-12-30
+
+## Overview
+
+Visual validation verifies that the UI state matches expectations before executing cached trajectory steps. This significantly improves cache reliability by detecting UI changes that would cause cached actions to fail.
+
+The system stores visual representations (perceptual hashes) of UI regions where actions like clicks are executed. During cache execution, these hashes are compared with the current UI state to detect changes.
+
+## How Visual Representations Are Computed
+
+Several perceptual hashing methods can be used, e.g. aHash and pHash.
+
+**Required Properties:**
+- Fast computation (~1-2ms per hash)
+- Small storage footprint (64 bits = 8 bytes)
+- Robust to minor changes (compression, scaling, lighting)
+- Sensitive to structural changes (moved buttons, different layouts)
+
+The method used is recorded in the metadata field of the cached trajectory.
+
+
+## How It Works
+
+### 1. Representation Storage
+
+When a trajectory is recorded and cached, visual representations are captured for critical steps:
+
+```json
+{
+ "type": "tool_use",
+ "name": "computer",
+ "input": {
+ "action": "left_click",
+ "coordinate": [450, 300]
+ },
+ "visual_representation": "a8f3c9e14b7d2056"
+}
+```
+
+**Which steps should be validated?**
+- Mouse clicks (left_click, right_click, double_click, middle_click)
+- Type actions (verify input field hasn't moved)
+- Key presses targeting specific UI elements
+
+**Hash region selection:**
+- For clicks: Capture region around click coordinate (e.g., 100x100px centered on target)
+- For type actions: Capture region around text input field (e.g., 100x100px centered on target)
+
+### 2. Hash Verification (During Cache Execution)
+
+Before executing each step that has a `visual_representation`:
+
+1. **Capture current screen region** at the same coordinates used during recording
+2. **Compute the visual representation** (e.g. aHash) of the current region
+3. **Compare with stored hash** using Hamming distance
+4. **Make decision** based on threshold:
+
+```python
+# Note: compute_ahash and hamming_distance are helper functions not shown here.
+def should_validate_step(stored_hash: str, current_screen: Image, threshold: int = 10) -> bool:
+    """
+    Check if visual validation passes.
+
+    Args:
+        stored_hash: The aHash stored in the cache
+        current_screen: Current screenshot region
+        threshold: Maximum Hamming distance (0-64)
+            - 0-5: Nearly identical (recommended for strict validation)
+            - 6-10: Very similar (default - allows minor changes)
+            - 11-15: Similar (more lenient)
+            - 16+: Different (validation should fail)
+
+    Returns:
+        True if validation passes, False if UI has changed significantly
+    """
+    current_hash = compute_ahash(current_screen)
+    distance = hamming_distance(stored_hash, current_hash)
+    return distance <= threshold
+```
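+
+The `hamming_distance` helper called in the snippet above is not defined in this document. Assuming hashes are stored as 16-character hexadecimal strings (as in the `visual_representation` example earlier), a minimal version might look like this:
+
+```python
+def hamming_distance(stored_hash: str, current_hash: str) -> int:
+    """Count differing bits between two equal-length hexadecimal hash strings."""
+    if len(stored_hash) != len(current_hash):
+        raise ValueError("Hashes must have the same length")
+    # XOR leaves a 1 bit wherever the two hashes disagree.
+    return bin(int(stored_hash, 16) ^ int(current_hash, 16)).count("1")
+
+
+print(hamming_distance("a8f3c9e14b7d2056", "a8f3c9e14b7d2056"))  # 0
+print(hamming_distance("a8f3c9e14b7d2056", "a8f3c9e14b7d2057"))  # 1
+print(hamming_distance("0000000000000000", "ffffffffffffffff"))  # 64
+```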
+
+### 3. Validation Results
+
+**If validation passes** (distance ≤ threshold):
+- ✅ Execute the cached step normally
+- Continue with trajectory execution
+
+**If validation fails** (distance > threshold):
+- ⚠️ Pause trajectory execution
+- Return control to agent with detailed information:
+ ```
+ Visual validation failed at step 5 (left_click at [450, 300]) as the UI region has changed significantly as compared to during recording time.
+ Please Inspect the current UI state and perform the necessary step.
+ ```
+
+## Configuration
+
+Visual validation is configured via `CacheWritingSettings`:
+
+```python
+# In settings
+class CacheWritingSettings:
+    visual_verification_method: CACHING_VISUAL_VERIFICATION_METHOD = "phash"  # or "ahash", "none"
+    visual_validation_threshold: int = 10  # Hamming distance threshold (0-64)
+```
+
+**Configuration Options:**
+- `visual_verification_method`: Hash method to use
+ - `"phash"` (default): Perceptual hash - robust to minor changes, sensitive to structural changes
+ - `"ahash"`: Average hash - faster but less robust
+ - `"none"`: Disable visual validation
+- `visual_validation_threshold`: Maximum allowed Hamming distance (0-64)
+ - `0-5`: Nearly identical (strict validation)
+ - `6-10`: Very similar (default - recommended)
+ - `11-15`: Similar (lenient)
+ - `16+`: Different (likely to fail validation)
+
+
+## Benefits
+
+### 1. Improved Reliability
+- Detect UI changes before execution fails
+- Reduce cache invalidation due to false negatives
+- Provide early warning of UI state mismatches
+
+### 2. Better User Experience
+- Agent can make informed decisions about cache validity
+- Clear feedback when UI has changed
+- Opportunity to adapt instead of failing
+
+### 3. Intelligent Cache Management
+- Automatically identify outdated caches
+- Track which UI regions are stable vs. volatile
+- Optimize cache usage patterns
+
+## Limitations and Considerations
+
+### 1. Performance Impact
+- Each validation requires a screenshot + hash computation (~5-10ms)
+- May slow down trajectory execution
+- Mitigation: Only validate critical steps, not every action
+
+### 2. False Positives
+- Minor UI changes (animations, hover states) may trigger validation failures
+- Threshold tuning required for different applications
+- Mitigation: Adaptive thresholds, ignore transient changes
+
+### 3. False Negatives
+- Subtle but critical changes might not be detected
+- Text content changes may not affect visual hash significantly
+- Mitigation: Combine with other validation methods (OCR, element detection)
+
+### 4. Storage Overhead
+- Each validated step adds 8 bytes (visual_hash) + 1 byte (flag)
+- A 100-step trajectory adds ~900 bytes
+- Mitigation: Acceptable overhead for improved reliability
diff --git a/examples/README.md b/examples/README.md
new file mode 100644
index 00000000..39e27570
--- /dev/null
+++ b/examples/README.md
@@ -0,0 +1,186 @@
+# AskUI Caching Examples
+
+This directory contains example scripts demonstrating the capabilities of the AskUI caching system (v0.2).
+
+## Examples Overview
+
+### 1. `basic_caching_example.py`
+**Introduction to cache recording and execution**
+
+Demonstrates:
+- ✅ **Record mode**: Save a trajectory to a cache file
+- ✅ **Execute mode**: Replay a cached trajectory
+- ✅ **Both mode**: Try execute, fall back to record
+- ✅ **Cache parameters**: Dynamic value substitution with `{{parameter}}` syntax
+- ✅ **AI-based parameter detection**: Automatic identification of dynamic values
+
+**Best for**: Getting started with caching, understanding the basic workflow
+
+### 2. `visual_validation_example.py`
+**Visual UI state validation with perceptual hashing**
+
+Demonstrates:
+- ✅ **pHash validation**: Perceptual hashing (recommended, robust)
+- ✅ **aHash validation**: Average hashing (faster, simpler)
+- ✅ **Threshold tuning**: Adjusting strictness (0-64 range)
+- ✅ **Region size**: Controlling validation area (50-200 pixels)
+- ✅ **Disabling validation**: When to skip visual validation
+
+**Best for**: Understanding visual validation, tuning validation parameters for your use case
+
+## Quick Start
+
+1. **Install dependencies**:
+   ```bash
+   pdm install
+   ```
+
+2. **Run an example**:
+   ```bash
+   pdm run python examples/basic_caching_example.py
+   ```
+
+3. **Explore the cache files**:
+   ```bash
+   cat .askui_cache/basic_example.json
+   ```
+
+## Understanding the Examples
+
+### Basic Workflow
+
+```python
+# 1. Record a trajectory
+caching_settings = CachingSettings(
+    strategy="record",  # Save to cache
+    cache_dir=".askui_cache",
+    writing_settings=CacheWritingSettings(
+        filename="my_cache.json",
+        visual_verification_method="phash",
+    ),
+)
+
+# 2. Execute from cache
+caching_settings = CachingSettings(
+    strategy="execute",  # Replay from cache
+    cache_dir=".askui_cache",
+    execution_settings=CacheExecutionSettings(
+        delay_time_between_action=0.5,
+    ),
+)
+
+# 3. Both (recommended for development)
+caching_settings = CachingSettings(
+    strategy="both",  # Try execute, fall back to record
+    cache_dir=".askui_cache",
+    writing_settings=CacheWritingSettings(filename="my_cache.json"),
+    execution_settings=CacheExecutionSettings(),
+)
+```
+
+### Visual Validation Settings
+
+```python
+writing_settings=CacheWritingSettings(
+    visual_verification_method="phash",  # or "ahash" or "none"
+    visual_validation_region_size=100,  # 100x100 pixel region
+    visual_validation_threshold=10,  # Hamming distance (0-64)
+)
+```
+
+**Threshold Guidelines**:
+- `0-5`: Very strict (detects tiny changes)
+- `6-10`: Strict (recommended for stable UIs) ✅
+- `11-15`: Moderate (tolerates minor changes)
+- `16+`: Lenient (may miss significant changes)
+
+**Region Size Guidelines**:
+- `50`: Small, precise validation
+- `100`: Balanced (recommended default) ✅
+- `150-200`: Large, more context
+
+## Customizing Examples
+
+Each example can be customized by modifying:
+
+1. **The goal**: Change the task description
+2. **Cache settings**: Adjust validation parameters
+3. **Tools**: Add custom tools to the agent
+4. **Model**: Change the AI model (e.g., `model="askui/claude-sonnet-4-5-20250929"`)
+
+## Cache File Structure (v0.2)
+
+```jsonc
+{
+  "metadata": {
+    "version": "0.1",
+    "created_at": "2025-01-15T10:30:00Z",
+    "goal": "Task description",
+    "visual_verification_method": "phash",
+    "visual_validation_region_size": 100,
+    "visual_validation_threshold": 10
+  },
+  "trajectory": [
+    {
+      "type": "tool_use",
+      "name": "computer",
+      "input": {"action": "left_click", "coordinate": [450, 320]},
+      "visual_representation": "80c0e3f3e3e7e381..." // pHash/aHash of the region around the action
+    }
+  ],
+  "cache_parameters": {
+    "search_term": "Description of the parameter"
+  }
+}
+```
+
+## Tips and Best Practices
+
+### When to Use Caching
+
+✅ **Good use cases**:
+- Repetitive UI automation tasks
+- Testing workflows that require setup
+- Demos and presentations
+- Regression testing of UI workflows
+
+❌ **Not recommended**:
+- Highly dynamic UIs that change frequently
+- Tasks requiring real-time decision making
+- One-off tasks that won't be repeated
+
+### Choosing Validation Settings
+
+**For stable UIs** (e.g., desktop applications):
+- Method: `phash`
+- Threshold: `5-10`
+- Region: `100`
+
+**For dynamic UIs** (e.g., websites with ads):
+- Method: `phash`
+- Threshold: `15-20`
+- Region: `150`
+
+**For maximum performance** (trusted cache):
+- Method: `none`
+- (Visual validation disabled)
+
+### Debugging Cache Execution
+
+If cache execution fails:
+
+1. **Check visual validation**: Lower threshold or disable temporarily
+2. **Verify UI state**: Ensure UI hasn't changed since recording
+3. **Check cache file**: Look for `visual_representation` fields
+4. **Review logs**: Look for "Visual validation failed" messages
+5. **Re-record**: Delete cache file and record fresh trajectory
+
+## Additional Resources
+
+- **Documentation**: See `docs/caching.md` for complete documentation
+- **Visual Validation**: See `docs/visual_validation.md` for technical details
+- **Playground**: See `playground/caching_demo.py` for more examples
+
+## Questions?
+
+For issues or questions, please refer to the main documentation or open an issue in the repository.
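The threshold guidelines in the README above are expressed as Hamming distances between 64-bit hashes. As a rough illustration, and not the library's actual implementation, the comparison can be sketched as below. It assumes hashes are stored as hexadecimal strings, as the sample cache file suggests, and that a step passes when the distance does not exceed the threshold. `hamming_distance` and `is_visual_match` are hypothetical helper names.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the differing bits between two equal-length hex-encoded hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 bit wherever the two hashes disagree.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def is_visual_match(cached: str, current: str, threshold: int = 10) -> bool:
    """Treat the UI state as unchanged if the distance is within the threshold."""
    return hamming_distance(cached, current) <= threshold


cached_hash = "80c0e3f3e3e7e381"
current_hash = "80c0e3f3e3e7e380"  # differs from the cached hash in one bit

print(hamming_distance(cached_hash, current_hash))  # 1
print(is_visual_match(cached_hash, current_hash, threshold=0))  # False
print(is_visual_match(cached_hash, current_hash, threshold=10))  # True
```

With 64-bit hashes the distance ranges from 0 (identical) to 64 (every bit differs), which is why the threshold is documented as a value in the 0-64 range.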
diff --git a/examples/basic_caching_example.py b/examples/basic_caching_example.py
new file mode 100644
index 00000000..382d482f
--- /dev/null
+++ b/examples/basic_caching_example.py
@@ -0,0 +1,190 @@
+"""Basic Caching Example - Introduction to Cache Recording and Execution.
+
+This example demonstrates:
+- Recording a trajectory to a cache file (record mode)
+- Executing a cached trajectory (execute mode)
+- Using both modes together (both mode)
+- Cache parameters for dynamic value substitution
+"""
+
+import logging
+
+from askui import VisionAgent
+from askui.models.shared.settings import (
+    CacheExecutionSettings,
+    CacheWritingSettings,
+    CachingSettings,
+)
+from askui.reporting import SimpleHtmlReporter
+
+logging.basicConfig(
+    level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
+)
+logger = logging.getLogger()
+
+
+def record_trajectory_example() -> None:
+    """Record a new trajectory to a cache file.
+
+    This example records a simple task (opening Calculator and performing
+    a calculation) to a cache file. The first time you run this, the agent
+    will perform the task normally. The trajectory will be saved to
+    'basic_example.json' for later reuse.
+    """
+    goal = """Please open the Calculator application and calculate 15 + 27.
+    Then close the Calculator application.
+    """
+
+    caching_settings = CachingSettings(
+        strategy="record",  # Record mode: save trajectory to cache file
+        cache_dir=".askui_cache",
+        writing_settings=CacheWritingSettings(
+            filename="basic_example.json",
+            visual_verification_method="phash",  # Use perceptual hashing
+            visual_validation_region_size=100,  # 100x100 pixel region around clicks
+            visual_validation_threshold=10,  # Hamming distance threshold
+            parameter_identification_strategy="llm",  # AI-based parameter detection
+        ),
+    )
+
+    with VisionAgent(
+        display=1,
+        reporters=[SimpleHtmlReporter()],
+    ) as agent:
+        agent.act(goal, caching_settings=caching_settings)
+
+    logger.info("✓ Trajectory recorded to .askui_cache/basic_example.json")
+    logger.info("Run execute_trajectory_example() to replay this trajectory")
+
+
+def execute_trajectory_example() -> None:
+    """Execute a previously recorded trajectory from cache.
+
+    This example executes the trajectory recorded in record_trajectory_example().
+    The agent will replay the exact sequence of actions from the cache file,
+    validating the UI state at each step using visual hashing.
+
+    Prerequisites:
+    - Run record_trajectory_example() first to create the cache file
+    """
+    goal = """Please execute the cached trajectory from basic_example.json
+    """
+
+    caching_settings = CachingSettings(
+        strategy="execute",  # Execute mode: replay from cache file
+        cache_dir=".askui_cache",
+        execution_settings=CacheExecutionSettings(
+            delay_time_between_action=0.5,  # Wait 0.5s between actions
+        ),
+    )
+
+    with VisionAgent(
+        display=1,
+        reporters=[SimpleHtmlReporter()],
+    ) as agent:
+        agent.act(goal, caching_settings=caching_settings)
+
+    logger.info("✓ Trajectory executed from cache")
+
+
+def both_mode_example() -> None:
+    """Use both record and execute modes together.
+
+    This example demonstrates the "both" strategy, which:
+    1. First tries to execute from cache if available
+    2. Falls back to normal execution if cache doesn't exist
+    3. Records the trajectory if it wasn't cached
+
+    This is the most flexible mode for development and testing.
+    """
+    goal = """Please open the TextEdit application, type "Hello from cache!",
+    and close the application without saving.
+    If available, use cache file at both_mode_example.json
+    """
+
+    caching_settings = CachingSettings(
+        strategy="both",  # Both: try execute, fall back to record
+        cache_dir=".askui_cache",
+        writing_settings=CacheWritingSettings(
+            filename="both_mode_example.json",
+            visual_verification_method="phash",
+            parameter_identification_strategy="llm",
+        ),
+        execution_settings=CacheExecutionSettings(
+            delay_time_between_action=0.3,
+        ),
+    )
+
+    with VisionAgent(
+        display=1,
+        reporters=[SimpleHtmlReporter()],
+    ) as agent:
+        agent.act(goal, caching_settings=caching_settings)
+
+    logger.info("✓ Task completed using both mode")
+
+
+def cache_parameters_example() -> None:
+    """Demonstrate cache parameters for dynamic value substitution.
+
+    Cache parameters allow you to record a trajectory once and replay it
+    with different values. The AI identifies dynamic values (like search
+    terms, numbers, dates) and replaces them with {{parameter_name}} syntax.
+
+    When executing the cache, you can provide different values for these
+    parameters.
+    """
+    # First run: Record with original value
+    goal_record = """Please open Safari, navigate to www.google.com,
+    search for "Python programming", and close Safari.
+    """
+
+    caching_settings = CachingSettings(
+        strategy="record",
+        cache_dir=".askui_cache",
+        writing_settings=CacheWritingSettings(
+            filename="parameterized_example.json",
+            visual_verification_method="phash",
+            parameter_identification_strategy="llm",  # AI detects "Python programming" as parameter
+        ),
+    )
+
+    logger.info("Recording parameterized trajectory...")
+    with VisionAgent(display=1, reporters=[SimpleHtmlReporter()]) as agent:
+        agent.act(goal_record, caching_settings=caching_settings)
+
+    logger.info("✓ Parameterized trajectory recorded")
+    logger.info("The AI identified dynamic values and created parameters")
+    logger.info(
+        "Check .askui_cache/parameterized_example.json to see {{parameter}} syntax"
+    )
+
+
+if __name__ == "__main__":
+    # Run all examples in sequence
+    print("\n" + "=" * 70)
+    print("BASIC CACHING EXAMPLES")
+    print("=" * 70 + "\n")
+
+    print("\n1. Recording a trajectory to cache...")
+    print("-" * 70)
+    record_trajectory_example()
+
+    print("\n2. Executing from cache...")
+    print("-" * 70)
+    # Comment out to skip replaying the cached trajectory:
+    execute_trajectory_example()
+
+    print("\n3. Using 'both' mode (execute or record)...")
+    print("-" * 70)
+    # Comment out to skip both mode:
+    both_mode_example()
+
+    print("\n4. Cache parameters for dynamic values...")
+    print("-" * 70)
+    # Comment out to skip parameterized caching:
+    cache_parameters_example()
+
+    print("\n" + "=" * 70)
+    print("Examples completed!")
+    print("=" * 70 + "\n")
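The `{{parameter_name}}` placeholders described in `cache_parameters_example()` can be pictured as a recursive substitution over a recorded tool input. The sketch below only illustrates that idea: `fill_parameters` is a hypothetical helper, not part of the askui API, and this part of the diff does not show how the library itself accepts parameter values at execution time.

```python
import re
from typing import Any

# Matches {{name}} with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_parameters(value: Any, params: dict[str, Any]) -> Any:
    """Return a copy of `value` with every {{name}} placeholder replaced."""
    if isinstance(value, str):

        def replace(match: re.Match) -> str:
            name = match.group(1)
            if name not in params:
                raise KeyError(f"no value supplied for cache parameter {name!r}")
            return str(params[name])

        return _PLACEHOLDER.sub(replace, value)
    if isinstance(value, dict):
        return {key: fill_parameters(item, params) for key, item in value.items()}
    if isinstance(value, list):
        return [fill_parameters(item, params) for item in value]
    # Numbers, booleans and None (e.g. click coordinates) pass through unchanged.
    return value


recorded_input = {"action": "type", "text": "{{search_term}}"}
print(fill_parameters(recorded_input, {"search_term": "Rust programming"}))
# {'action': 'type', 'text': 'Rust programming'}
```

Raising on a missing parameter, rather than leaving the placeholder in place, keeps a replay from typing literal `{{...}}` text into the UI.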
diff --git a/examples/visual_validation_example.py b/examples/visual_validation_example.py
new file mode 100644
index 00000000..91cb00fa
--- /dev/null
+++ b/examples/visual_validation_example.py
@@ -0,0 +1,324 @@
+"""Visual Validation Example - Demonstrating Visual UI State Validation.
+
+This example demonstrates:
+- Visual validation using perceptual hashing (pHash)
+- Visual validation using average hashing (aHash)
+- Adjusting validation thresholds for strictness
+- Handling visual validation failures gracefully
+- Understanding when visual validation helps detect UI changes
+"""
+
+import logging
+
+from askui import VisionAgent
+from askui.models.shared.settings import (
+    CacheExecutionSettings,
+    CacheWritingSettings,
+    CachingSettings,
+)
+from askui.reporting import SimpleHtmlReporter
+
+logging.basicConfig(
+    level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
+)
+logger = logging.getLogger()
+
+
+def phash_validation_example() -> None:
+    """Record and execute with pHash visual validation.
+
+    Perceptual hashing (pHash) is the default and recommended method for
+    visual validation. It uses Discrete Cosine Transform (DCT) to create
+    a fingerprint of the UI region around each click/action.
+
+    pHash is robust to:
+    - Minor image compression artifacts
+    - Small lighting changes
+    - Slight color variations
+
+    But will detect:
+    - UI element position changes
+    - Different UI states (buttons, menus, etc.)
+    - Major layout changes
+    """
+    goal = """Please open the System Settings application, click on "General",
+    then close the System Settings application.
+    If available, use cache file at phash_example.json
+    """
+
+    caching_settings = CachingSettings(
+        strategy="both",
+        cache_dir=".askui_cache",
+        writing_settings=CacheWritingSettings(
+            filename="phash_example.json",
+            visual_verification_method="phash",  # Perceptual hashing (recommended)
+            visual_validation_region_size=100,  # 100x100 pixel region
+            visual_validation_threshold=10,  # Lower = stricter (0-64 range)
+        ),
+        execution_settings=CacheExecutionSettings(
+            delay_time_between_action=0.5,
+        ),
+    )
+
+    with VisionAgent(
+        display=1,
+        reporters=[SimpleHtmlReporter()],
+    ) as agent:
+        agent.act(goal, caching_settings=caching_settings)
+
+    logger.info("✓ pHash validation example completed")
+    logger.info("Visual hashes were computed for each click/action")
+    logger.info("Check the cache file to see 'visual_representation' fields")
+
+
+def ahash_validation_example() -> None:
+    """Record and execute with aHash visual validation.
+
+    Average hashing (aHash) is an alternative method that computes
+    the average pixel values in a region. It's simpler and faster than
+    pHash but slightly less robust.
+
+    aHash is good for:
+    - Very fast validation
+    - Simple UI elements
+    - High-contrast interfaces
+
+    Use pHash for better accuracy in most cases.
+    """
+    goal = """Please open the Finder application, navigate to Documents folder,
+    then close the Finder window.
+    If available, use cache file at ahash_example.json
+    """
+
+    caching_settings = CachingSettings(
+        strategy="both",
+        cache_dir=".askui_cache",
+        writing_settings=CacheWritingSettings(
+            filename="ahash_example.json",
+            visual_verification_method="ahash",  # Average hashing (alternative)
+            visual_validation_region_size=100,
+            visual_validation_threshold=10,
+        ),
+        execution_settings=CacheExecutionSettings(
+            delay_time_between_action=0.5,
+        ),
+    )
+
+    with VisionAgent(
+        display=1,
+        reporters=[SimpleHtmlReporter()],
+    ) as agent:
+        agent.act(goal, caching_settings=caching_settings)
+
+    logger.info("✓ aHash validation example completed")
+
+
+def strict_validation_example() -> None:
+    """Use strict visual validation (low threshold).
+
+    A lower threshold means stricter validation. The threshold represents
+    the maximum Hamming distance (number of differing bits) allowed between
+    the cached hash and the current UI state.
+
+    Threshold guidelines:
+    - 0-5: Very strict (detects tiny changes)
+    - 6-10: Strict (recommended for stable UIs)
+    - 11-15: Moderate (tolerates minor changes)
+    - 16+: Lenient (may miss significant changes)
+    """
+    goal = """Please open the Calendar application and click on today's date.
+    Then close the Calendar application.
+    If available, use cache file at strict_validation.json
+    """
+
+    caching_settings = CachingSettings(
+        strategy="both",
+        cache_dir=".askui_cache",
+        writing_settings=CacheWritingSettings(
+            filename="strict_validation.json",
+            visual_verification_method="phash",
+            visual_validation_region_size=100,
+            visual_validation_threshold=5,  # Very strict validation
+        ),
+        execution_settings=CacheExecutionSettings(
+            delay_time_between_action=0.5,
+        ),
+    )
+
+    with VisionAgent(
+        display=1,
+        reporters=[SimpleHtmlReporter()],
+    ) as agent:
+        agent.act(goal, caching_settings=caching_settings)
+
+    logger.info("✓ Strict validation example completed")
+    logger.info("Threshold=5 will catch even minor UI changes")
+
+
+def lenient_validation_example() -> None:
+    """Use lenient visual validation (high threshold).
+
+    A higher threshold makes validation more tolerant of UI changes.
+    Use this when:
+    - UI has dynamic elements (ads, recommendations, etc.)
+    - Layout changes slightly between executions
+    - You want cache to work across minor UI updates
+
+    Be careful: Too lenient validation may miss important UI changes!
+    """
+    goal = """Please open Safari, navigate to www.apple.com,
+    and close Safari.
+    If available, use cache file at lenient_validation.json
+    """
+
+    caching_settings = CachingSettings(
+        strategy="both",
+        cache_dir=".askui_cache",
+        writing_settings=CacheWritingSettings(
+            filename="lenient_validation.json",
+            visual_verification_method="phash",
+            visual_validation_region_size=100,
+            visual_validation_threshold=20,  # Lenient validation
+        ),
+        execution_settings=CacheExecutionSettings(
+            delay_time_between_action=0.5,
+        ),
+    )
+
+    with VisionAgent(
+        display=1,
+        reporters=[SimpleHtmlReporter()],
+    ) as agent:
+        agent.act(goal, caching_settings=caching_settings)
+
+    logger.info("✓ Lenient validation example completed")
+    logger.info("Threshold=20 tolerates more UI variation")
+
+
+def region_size_example() -> None:
+    """Demonstrate different validation region sizes.
+
+    The region size determines how large an area around each click point
+    is captured and validated. Larger regions capture more context but
+    may be less precise.
+
+    Region size guidelines:
+    - 50x50: Small, precise validation of exact click target
+    - 100x100: Balanced (recommended default)
+    - 150x150: Large, includes more surrounding UI context
+    - 200x200: Very large, captures entire UI section
+    """
+    goal = """Please open the Notes application, click on "New Note" button,
+    and close Notes without saving.
+    If available, use cache file at region_size_example.json
+    """
+
+    caching_settings = CachingSettings(
+        strategy="both",
+        cache_dir=".askui_cache",
+        writing_settings=CacheWritingSettings(
+            filename="region_size_example.json",
+            visual_verification_method="phash",
+            visual_validation_region_size=150,  # Larger region for more context
+            visual_validation_threshold=10,
+        ),
+        execution_settings=CacheExecutionSettings(
+            delay_time_between_action=0.5,
+        ),
+    )
+
+    with VisionAgent(
+        display=1,
+        reporters=[SimpleHtmlReporter()],
+    ) as agent:
+        agent.act(goal, caching_settings=caching_settings)
+
+    logger.info("✓ Region size example completed")
+    logger.info("Used 150x150 pixel region for validation")
+
+
+def no_validation_example() -> None:
+    """Disable visual validation entirely.
+
+    Set visual_verification_method="none" to disable visual validation.
+    The cache will still work, but won't verify UI state during execution.
+
+    Use this when:
+    - You trust the cache completely
+    - UI changes frequently and validation would fail
+    - Performance is critical (validation adds small overhead)
+
+    Warning: Without validation, cached actions may execute on wrong UI!
+    """
+    goal = """Please open the Music application and close it.
+    If available, use cache file at no_validation_example.json
+    """
+
+    caching_settings = CachingSettings(
+        strategy="both",
+        cache_dir=".askui_cache",
+        writing_settings=CacheWritingSettings(
+            filename="no_validation_example.json",
+            visual_verification_method="none",  # Disable visual validation
+        ),
+        execution_settings=CacheExecutionSettings(
+            delay_time_between_action=0.5,
+        ),
+    )
+
+    with VisionAgent(
+        display=1,
+        reporters=[SimpleHtmlReporter()],
+    ) as agent:
+        agent.act(goal, caching_settings=caching_settings)
+
+    logger.info("✓ No validation example completed")
+    logger.info("Cache executed without visual validation")
+
+
+if __name__ == "__main__":
+    # Run all examples to demonstrate the different validation modes
+    print("\n" + "=" * 70)
+    print("VISUAL VALIDATION EXAMPLES")
+    print("=" * 70 + "\n")
+
+    print("\n1. pHash validation (recommended)...")
+    print("-" * 70)
+    phash_validation_example()
+
+    print("\n2. aHash validation (alternative)...")
+    print("-" * 70)
+    # Comment out to skip aHash:
+    ahash_validation_example()
+
+    print("\n3. Strict validation (threshold=5)...")
+    print("-" * 70)
+    # Comment out to skip strict validation:
+    strict_validation_example()
+
+    print("\n4. Lenient validation (threshold=20)...")
+    print("-" * 70)
+    # Comment out to skip lenient validation:
+    lenient_validation_example()
+
+    print("\n5. Large region size (150x150)...")
+    print("-" * 70)
+    # Comment out to skip the larger region:
+    region_size_example()
+
+    print("\n6. No validation (disabled)...")
+    print("-" * 70)
+    # Comment out to skip the run without validation:
+    no_validation_example()
+
+    print("\n" + "=" * 70)
+    print("Visual validation examples completed!")
+    print("=" * 70 + "\n")
+
+    print("\nKey Takeaways:")
+    print("- pHash is recommended for most use cases (robust and accurate)")
+    print("- Threshold 5-10 is good for stable UIs")
+    print("- Threshold 15-20 is better for dynamic UIs")
+    print("- Region size 100x100 is a good default")
+    print("- Visual validation helps detect unexpected UI changes")
+    print()
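The docstrings above describe validating a square region around each click and, for aHash, comparing pixels against the region's average. A minimal pure-Python sketch of both ideas follows. It is an assumption-laden illustration, not the library's implementation: `region_box` and `average_hash` are hypothetical names, the clamping behaviour at screen edges is a guess, and the hash operates on an already downscaled 8x8 grayscale grid rather than a real screenshot.

```python
def region_box(
    x: int, y: int, size: int, screen_w: int, screen_h: int
) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a size x size box centred on (x, y).

    The box is clamped to the screen, so it can be smaller near the
    right or bottom edge.
    """
    half = size // 2
    left = max(0, x - half)
    top = max(0, y - half)
    right = min(screen_w, left + size)
    bottom = min(screen_h, top + size)
    return left, top, right, bottom


def average_hash(gray: list[list[int]]) -> str:
    """Hex-encoded average hash of a small grayscale grid (values 0-255).

    Each pixel contributes one bit: 1 if it is brighter than the mean
    of the whole grid, otherwise 0.
    """
    pixels = [value for row in gray for value in row]
    if not pixels:
        raise ValueError("grid must contain at least one pixel")
    mean = sum(pixels) / len(pixels)
    bits = 0
    for value in pixels:
        bits = (bits << 1) | (1 if value > mean else 0)
    width = (len(pixels) + 3) // 4  # four bits per hex character
    return f"{bits:0{width}x}"


print(region_box(450, 320, 100, 1920, 1080))  # (400, 270, 500, 370)

# Left half bright, right half dark: every row encodes as 0b11110000.
half_bright = [[255] * 4 + [0] * 4 for _ in range(8)]
print(average_hash(half_bright))  # f0f0f0f0f0f0f0f0
```

An 8x8 grid yields 64 bits, which matches the 0-64 threshold range used throughout the examples.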
diff --git a/src/askui/agent_base.py b/src/askui/agent_base.py
index 6512bc67..d07e60d4 100644
--- a/src/askui/agent_base.py
+++ b/src/askui/agent_base.py
@@ -25,7 +25,7 @@
RetrieveCachedTestExecutions,
)
from askui.utils.annotation_writer import AnnotationWriter
-from askui.utils.cache_writer import CacheWriter
+from askui.utils.caching.cache_writer import CacheWriter
from askui.utils.image_utils import ImageSource
from askui.utils.source_utils import InputSource, load_image_source
@@ -203,7 +203,7 @@ def act(
be used for achieving the `goal`.
on_message (OnMessageCb | None, optional): Callback for new messages. If
it returns `None`, stops and does not add the message. Cannot be used
- with caching_settings strategy "write" or "both".
+ with caching_settings strategy "record" or "both".
tools (list[Tool] | ToolCollection | None, optional): The tools for the
agent. Defaults to default tools depending on the selected model.
settings (AgentSettings | None, optional): The settings for the agent.
@@ -306,13 +306,16 @@ def act(
caching_settings or self._get_default_caching_settings_for_act(_model)
)
- tools, on_message, cached_execution_tool = self._patch_act_with_cache(
- _caching_settings, _settings, tools, on_message
- )
_tools = self._build_tools(tools, _model)
- if cached_execution_tool:
- cached_execution_tool.set_toolbox(_tools)
+ if _caching_settings.strategy is not None:
+ on_message = self._patch_act_with_cache(
+ _caching_settings, _settings, _tools, on_message, goal_str, _model
+ )
+ logger.info(
+ "Starting agent act with caching enabled (strategy=%s)",
+ _caching_settings.strategy,
+ )
self._model_router.act(
messages=messages,
@@ -336,36 +339,41 @@ def _patch_act_with_cache(
self,
caching_settings: CachingSettings,
settings: ActSettings,
- tools: list[Tool] | ToolCollection | None,
+ toolbox: ToolCollection,
on_message: OnMessageCb | None,
- ) -> tuple[
- list[Tool] | ToolCollection, OnMessageCb | None, ExecuteCachedTrajectory | None
- ]:
- """Patch act settings and tools with caching functionality.
+ goal: str | None = None,
+ model: str | None = None,
+ ) -> OnMessageCb | None:
+ """Patch act settings and toolbox with caching functionality.
Args:
caching_settings: The caching settings to apply
settings: The act settings to modify
- tools: The tools list to extend with caching tools
+ toolbox: The toolbox to extend with caching tools
on_message: The message callback (may be replaced for write mode)
+ goal: The goal string (used for cache metadata)
Returns:
- A tuple of (modified_tools, modified_on_message, cached_execution_tool)
+ modified_on_message
"""
+ logger.debug("Setting up caching")
caching_tools: list[Tool] = []
- cached_execution_tool: ExecuteCachedTrajectory | None = None
- # Setup read mode: add caching tools and modify system prompt
- if caching_settings.strategy in ["read", "both"]:
- cached_execution_tool = ExecuteCachedTrajectory(
- caching_settings.execute_cached_trajectory_tool_settings
- )
+ # Setup execute mode: add caching tools and modify system prompt
+ if caching_settings.strategy in ["execute", "both"]:
+ from askui.tools.caching_tools import VerifyCacheExecution
+
caching_tools.extend(
[
RetrieveCachedTestExecutions(caching_settings.cache_dir),
- cached_execution_tool,
+ ExecuteCachedTrajectory(
+ toolbox=toolbox,
+ settings=caching_settings.execution_settings,
+ ),
+ VerifyCacheExecution(),
]
)
+
if isinstance(settings.messages.system, str):
settings.messages.system = (
settings.messages.system + "\n" + CACHE_USE_PROMPT
@@ -377,27 +385,31 @@ def _patch_act_with_cache(
]
else: # Omit or None
settings.messages.system = CACHE_USE_PROMPT
+ logger.debug("Added cache usage instructions to system prompt")
- # Add caching tools to the tools list
- if isinstance(tools, list):
- tools = caching_tools + tools
- elif isinstance(tools, ToolCollection):
- tools.append_tool(*caching_tools)
- else:
- tools = caching_tools
+ # Add caching tools to the toolbox
+ if caching_tools:
+ toolbox.append_tool(*caching_tools)
- # Setup write mode: create cache writer and set message callback
- if caching_settings.strategy in ["write", "both"]:
+ # Setup record mode: create cache writer and set message callback
+ cache_writer = None
+ if caching_settings.strategy in ["record", "both"]:
cache_writer = CacheWriter(
- caching_settings.cache_dir, caching_settings.filename
+ cache_dir=caching_settings.cache_dir,
+ cache_writing_settings=caching_settings.writing_settings,
+ toolbox=toolbox,
+ goal=goal,
+ model_router=self._model_router,
+ model=model,
)
if on_message is None:
on_message = cache_writer.add_message_cb
else:
error_message = "Cannot use on_message callback when writing Cache"
+ logger.error(error_message)
raise ValueError(error_message)
- return tools, on_message, cached_execution_tool
+ return on_message
def _get_default_settings_for_act(self, model: str) -> ActSettings: # noqa: ARG002
return ActSettings()
diff --git a/src/askui/models/anthropic/messages_api.py b/src/askui/models/anthropic/messages_api.py
index 2b998830..9f16db97 100644
--- a/src/askui/models/anthropic/messages_api.py
+++ b/src/askui/models/anthropic/messages_api.py
@@ -66,10 +66,18 @@ def create_message(
tool_choice: BetaToolChoiceParam | Omit = omit,
temperature: float | Omit = omit,
) -> MessageParam:
+ # Convert messages to dicts with API context to exclude internal fields
_messages = [
- cast("BetaMessageParam", message.model_dump(exclude={"stop_reason"}))
+ cast(
+ "BetaMessageParam",
+ message.model_dump(
+ exclude={"stop_reason", "usage"},
+ context={"for_api": True}, # Triggers exclusion of internal fields
+ ),
+ )
for message in messages
]
+
response = self._client.beta.messages.create( # type: ignore[misc]
messages=_messages,
max_tokens=max_tokens or 4096,
diff --git a/src/askui/models/shared/agent.py b/src/askui/models/shared/agent.py
index fdde6c32..7fa4d298 100644
--- a/src/askui/models/shared/agent.py
+++ b/src/askui/models/shared/agent.py
@@ -4,7 +4,7 @@
from askui.models.exceptions import MaxTokensExceededError, ModelRefusalError
from askui.models.models import ActModel
-from askui.models.shared.agent_message_param import MessageParam
+from askui.models.shared.agent_message_param import MessageParam, UsageParam
from askui.models.shared.agent_on_message_cb import (
NULL_ON_MESSAGE_CB,
OnMessageCb,
@@ -19,6 +19,7 @@
TruncationStrategyFactory,
)
from askui.reporting import NULL_REPORTER, Reporter
+from askui.utils.caching.cache_execution_manager import CacheExecutionManager
logger = logging.getLogger(__name__)
@@ -50,6 +51,97 @@ def __init__(
self._truncation_strategy_factory = (
truncation_strategy_factory or SimpleTruncationStrategyFactory()
)
+        # Cache execution manager handles all cache-related logic
+        self._cache_execution_manager = CacheExecutionManager(reporter)
+        # Store current tool collection for cache executor access
+        self._tool_collection: ToolCollection | None = None
+
+    def _get_agent_response(
+        self,
+        model: str,
+        truncation_strategy: TruncationStrategy,
+        tool_collection: ToolCollection,
+        settings: ActSettings,
+        on_message: OnMessageCb,
+    ) -> MessageParam | None:
+        """Get response from agent API.
+
+        Args:
+            model: Model to use
+            truncation_strategy: Message truncation strategy
+            tool_collection: Available tools
+            settings: Agent settings
+            on_message: Callback for messages
+
+        Returns:
+            Assistant message or None if cancelled by callback
+        """
+        response_message = self._messages_api.create_message(
+            messages=truncation_strategy.messages,
+            model=model,
+            tools=tool_collection,
+            max_tokens=settings.messages.max_tokens,
+            betas=settings.messages.betas,
+            system=settings.messages.system,
+            thinking=settings.messages.thinking,
+            tool_choice=settings.messages.tool_choice,
+            temperature=settings.messages.temperature,
+        )
+
+        message_by_assistant = self._call_on_message(
+            on_message, response_message, truncation_strategy.messages
+        )
+        if message_by_assistant is None:
+            return None
+
+        self._accumulate_usage(message_by_assistant.usage)  # type: ignore
+
+        message_by_assistant_dict = message_by_assistant.model_dump(
+            mode="json", context={"for_api": True}
+        )
+        # logger.debug(message_by_assistant_dict)
+        truncation_strategy.append_message(message_by_assistant)
+        self._reporter.add_message(self.__class__.__name__, message_by_assistant_dict)
+
+        return message_by_assistant
+
+    def _process_tool_execution(
+        self,
+        message_by_assistant: MessageParam,
+        tool_collection: ToolCollection,
+        on_message: OnMessageCb,
+        truncation_strategy: TruncationStrategy,
+    ) -> bool:
+        """Process tool execution and return whether to continue.
+
+        Args:
+            message_by_assistant: Assistant message with potential tool uses
+            tool_collection: Available tools
+            on_message: Callback for messages
+            truncation_strategy: Message truncation strategy
+
+        Returns:
+            True if tool results were added and caller should recurse,
+            False otherwise
+        """
+        tool_result_message = self._use_tools(message_by_assistant, tool_collection)
+        if not tool_result_message:
+            return False
+
+        tool_result_message = self._call_on_message(
+            on_message, tool_result_message, truncation_strategy.messages
+        )
+        if not tool_result_message:
+            return False
+
+        tool_result_message_dict = tool_result_message.model_dump(
+            mode="json", context={"for_api": True}
+        )
+        logger.debug(tool_result_message_dict)
+        truncation_strategy.append_message(tool_result_message)
+
+        # Return True to indicate caller should recurse
+        return True
def _step(
self,
@@ -65,59 +157,66 @@ def _step(
blocks, this method is going to return immediately, as there is nothing to act
upon.
+ When executing from cache (cache execution mode), messages from the cache
+ executor are added to the truncation strategy, which automatically manages
+ message history size by removing old messages when needed.
+
Args:
model (str): The model to use for message creation.
on_message (OnMessageCb): Callback on new messages
settings (AgentSettings): The settings for the step.
tool_collection (ToolCollection): The tools to use for the step.
truncation_strategy (TruncationStrategy): The truncation strategy to use
- for the step.
+ for the step. Manages message history size automatically.
Returns:
None
"""
+ # Get or generate assistant message
if truncation_strategy.messages[-1].role == "user":
- response_message = self._messages_api.create_message(
- messages=truncation_strategy.messages,
- model=model,
- tools=tool_collection,
- max_tokens=settings.messages.max_tokens,
- betas=settings.messages.betas,
- system=settings.messages.system,
- thinking=settings.messages.thinking,
- tool_choice=settings.messages.tool_choice,
- temperature=settings.messages.temperature,
+ # Try to execute from cache first
+ should_recurse = self._cache_execution_manager.handle_execution_step(
+ on_message,
+ truncation_strategy,
)
- message_by_assistant = self._call_on_message(
- on_message, response_message, truncation_strategy.messages
- )
- if message_by_assistant is None:
- return
- message_by_assistant_dict = message_by_assistant.model_dump(mode="json")
- logger.debug(message_by_assistant_dict)
- truncation_strategy.append_message(message_by_assistant)
- self._reporter.add_message(
- self.__class__.__name__, message_by_assistant_dict
- )
- else:
- message_by_assistant = truncation_strategy.messages[-1]
- self._handle_stop_reason(message_by_assistant, settings.messages.max_tokens)
- if tool_result_message := self._use_tools(
- message_by_assistant, tool_collection
- ):
- if tool_result_message := self._call_on_message(
- on_message, tool_result_message, truncation_strategy.messages
- ):
- tool_result_message_dict = tool_result_message.model_dump(mode="json")
- logger.debug(tool_result_message_dict)
- truncation_strategy.append_message(tool_result_message)
+ if should_recurse:
+ # Cache step handled, recurse to continue
self._step(
model=model,
- tool_collection=tool_collection,
on_message=on_message,
settings=settings,
+ tool_collection=tool_collection,
truncation_strategy=truncation_strategy,
)
+ return
+
+ # Normal flow: get agent response
+ message_by_assistant = self._get_agent_response(
+ model, truncation_strategy, tool_collection, settings, on_message
+ )
+ if message_by_assistant is None:
+ return
+ else:
+ # Last message is already from assistant
+ message_by_assistant = truncation_strategy.messages[-1]
+
+ # Check stop reason and process tools
+ self._handle_stop_reason(message_by_assistant, settings.messages.max_tokens)
+ should_recurse = self._process_tool_execution(
+ message_by_assistant,
+ tool_collection,
+ on_message,
+ truncation_strategy,
+ )
+ if should_recurse:
+ # Tool results added, recurse to continue
+ self._step(
+ model=model,
+ on_message=on_message,
+ settings=settings,
+ tool_collection=tool_collection,
+ truncation_strategy=truncation_strategy,
+ )
def _call_on_message(
self,
@@ -129,6 +228,27 @@ def _call_on_message(
return message
return on_message(OnMessageCbParam(message=message, messages=messages))
+ def _setup_cache_tools(self, tool_collection: ToolCollection) -> None:
+ """Set the cache execution manager on caching tools.
+
+ This allows caching tools to access the agent state for
+ cache execution and verification.
+
+ Args:
+ tool_collection: The tool collection to search for cache tools
+ """
+ # Import here to avoid circular dependency
+ from askui.tools.caching_tools import (
+ ExecuteCachedTrajectory,
+ VerifyCacheExecution,
+ )
+
+ # Iterate through tools and set the cache execution manager on caching tools
+ for tool_name, tool in tool_collection.get_tools().items():
+ if isinstance(tool, (ExecuteCachedTrajectory, VerifyCacheExecution)):
+ tool.set_cache_execution_manager(self._cache_execution_manager)
+ logger.debug("Set cache execution manager on %s", tool_name)
+
@override
def act(
self,
@@ -138,8 +258,18 @@ def act(
tools: ToolCollection | None = None,
settings: ActSettings | None = None,
) -> None:
+ # reset states
+ self.accumulated_usage: UsageParam = UsageParam()
+ self._cache_execution_manager.reset_state()
+
_settings = settings or ActSettings()
_tool_collection = tools or ToolCollection()
+ # Store tool collection so it can be accessed by caching tools
+ self._tool_collection = _tool_collection
+
+ # Pass the cache execution manager to the caching tools
+ self._setup_cache_tools(_tool_collection)
+
truncation_strategy = (
self._truncation_strategy_factory.create_truncation_strategy(
tools=_tool_collection.to_params(),
@@ -156,6 +286,9 @@ def act(
truncation_strategy=truncation_strategy,
)
+ # Report accumulated usage statistics
+ self._reporter.add_usage_summary(self.accumulated_usage.model_dump())
+
def _use_tools(
self,
message: MessageParam,
@@ -192,3 +325,17 @@ def _handle_stop_reason(self, message: MessageParam, max_tokens: int) -> None:
raise MaxTokensExceededError(max_tokens)
if message.stop_reason == "refusal":
raise ModelRefusalError
+
+ def _accumulate_usage(self, step_usage: UsageParam) -> None:
+ self.accumulated_usage.input_tokens = (
+ self.accumulated_usage.input_tokens or 0
+ ) + (step_usage.input_tokens or 0)
+ self.accumulated_usage.output_tokens = (
+ self.accumulated_usage.output_tokens or 0
+ ) + (step_usage.output_tokens or 0)
+ self.accumulated_usage.cache_creation_input_tokens = (
+ self.accumulated_usage.cache_creation_input_tokens or 0
+ ) + (step_usage.cache_creation_input_tokens or 0)
+ self.accumulated_usage.cache_read_input_tokens = (
+ self.accumulated_usage.cache_read_input_tokens or 0
+ ) + (step_usage.cache_read_input_tokens or 0)
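The `_accumulate_usage` method above adds each step's counters onto a running total and treats a missing (`None`) counter as zero. A minimal standalone sketch of that pattern follows. It uses a plain dataclass as a stand-in for `UsageParam`; the `Usage` class and the `accumulate` helper are hypothetical names for illustration and are not part of askui.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class Usage:
    """Hypothetical stand-in for UsageParam: every counter may be None."""

    input_tokens: int | None = None
    output_tokens: int | None = None
    cache_creation_input_tokens: int | None = None
    cache_read_input_tokens: int | None = None


def accumulate(total: Usage, step: Usage) -> None:
    """Add each counter of `step` onto `total`, treating None as 0."""
    for field in fields(Usage):
        current = getattr(total, field.name) or 0
        added = getattr(step, field.name) or 0
        setattr(total, field.name, current + added)


# Two steps: the second one reports no output tokens at all.
total = Usage()
accumulate(total, Usage(input_tokens=120, output_tokens=30))
accumulate(total, Usage(input_tokens=80, cache_read_input_tokens=500))
```

As in the method above, a counter that was never reported ends up as `0` rather than `None` once the first step has been accumulated.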
diff --git a/src/askui/models/shared/agent_message_param.py b/src/askui/models/shared/agent_message_param.py
index 6265ab36..1503bb5b 100644
--- a/src/askui/models/shared/agent_message_param.py
+++ b/src/askui/models/shared/agent_message_param.py
@@ -1,4 +1,8 @@
-from pydantic import BaseModel
+from typing import Any
+
+from pydantic import BaseModel, model_serializer
+from pydantic.functional_serializers import SerializerFunctionWrapHandler
+from pydantic_core import core_schema
from typing_extensions import Literal
@@ -78,6 +82,28 @@ class ToolUseBlockParam(BaseModel):
name: str
type: Literal["tool_use"] = "tool_use"
cache_control: CacheControlEphemeralParam | None = None
+ # Visual validation field - internal use only, not sent to Anthropic API
+ visual_representation: str | None = None
+
+ @model_serializer(mode="wrap")
+ def _serialize_model(
+ self,
+ serializer: SerializerFunctionWrapHandler,
+ info: core_schema.SerializationInfo,
+ ) -> dict[str, Any]:
+ """Custom serializer to exclude internal fields when serializing for API.
+
+ When context={'for_api': True}, visual validation fields are excluded.
+ Otherwise, all fields are included (for cache storage, internal use).
+ """
+ # Use default serialization
+ data: dict[str, Any] = serializer(self)
+
+ # If serializing for API, remove internal fields
+ if info.context and info.context.get("for_api"):
+ data.pop("visual_representation", None)
+
+ return data
class BetaThinkingBlock(BaseModel):
@@ -105,10 +131,18 @@ class BetaRedactedThinkingBlock(BaseModel):
]
+class UsageParam(BaseModel):
+ input_tokens: int | None = None
+ output_tokens: int | None = None
+ cache_creation_input_tokens: int | None = None
+ cache_read_input_tokens: int | None = None
+
+
class MessageParam(BaseModel):
role: Literal["user", "assistant"]
content: str | list[ContentBlockParam]
stop_reason: StopReason | None = None
+ usage: UsageParam | None = None
__all__ = [
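The wrap serializer added to `ToolUseBlockParam` drops `visual_representation` only when the caller passes `context={"for_api": True}`. A reduced sketch of the same mechanism is shown below, assuming Pydantic 2.7 or newer (where `model_dump` accepts `context`). The `Block` model and its `internal_note` field are hypothetical and exist only for this illustration.

```python
from typing import Any, Dict, Optional

from pydantic import (
    BaseModel,
    SerializationInfo,
    SerializerFunctionWrapHandler,
    model_serializer,
)


class Block(BaseModel):
    """Hypothetical model with one field that must not leave the process."""

    name: str
    internal_note: Optional[str] = None

    @model_serializer(mode="wrap")
    def _serialize(
        self, handler: SerializerFunctionWrapHandler, info: SerializationInfo
    ) -> Dict[str, Any]:
        # Run the default serialization first, then filter the result.
        data = handler(self)
        if info.context and info.context.get("for_api"):
            data.pop("internal_note", None)
        return data


block = Block(name="click", internal_note="abc123")
full_dump = block.model_dump()  # keeps the internal field
api_dump = block.model_dump(context={"for_api": True})  # drops it
```

Keeping the filter in the serializer means storage code and API code can share one model and differ only in the context they pass.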
diff --git a/src/askui/models/shared/settings.py b/src/askui/models/shared/settings.py
index 547d97b6..5b0da29a 100644
--- a/src/askui/models/shared/settings.py
+++ b/src/askui/models/shared/settings.py
@@ -1,3 +1,6 @@
+from datetime import datetime
+from typing import Optional
+
from anthropic import Omit, omit
from anthropic.types import AnthropicBetaParam
from anthropic.types.beta import (
@@ -8,10 +11,14 @@
from pydantic import BaseModel, ConfigDict, Field
from typing_extensions import Literal
+from askui.models.shared.agent_message_param import ToolUseBlockParam, UsageParam
+
COMPUTER_USE_20250124_BETA_FLAG = "computer-use-2025-01-24"
COMPUTER_USE_20251124_BETA_FLAG = "computer-use-2025-11-24"
-CACHING_STRATEGY = Literal["read", "write", "both", "no"]
+CACHING_STRATEGY = Literal["execute", "record", "both"]
+CACHE_PARAMETER_IDENTIFICATION_STRATEGY = Literal["llm", "preset"]
+CACHING_VISUAL_VERIFICATION_METHOD = Literal["phash", "ahash", "none"]
class MessageSettings(BaseModel):
@@ -31,14 +38,54 @@ class ActSettings(BaseModel):
messages: MessageSettings = Field(default_factory=MessageSettings)
-class CachedExecutionToolSettings(BaseModel):
+class CacheWritingSettings(BaseModel):
+ """Settings for writing/recording cache files."""
+
+ filename: str = ""
+ parameter_identification_strategy: CACHE_PARAMETER_IDENTIFICATION_STRATEGY = "llm"
+ visual_verification_method: CACHING_VISUAL_VERIFICATION_METHOD = "phash"
+ visual_validation_region_size: int = 100
+ visual_validation_threshold: int = 10
+
+
+class CacheExecutionSettings(BaseModel):
+ """Settings for executing/replaying cache files."""
+
delay_time_between_action: float = 0.5
class CachingSettings(BaseModel):
- strategy: CACHING_STRATEGY = "no"
- cache_dir: str = ".cache"
- filename: str = ""
- execute_cached_trajectory_tool_settings: CachedExecutionToolSettings = (
- CachedExecutionToolSettings()
- )
+ strategy: CACHING_STRATEGY | None = None
+ cache_dir: str = ".askui_cache"
+ writing_settings: CacheWritingSettings | None = None
+ execution_settings: CacheExecutionSettings | None = None
+
+
+class CacheFailure(BaseModel):
+ timestamp: datetime
+ step_index: int
+ error_message: str
+ failure_count_at_step: int
+
+
+class CacheMetadata(BaseModel):
+ version: str = "0.1"
+ created_at: datetime
+ goal: Optional[str] = None
+ last_executed_at: Optional[datetime] = None
+ token_usage: UsageParam | None = None
+ execution_attempts: int = 0
+ failures: list[CacheFailure] = Field(default_factory=list)
+ is_valid: bool = True
+ invalidation_reason: Optional[str] = None
+ visual_verification_method: Optional[CACHING_VISUAL_VERIFICATION_METHOD] = None
+ visual_validation_region_size: Optional[int] = None
+ visual_validation_threshold: Optional[int] = None
+
+
+class CacheFile(BaseModel):
+ """Cache file structure (v0.1) wrapping trajectory with metadata."""
+
+ metadata: CacheMetadata
+ trajectory: list[ToolUseBlockParam]
+ cache_parameters: dict[str, str] = Field(default_factory=dict)
diff --git a/src/askui/models/shared/token_counter.py b/src/askui/models/shared/token_counter.py
index 592e9af9..378f7033 100644
--- a/src/askui/models/shared/token_counter.py
+++ b/src/askui/models/shared/token_counter.py
@@ -165,6 +165,8 @@ def _count_tokens_for_message(self, message: MessageParam) -> int:
For image blocks, uses the formula: tokens = (width * height) / 750 (see https://docs.anthropic.com/en/docs/build-with-claude/vision)
For other content types, uses the standard character-based estimation.
+ Uses for_api context to exclude internal fields from token counting.
+
Args:
message (MessageParam): The message to count tokens for.
@@ -175,9 +177,11 @@ def _count_tokens_for_message(self, message: MessageParam) -> int:
# Simple string content - use standard estimation
return int(len(message.content) / self._chars_per_token)
- # base tokens for rest of message
- total_tokens = 10
- # Content is a list of blocks - process each individually
+ # Process content blocks individually to handle images properly
+ # Base tokens for the message structure (role, etc.)
+ base_tokens = 20
+
+ total_tokens = base_tokens
for block in message.content:
total_tokens += self._count_tokens_for_content_block(block)
@@ -186,6 +190,8 @@ def _count_tokens_for_message(self, message: MessageParam) -> int:
def _count_tokens_for_content_block(self, block: ContentBlockParam) -> int:
"""Count tokens for a single content block.
+ Uses for_api context to exclude internal fields like visual validation.
+
Args:
block (ContentBlockParam): The content block to count tokens for.
@@ -207,8 +213,26 @@ def _count_tokens_for_content_block(self, block: ContentBlockParam) -> int:
total_tokens += self._count_tokens_for_content_block(nested_block)
return total_tokens
- # For other block types, use string representation
- return int(len(self._stringify_object(block)) / self._chars_per_token)
+ # For other block types (ToolUseBlockParam, TextBlockParam, etc.),
+ # use string representation with API context to exclude internal fields
+ stringified = self._stringify_object(block)
+ token_count = int(len(stringified) / self._chars_per_token)
+
+ # Debug: Log if this is a ToolUseBlockParam with visual validation fields
+ if hasattr(block, "visual_representation") and block.visual_representation:
+ import logging
+
+ logger = logging.getLogger(__name__)
+ logger.debug(
+ "Token counting for %s: stringified_length=%d, tokens=%d, "
+ "has_visual_fields=%s",
+ getattr(block, "name", "unknown"),
+ len(stringified),
+ token_count,
+ "visual_representation" in stringified,
+ )
+
+ return token_count
def _count_tokens_for_image_block(self, block: ImageBlockParam) -> int:
"""Count tokens for an image block using Anthropic's formula.
@@ -248,6 +272,9 @@ def _stringify_object(self, obj: object) -> str:
Not whitespace in dumped jsons between object keys and values and among array
elements.
+ For Pydantic models, uses API serialization context to exclude internal fields
+ that won't be sent to the API (e.g., visual validation fields).
+
Args:
obj (object): The object to stringify.
@@ -256,6 +283,16 @@ def _stringify_object(self, obj: object) -> str:
"""
if isinstance(obj, str):
return obj
+
+ # Check if object is a Pydantic model with model_dump method
+ if hasattr(obj, "model_dump") and callable(obj.model_dump):
+ try:
+ # Use for_api context to exclude internal fields from token counting
+ serialized = obj.model_dump(context={"for_api": True})
+ return json.dumps(serialized, separators=(",", ":"))
+ except (TypeError, ValueError, AttributeError):
+ pass # Fall through to default handling
+
try:
return json.dumps(obj, separators=(",", ":"))
except (TypeError, ValueError):
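`_stringify_object` serializes with compact separators before the character-based estimate is applied, so whitespace between keys and values does not inflate the count. The sketch below illustrates the effect using only the standard library. The `estimate_tokens` helper and the value of 4 characters per token are assumptions made for this example and are not taken from the token counter.

```python
import json


def estimate_tokens(obj: object, chars_per_token: float = 4.0) -> int:
    """Estimate tokens from the compact JSON form of `obj` (hypothetical helper)."""
    compact = json.dumps(obj, separators=(",", ":"))
    return int(len(compact) / chars_per_token)


payload = {"a": 1, "b": [1, 2]}
compact_len = len(json.dumps(payload, separators=(",", ":")))  # no spaces
default_len = len(json.dumps(payload))  # ", " and ": " separators
compact_estimate = estimate_tokens(payload)
```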
diff --git a/src/askui/models/shared/tools.py b/src/askui/models/shared/tools.py
index da39ee2d..016b95f7 100644
--- a/src/askui/models/shared/tools.py
+++ b/src/askui/models/shared/tools.py
@@ -165,6 +165,15 @@ class Tool(BaseModel, ABC):
default_factory=_default_input_schema,
description="JSON schema for tool parameters",
)
+ is_cacheable: bool = Field(
+ default=True,
+ description=(
+ "Whether this tool's execution can be cached. "
+ "Set to False for tools with side effects that shouldn't be repeated "
+ "(e.g., print/output/notification/external API tools with state changes). "
+ "Default: True."
+ ),
+ )
@abstractmethod
def __call__(self, *args: Any, **kwargs: Any) -> ToolCallResult:
@@ -341,6 +350,14 @@ def append_tool(self, *tools: Tool) -> "Self":
self._tool_map[tool.to_params()["name"]] = tool
return self
+ def get_tools(self) -> dict[str, Tool]:
+ """Get all tools in the collection.
+
+ Returns:
+ Dictionary mapping tool names to Tool instances
+ """
+ return dict(self._tool_map)
+
def reset_tools(self, tools: list[Tool] | None = None) -> "Self":
"""Reset the tools in the collection with new tools."""
_tools = tools or []
diff --git a/src/askui/prompts/caching.py b/src/askui/prompts/caching.py
index a89cf224..12a6f1dc 100644
--- a/src/askui/prompts/caching.py
+++ b/src/askui/prompts/caching.py
@@ -4,22 +4,122 @@
"task more robust and faster!\n"
" To do so, first use the RetrieveCachedTestExecutions tool to check "
"which trajectories are available for you.\n"
+ " It is very important that you use the RetrieveCachedTestExecutions and not "
+ "another tool for finding precomputed trajectories. "
+ "Hence, please use the RetrieveCachedTestExecutions tool in this step, even in "
+ "cases where another comparable tool (e.g. list_files tool) might be available.\n"
" The details what each trajectory that is available for you does are "
"at the end of this prompt.\n"
" A trajectory contains all necessary mouse movements, clicks, and "
"typing actions from a previously successful execution.\n"
" If there is a trajectory available for a step you need to take, "
"always use it!\n"
- " You can execute a trajectory with the ExecuteCachedExecution tool.\n"
- " After a trajectory was executed, make sure to verify the results! "
- "While it works most of the time, occasionally, the execution can be "
- "(partly) incorrect. So make sure to verify if everything is filled out "
- "as expected, and make corrections where necessary!\n"
+ "\n"
+ " EXECUTING TRAJECTORIES:\n"
+ " - Use ExecuteCachedTrajectory to execute a cached trajectory\n"
+ " - You will see all screenshots and results from the execution in "
+ "the message history\n"
+ " - After execution completes, verify the results are correct\n"
+ " - If execution fails partway, you'll see exactly where it failed "
+ "and can decide how to proceed\n"
+ "\n"
+ " CACHING_PARAMETERS:\n"
+ " - Trajectories may contain dynamic parameters like "
+ "{{current_date}} or {{user_name}}\n"
+ " - When executing a trajectory, check if it requires "
+ "parameter values\n"
+ " - Provide parameter values using the parameter_values "
+ "parameter as a dictionary\n"
+ " - Example: ExecuteCachedTrajectory(trajectory_file='test.json', "
+ "parameter_values={'current_date': '2025-12-11'})\n"
+ " - If required parameters are missing, execution will fail with "
+ "a clear error message\n"
+ "\n"
+ " NON-CACHEABLE STEPS:\n"
+ " - Some tools cannot be cached and require your direct execution "
+ "(e.g., print_debug, contextual decisions)\n"
+ " - When trajectory execution reaches a non-cacheable step, it will "
+ "pause and return control to you\n"
+ " - You'll receive a NEEDS_AGENT status with the current "
+ "step index\n"
+ " - Execute the non-cacheable step manually using your "
+ "regular tools\n"
+ " - After completing the non-cacheable step, continue the trajectory "
+ "using ExecuteCachedTrajectory with start_from_step_index\n"
+ "\n"
+ " CONTINUING TRAJECTORIES:\n"
+ " - Use ExecuteCachedTrajectory with start_from_step_index to resume "
+ "execution after handling a non-cacheable step\n"
+ " - Provide the same trajectory file and the step index where "
+ "execution should continue\n"
+ " - Example: ExecuteCachedTrajectory(trajectory_file='test.json', "
+ "start_from_step_index=5, parameter_values={...})\n"
+ " - The tool will execute remaining steps from that index onwards\n"
+ "\n"
+ " FAILURE HANDLING:\n"
+ " - If a trajectory fails during execution, you'll see the error "
+ "message and the step where it failed\n"
+ " - Analyze the failure: Was it due to UI changes, timing issues, "
+ "or incorrect state?\n"
+ " - Options for handling failures:\n"
+ " 1. Execute the remaining steps manually\n"
+ " 2. Fix the issue and retry from a specific step using "
+ "ExecuteCachedTrajectory with start_from_step_index\n"
+ " 3. Report that the cached trajectory is outdated and needs "
+ "re-recording\n"
+ "\n"
+ " BEST PRACTICES:\n"
+ " - Always verify results after trajectory execution completes\n"
+ " - While trajectories work most of the time, occasionally "
+ "execution can be partly incorrect\n"
+ " - Make corrections where necessary after cached execution\n"
+ " - If you need to make any corrections after a trajectory "
+ "execution, please mark the cached execution as failed\n"
+ " - If a trajectory consistently fails, it may be invalid and "
+ "should be re-recorded\n"
" \n"
"
-Time | Role | Content
-{{ msg.timestamp.strftime('%H:%M:%S.%f')[:-3] }} UTC | {{ msg.role }} |
- {% if msg.is_json %}
- {% else %}
- {{ msg.content }}
- {% endif %}
- {% for image in msg.images %}
- {% endfor %}
+Input Tokens | {{ "{:,}".format(usage_summary.get('input_tokens')) }}
+Output Tokens | {{ "{:,}".format(usage_summary.get('output_tokens')) }}
+Time | Role | Content
+{{ msg.timestamp.strftime('%H:%M:%S') }} | {{ msg.role }} |
+ {% if msg.is_json %}
+ {% else %}
+ {{ msg.content }}
+ {% endif %}
+ {% for image in msg.images %}
+ {% endfor %}