askui · philipph-askui · Dec 12, 2025 · Dec 12, 2025 · Dec 12, 2025 · Dec 15, 2025
diff --git a/.gitignore b/.gitignore
@@ -166,4 +166,6 @@ reports/
 /chat
 /askui_chat.db
 .cache/
+.askui_cache/
 
+playground/*
diff --git a/docs/caching.md b/docs/caching.md
diff --git a/docs/visual_validation.md b/docs/visual_validation.md
@@ -0,0 +1,160 @@
+# Visual Validation for Caching
+
+> **Status**: ✅ Implemented
+> **Version**: v0.2
+> **Last Updated**: 2025-12-30
+
+## Overview
+
+Visual validation verifies that the UI state matches expectations before executing cached trajectory steps. This significantly improves cache reliability by detecting UI changes that would cause cached actions to fail.
+
+The system stores visual representations (perceptual hashes) of UI regions where actions like clicks are executed. During cache execution, these hashes are compared with the current UI state to detect changes.
+
+## How are the visual Representations computed
+
+We can think of multiple methods, e.g. aHash, pHash, ...
+
+**Key Necessary Properties:**
+- Fast computation (~1-2ms per hash)
+- Small storage footprint (64 bits = 8 bytes)
+- Robust to minor changes (compression, scaling, lighting)
+- Sensitive to structural changes (moved buttons, different layouts)
+
+Which method was used will be added to the metadata field of the cached trajectory.
+
+
+## How It Works
+
+### 1. Representation Storage
+
+When a trajectory is recorded and cached, visual representations will be captured for critical steps:
+
+```json
+{
+  "type": "tool_use",
+  "name": "computer",
+  "input": {
+    "action": "left_click",
+    "coordinate": [450, 300]
+  },
+  "visual_representation": "a8f3c9e14b7d2056"
+}
+```
+
+**Which steps should be validated?**
+- Mouse clicks (left_click, right_click, double_click, middle_click)
+- Type actions (verify input field hasn't moved)
+- Key presses targeting specific UI elements
+
+**Hash region selection:**
+- For clicks: Capture region around click coordinate (e.g., 100x100px centered on target)
+- For type actions: Capture region around text input field (e.g., 100x100px centered on target)
+
+### 2. Hash Verification (During Cache Execution)
+
+Before executing each step that has a `visual_representation`:
+
+1. **Capture current screen region** at the same coordinates used during recording
+2. **Compute visual Representation, e.g. aHash** of the current region
+3. **Compare with stored hash** using Hamming distance
+4. **Make decision** based on threshold:
+
+```python
+def should_validate_step(stored_hash: str, current_screen: Image, threshold: int = 10) -> bool:
+    """
+    Check if visual validation passes.
+
+    Args:
+        stored_hash: The aHash stored in the cache
+        current_screen: Current screenshot region
+        threshold: Maximum Hamming distance (0-64)
+            - 0-5: Nearly identical (recommended for strict validation)
+            - 6-10: Very similar (default - allows minor changes)
+            - 11-15: Similar (more lenient)
+            - 16+: Different (validation should fail)
+
+    Returns:
+        True if validation passes, False if UI has changed significantly
+    """
+    current_hash = compute_ahash(current_screen)
+    distance = hamming_distance(stored_hash, current_hash)
+    return distance <= threshold
+```
+
+### 3. Validation Results
+
+**If validation passes** (distance ≤ threshold):
+- ✅ Execute the cached step normally
+- Continue with trajectory execution
+
+**If validation fails** (distance > threshold):
+- ⚠️ Pause trajectory execution
+- Return control to agent with detailed information:
+  ```
+  Visual validation failed at step 5 (left_click at [450, 300]) as the UI region has changed significantly as compared to during recording time.
+  Please  Inspect the current UI state and perform the necessary step.
+  ```
+
+## Configuration
+
+Visual validation is configured in the Cache Settings:
+
+```python
+# In settings
+class CachingSettings:
+    visual_verification_method: CACHING_VISUAL_VERIFICATION_METHOD = "phash"  # or "ahash", "none"
+
+class CachedExecutionToolSettings:
+    visual_validation_threshold: int = 10  # Hamming distance threshold (0-64)
+```
+
+**Configuration Options:**
+- `visual_verification_method`: Hash method to use
+  - `"phash"` (default): Perceptual hash - robust to minor changes, sensitive to structural changes
+  - `"ahash"`: Average hash - faster but less robust
+  - `"none"`: Disable visual validation
+- `visual_validation_threshold`: Maximum allowed Hamming distance (0-64)
+  - `0-5`: Nearly identical (strict validation)
+  - `6-10`: Very similar (default - recommended)
+  - `11-15`: Similar (lenient)
+  - `16+`: Different (likely to fail validation)
+
+
+## Benefits
+
+### 1. Improved Reliability
+- Detect UI changes before execution fails
+- Reduce cache invalidation due to false negatives
+- Provide early warning of UI state mismatches
+
+### 2. Better User Experience
+- Agent can make informed decisions about cache validity
+- Clear feedback when UI has changed
+- Opportunity to adapt instead of failing
+
+### 3. Intelligent Cache Management
+- Automatically identify outdated caches
+- Track which UI regions are stable vs. volatile
+- Optimize cache usage patterns
+
+## Limitations and Considerations
+
+### 1. Performance Impact
+- Each validation requires a screenshot + hash computation (~5-10ms)
+- May slow down trajectory execution
+- Mitigation: Only validate critical steps, not every action
+
+### 2. False Positives
+- Minor UI changes (animations, hover states) may trigger validation failures
+- Threshold tuning required for different applications
+- Mitigation: Adaptive thresholds, ignore transient changes
+
+### 3. False Negatives
+- Subtle but critical changes might not be detected
+- Text content changes may not affect visual hash significantly
+- Mitigation: Combine with other validation methods (OCR, element detection)
+
+### 4. Storage Overhead
+- Each validated step adds 8 bytes (visual_hash) + 1 byte (flag)
+- A 100-step trajectory adds ~900 bytes
+- Mitigation: Acceptable overhead for improved reliability
diff --git a/examples/README.md b/examples/README.md
@@ -0,0 +1,186 @@
+# AskUI Caching Examples
+
+This directory contains example scripts demonstrating the capabilities of the AskUI caching system (v0.2).
+
+## Examples Overview
+
+### 1. `basic_caching_example.py`
+**Introduction to cache recording and execution**
+
+Demonstrates:
+- ✅ **Record mode**: Save a trajectory to a cache file
+- ✅ **Execute mode**: Replay a cached trajectory
+- ✅ **Both mode**: Try execute, fall back to record
+- ✅ **Cache parameters**: Dynamic value substitution with `{{parameter}}` syntax
+- ✅ **AI-based parameter detection**: Automatic identification of dynamic values
+
+**Best for**: Getting started with caching, understanding the basic workflow
+
+### 2. `visual_validation_example.py`
+**Visual UI state validation with perceptual hashing**
+
+Demonstrates:
+- ✅ **pHash validation**: Perceptual hashing (recommended, robust)
+- ✅ **aHash validation**: Average hashing (faster, simpler)
+- ✅ **Threshold tuning**: Adjusting strictness (0-64 range)
+- ✅ **Region size**: Controlling validation area (50-200 pixels)
+- ✅ **Disabling validation**: When to skip visual validation
+
+**Best for**: Understanding visual validation, tuning validation parameters for your use case
+
+## Quick Start
+
+1. **Install dependencies**:
+   ```bash
+   pdm install
+   ```
+
+2. **Run an example**:
+   ```bash
+   pdm run python examples/basic_caching_example.py
+   ```
+
+3. **Explore the cache files**:
+   ```bash
+   cat .askui_cache/basic_example.json
+   ```
+
+## Understanding the Examples
+
+### Basic Workflow
+
+```python
+# 1. Record a trajectory
+caching_settings = CachingSettings(
+    strategy="record",  # Save to cache
+    cache_dir=".askui_cache",
+    writing_settings=CacheWritingSettings(
+        filename="my_cache.json",
+        visual_verification_method="phash",
+    ),
+)
+
+# 2. Execute from cache
+caching_settings = CachingSettings(
+    strategy="execute",  # Replay from cache
+    cache_dir=".askui_cache",
+    execution_settings=CacheExecutionSettings(
+        delay_time_between_action=0.5,
+    ),
+)
+
+# 3. Both (recommended for development)
+caching_settings = CachingSettings(
+    strategy="both",  # Try execute, fall back to record
+    cache_dir=".askui_cache",
+    writing_settings=CacheWritingSettings(filename="my_cache.json"),
+    execution_settings=CacheExecutionSettings(),
+)
+```
+
+### Visual Validation Settings
+
+```python
+writing_settings=CacheWritingSettings(
+    visual_verification_method="phash",  # or "ahash" or "none"
+    visual_validation_region_size=100,   # 100x100 pixel region
+    visual_validation_threshold=10,      # Hamming distance (0-64)
+)
+```
+
+**Threshold Guidelines**:
+- `0-5`: Very strict (detects tiny changes)
+- `6-10`: Strict (recommended for stable UIs) ✅
+- `11-15`: Moderate (tolerates minor changes)
+- `16+`: Lenient (may miss significant changes)
+
+**Region Size Guidelines**:
+- `50`: Small, precise validation
+- `100`: Balanced (recommended default) ✅
+- `150-200`: Large, more context
+
+## Customizing Examples
+
+Each example can be customized by modifying:
+
+1. **The goal**: Change the task description
+2. **Cache settings**: Adjust validation parameters
+3. **Tools**: Add custom tools to the agent
+4. **Model**: Change the AI model (e.g., `model="askui/claude-sonnet-4-5-20250929"`)
+
+## Cache File Structure (v0.2)
+
+```json
+{
+  "metadata": {
+    "version": "0.1",
+    "created_at": "2025-01-15T10:30:00Z",
+    "goal": "Task description",
+    "visual_verification_method": "phash",
+    "visual_validation_region_size": 100,
+    "visual_validation_threshold": 10
+  },
+  "trajectory": [
+    {
+      "type": "tool_use",
+      "name": "computer",
+      "input": {"action": "left_click", "coordinate": [450, 320]},
+      "visual_representation": "80c0e3f3e3e7e381..."  // pHash/aHash
+    }
+  ],
+  "cache_parameters": {
+    "search_term": "Description of the parameter"
+  }
+}
+```
+
+## Tips and Best Practices
+
+### When to Use Caching
+
+✅ **Good use cases**:
+- Repetitive UI automation tasks
+- Testing workflows that require setup
+- Demos and presentations
+- Regression testing of UI workflows
+
+❌ **Not recommended**:
+- Highly dynamic UIs that change frequently
+- Tasks requiring real-time decision making
+- One-off tasks that won't be repeated
+
+### Choosing Validation Settings
+
+**For stable UIs** (e.g., desktop applications):
+- Method: `phash`
+- Threshold: `5-10`
+- Region: `100`
+
+**For dynamic UIs** (e.g., websites with ads):
+- Method: `phash`
+- Threshold: `15-20`
+- Region: `150`
+
+**For maximum performance** (trusted cache):
+- Method: `none`
+- (Visual validation disabled)
+
+### Debugging Cache Execution
+
+If cache execution fails:
+
+1. **Check visual validation**: Lower threshold or disable temporarily
+2. **Verify UI state**: Ensure UI hasn't changed since recording
+3. **Check cache file**: Look for `visual_representation` fields
+4. **Review logs**: Look for "Visual validation failed" messages
+5. **Re-record**: Delete cache file and record fresh trajectory
+
+## Additional Resources
+
+- **Documentation**: See `docs/caching.md` for complete documentation
+- **Visual Validation**: See `docs/visual_validation.md` for technical details
+- **Playground**: See `playground/caching_demo.py` for more examples
+
+## Questions?
+
+For issues or questions, please refer to the main documentation or open an issue in the repository.