Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
0165cee
feat(caching): major caching refactoring that adds caching v0.1 funct…
philipph-askui Dec 12, 2025
93564dc
chore(caching): add more configurability and fixes some tests
philipph-askui Dec 12, 2025
a56cf2f
fix(caching): fix unit tests for cache writer
philipph-askui Dec 12, 2025
11c67bb
fix(caching): fix unit tests for trajectory executor
philipph-askui Dec 15, 2025
7e969c8
chore(caching): pass CacheExecutionManager to caching tools i/o the A…
philipph-askui Dec 15, 2025
35b4b8c
chore(caching): cleanup _step method
philipph-askui Dec 15, 2025
bd85a73
chore(caching): rename `cache_manager` to `cache_execution_manager` t…
philipph-askui Dec 15, 2025
41ef97b
fix(caching): fixes bug when delegating execution step back to agent
philipph-askui Dec 15, 2025
e894073
chore(caching): cleanup imports
philipph-askui Dec 15, 2025
7b844db
fix(caching): make cached steps appear under the correct role in repo…
philipph-askui Dec 15, 2025
34effa9
chore(caching): pdm format
philipph-askui Dec 15, 2025
4068552
chore(caching): make CacheManager API for validators consistent with …
philipph-askui Dec 15, 2025
fe026e6
feat(caching): adds example to caching docs
philipph-askui Dec 15, 2025
f1d2d36
chore(caching): explicitly mention that caching is experimental in th…
philipph-askui Dec 15, 2025
00795ad
feat(caching): add token usage to cache writer and reporters
philipph-askui Dec 15, 2025
96e90b8
fix(caching): fix pdm typecheck in tests
philipph-askui Dec 15, 2025
f1cdc8e
fix(caching): fix linting errors
philipph-askui Dec 16, 2025
cacceb5
fix(caching): add missing usage statistics and fix typos in docs
philipph-askui Dec 16, 2025
1c89c6f
chore(caching): move caching utils to caching sub folder
philipph-askui Dec 16, 2025
e1cbadf
chore(caching): removes unnecessary docstrings from CacheManager clas…
philipph-askui Dec 16, 2025
42928c6
chore(caching): removes CacheMigration class. Cache migration will st…
philipph-askui Dec 16, 2025
9d94a7b
chore(caching): distributes responsibilities between CacheWriter and …
philipph-askui Dec 16, 2025
5a2cd2d
fix(caching): fix parameter identification when using non-askui messa…
philipph-askui Dec 18, 2025
10335e6
fix(caching): refine CACHE_USE_PROMPT to prevent interference of othe…
philipph-askui Dec 22, 2025
1581c3e
fix(caching): refine verfication_request message from cacheExecutionM…
philipph-askui Dec 24, 2025
617ee38
feat(caching)!: add caching v0.2 features, including visual validation
philipph-askui Dec 30, 2025
ad9ae22
feat(caching): add examples
philipph-askui Dec 31, 2025
2990031
chore(caching): refactor tests to use new caching api
philipph-askui Dec 31, 2025
f32f661
chore(caching): fix formatting and linting issues
philipph-askui Dec 31, 2025
9099c14
fix(caching): fix bug that caused a crash when parameters contained c…
philipph-askui Jan 8, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -166,4 +166,6 @@ reports/
/chat
/askui_chat.db
.cache/
.askui_cache/

playground/*
1,261 changes: 1,152 additions & 109 deletions docs/caching.md

Large diffs are not rendered by default.

160 changes: 160 additions & 0 deletions docs/visual_validation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,160 @@
# Visual Validation for Caching

> **Status**: ✅ Implemented
> **Version**: v0.2
> **Last Updated**: 2025-12-30

## Overview

Visual validation verifies that the UI state matches expectations before executing cached trajectory steps. This significantly improves cache reliability by detecting UI changes that would cause cached actions to fail.

The system stores visual representations (perceptual hashes) of UI regions where actions like clicks are executed. During cache execution, these hashes are compared with the current UI state to detect changes.

## How are the visual Representations computed

We can think of multiple methods, e.g. aHash, pHash, ...

**Key Necessary Properties:**
- Fast computation (~1-2ms per hash)
- Small storage footprint (64 bits = 8 bytes)
- Robust to minor changes (compression, scaling, lighting)
- Sensitive to structural changes (moved buttons, different layouts)

Which method was used will be added to the metadata field of the cached trajectory.


## How It Works

### 1. Representation Storage

When a trajectory is recorded and cached, visual representations will be captured for critical steps:

```json
{
"type": "tool_use",
"name": "computer",
"input": {
"action": "left_click",
"coordinate": [450, 300]
},
"visual_representation": "a8f3c9e14b7d2056"
}
```

**Which steps should be validated?**
- Mouse clicks (left_click, right_click, double_click, middle_click)
- Type actions (verify input field hasn't moved)
- Key presses targeting specific UI elements

**Hash region selection:**
- For clicks: Capture region around click coordinate (e.g., 100x100px centered on target)
- For type actions: Capture region around text input field (e.g., 100x100px centered on target)

### 2. Hash Verification (During Cache Execution)

Before executing each step that has a `visual_representation`:

1. **Capture current screen region** at the same coordinates used during recording
2. **Compute visual Representation, e.g. aHash** of the current region
3. **Compare with stored hash** using Hamming distance
4. **Make decision** based on threshold:

```python
def should_validate_step(stored_hash: str, current_screen: Image, threshold: int = 10) -> bool:
"""
Check if visual validation passes.

Args:
stored_hash: The aHash stored in the cache
current_screen: Current screenshot region
threshold: Maximum Hamming distance (0-64)
- 0-5: Nearly identical (recommended for strict validation)
- 6-10: Very similar (default - allows minor changes)
- 11-15: Similar (more lenient)
- 16+: Different (validation should fail)

Returns:
True if validation passes, False if UI has changed significantly
"""
current_hash = compute_ahash(current_screen)
distance = hamming_distance(stored_hash, current_hash)
return distance <= threshold
```

### 3. Validation Results

**If validation passes** (distance ≤ threshold):
- ✅ Execute the cached step normally
- Continue with trajectory execution

**If validation fails** (distance > threshold):
- ⚠️ Pause trajectory execution
- Return control to agent with detailed information:
```
Visual validation failed at step 5 (left_click at [450, 300]) as the UI region has changed significantly as compared to during recording time.
Please Inspect the current UI state and perform the necessary step.
```

## Configuration

Visual validation is configured in the Cache Settings:

```python
# In settings
class CachingSettings:
visual_verification_method: CACHING_VISUAL_VERIFICATION_METHOD = "phash" # or "ahash", "none"

class CachedExecutionToolSettings:
visual_validation_threshold: int = 10 # Hamming distance threshold (0-64)
```

**Configuration Options:**
- `visual_verification_method`: Hash method to use
- `"phash"` (default): Perceptual hash - robust to minor changes, sensitive to structural changes
- `"ahash"`: Average hash - faster but less robust
- `"none"`: Disable visual validation
- `visual_validation_threshold`: Maximum allowed Hamming distance (0-64)
- `0-5`: Nearly identical (strict validation)
- `6-10`: Very similar (default - recommended)
- `11-15`: Similar (lenient)
- `16+`: Different (likely to fail validation)


## Benefits

### 1. Improved Reliability
- Detect UI changes before execution fails
- Reduce cache invalidation due to false negatives
- Provide early warning of UI state mismatches

### 2. Better User Experience
- Agent can make informed decisions about cache validity
- Clear feedback when UI has changed
- Opportunity to adapt instead of failing

### 3. Intelligent Cache Management
- Automatically identify outdated caches
- Track which UI regions are stable vs. volatile
- Optimize cache usage patterns

## Limitations and Considerations

### 1. Performance Impact
- Each validation requires a screenshot + hash computation (~5-10ms)
- May slow down trajectory execution
- Mitigation: Only validate critical steps, not every action

### 2. False Positives
- Minor UI changes (animations, hover states) may trigger validation failures
- Threshold tuning required for different applications
- Mitigation: Adaptive thresholds, ignore transient changes

### 3. False Negatives
- Subtle but critical changes might not be detected
- Text content changes may not affect visual hash significantly
- Mitigation: Combine with other validation methods (OCR, element detection)

### 4. Storage Overhead
- Each validated step adds 8 bytes (visual_hash) + 1 byte (flag)
- A 100-step trajectory adds ~900 bytes
- Mitigation: Acceptable overhead for improved reliability
186 changes: 186 additions & 0 deletions examples/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,186 @@
# AskUI Caching Examples

This directory contains example scripts demonstrating the capabilities of the AskUI caching system (v0.2).

## Examples Overview

### 1. `basic_caching_example.py`
**Introduction to cache recording and execution**

Demonstrates:
- ✅ **Record mode**: Save a trajectory to a cache file
- ✅ **Execute mode**: Replay a cached trajectory
- ✅ **Both mode**: Try execute, fall back to record
- ✅ **Cache parameters**: Dynamic value substitution with `{{parameter}}` syntax
- ✅ **AI-based parameter detection**: Automatic identification of dynamic values

**Best for**: Getting started with caching, understanding the basic workflow

### 2. `visual_validation_example.py`
**Visual UI state validation with perceptual hashing**

Demonstrates:
- ✅ **pHash validation**: Perceptual hashing (recommended, robust)
- ✅ **aHash validation**: Average hashing (faster, simpler)
- ✅ **Threshold tuning**: Adjusting strictness (0-64 range)
- ✅ **Region size**: Controlling validation area (50-200 pixels)
- ✅ **Disabling validation**: When to skip visual validation

**Best for**: Understanding visual validation, tuning validation parameters for your use case

## Quick Start

1. **Install dependencies**:
```bash
pdm install
```

2. **Run an example**:
```bash
pdm run python examples/basic_caching_example.py
```

3. **Explore the cache files**:
```bash
cat .askui_cache/basic_example.json
```

## Understanding the Examples

### Basic Workflow

```python
# 1. Record a trajectory
caching_settings = CachingSettings(
strategy="record", # Save to cache
cache_dir=".askui_cache",
writing_settings=CacheWritingSettings(
filename="my_cache.json",
visual_verification_method="phash",
),
)

# 2. Execute from cache
caching_settings = CachingSettings(
strategy="execute", # Replay from cache
cache_dir=".askui_cache",
execution_settings=CacheExecutionSettings(
delay_time_between_action=0.5,
),
)

# 3. Both (recommended for development)
caching_settings = CachingSettings(
strategy="both", # Try execute, fall back to record
cache_dir=".askui_cache",
writing_settings=CacheWritingSettings(filename="my_cache.json"),
execution_settings=CacheExecutionSettings(),
)
```

### Visual Validation Settings

```python
writing_settings=CacheWritingSettings(
visual_verification_method="phash", # or "ahash" or "none"
visual_validation_region_size=100, # 100x100 pixel region
visual_validation_threshold=10, # Hamming distance (0-64)
)
```

**Threshold Guidelines**:
- `0-5`: Very strict (detects tiny changes)
- `6-10`: Strict (recommended for stable UIs) ✅
- `11-15`: Moderate (tolerates minor changes)
- `16+`: Lenient (may miss significant changes)

**Region Size Guidelines**:
- `50`: Small, precise validation
- `100`: Balanced (recommended default) ✅
- `150-200`: Large, more context

## Customizing Examples

Each example can be customized by modifying:

1. **The goal**: Change the task description
2. **Cache settings**: Adjust validation parameters
3. **Tools**: Add custom tools to the agent
4. **Model**: Change the AI model (e.g., `model="askui/claude-sonnet-4-5-20250929"`)

## Cache File Structure (v0.2)

```json
{
"metadata": {
"version": "0.1",
"created_at": "2025-01-15T10:30:00Z",
"goal": "Task description",
"visual_verification_method": "phash",
"visual_validation_region_size": 100,
"visual_validation_threshold": 10
},
"trajectory": [
{
"type": "tool_use",
"name": "computer",
"input": {"action": "left_click", "coordinate": [450, 320]},
"visual_representation": "80c0e3f3e3e7e381..." // pHash/aHash
}
],
"cache_parameters": {
"search_term": "Description of the parameter"
}
}
```

## Tips and Best Practices

### When to Use Caching

✅ **Good use cases**:
- Repetitive UI automation tasks
- Testing workflows that require setup
- Demos and presentations
- Regression testing of UI workflows

❌ **Not recommended**:
- Highly dynamic UIs that change frequently
- Tasks requiring real-time decision making
- One-off tasks that won't be repeated

### Choosing Validation Settings

**For stable UIs** (e.g., desktop applications):
- Method: `phash`
- Threshold: `5-10`
- Region: `100`

**For dynamic UIs** (e.g., websites with ads):
- Method: `phash`
- Threshold: `15-20`
- Region: `150`

**For maximum performance** (trusted cache):
- Method: `none`
- (Visual validation disabled)

### Debugging Cache Execution

If cache execution fails:

1. **Check visual validation**: Lower threshold or disable temporarily
2. **Verify UI state**: Ensure UI hasn't changed since recording
3. **Check cache file**: Look for `visual_representation` fields
4. **Review logs**: Look for "Visual validation failed" messages
5. **Re-record**: Delete cache file and record fresh trajectory

## Additional Resources

- **Documentation**: See `docs/caching.md` for complete documentation
- **Visual Validation**: See `docs/visual_validation.md` for technical details
- **Playground**: See `playground/caching_demo.py` for more examples

## Questions?

For issues or questions, please refer to the main documentation or open an issue in the repository.
Loading