feat: implement auto-injection of AgentOS into tools #220
- Add ToolWithAgentOS base class for tools requiring AgentOS access
- Create ComputerBaseTool and AndroidBaseTool with platform-specific typing
- Add ToolCollection.add_agent_os() for automatic AgentOS injection
- Modularize computer tools: split computer.py into individual modules
- Refactor Android tools to use AndroidBaseTool
- Add ComputerAgentOsFacade for coordinate scaling functionality
- Add geometry types module for better type safety
- Update all agent classes (VisionAgent, AndroidAgent, WebAgent) to use new tool structure with auto-injection
- Update MCP server implementations for new tool architecture

This refactoring enables automatic AgentOS injection into tools through base classes, eliminating manual agent_os parameter passing and improving code maintainability and tool composition.
Force-pushed from 17a5546 to 3b263d3
        if all(tag in agent_os.tags for tag in tags):
            return agent_os
    msg = f"Agent OS with tags [{', '.join(tags)}] not found"
    raise ValueError(msg)
We might want to consider defining a custom "AskUIException" or "AskUIAgentOSException" for things like that
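One possible shape for such an exception hierarchy, as a minimal sketch. The class names `AskUIException` and `AskUIAgentOSException` come from the comment above; `AgentOsNotFoundError`, `find_agent_os`, and the `StubAgentOs` stand-in are hypothetical names used only for illustration and are not part of the PR.

```python
from dataclasses import dataclass, field


class AskUIException(Exception):
    """Hypothetical base class for all library-specific errors."""


class AskUIAgentOSException(AskUIException):
    """Hypothetical base class for Agent OS related errors."""


class AgentOsNotFoundError(AskUIAgentOSException):
    """Hypothetical error: no registered Agent OS carries all requested tags."""

    def __init__(self, tags: list[str]) -> None:
        self.tags = list(tags)
        super().__init__(f"Agent OS with tags [{', '.join(self.tags)}] not found")


@dataclass
class StubAgentOs:
    """Stand-in for a real Agent OS; only the `tags` attribute matters here."""

    tags: list[str] = field(default_factory=list)


def find_agent_os(candidates: list[StubAgentOs], tags: list[str]) -> StubAgentOs:
    """Return the first candidate that carries every requested tag."""
    for agent_os in candidates:
        if all(tag in agent_os.tags for tag in tags):
            return agent_os
    # Same message as the ValueError in the diff, but with a dedicated type.
    raise AgentOsNotFoundError(tags)
```

Callers that currently catch `ValueError` would need updating, unless the new class also inherits from `ValueError`.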
philipph-askui
left a comment
Overall, I think my comments on the code are only minor. I am a great fan of the new abstraction of ComputerTools, AndroidTools, and the AgentOSFacade. Besides my comments on the code, in my opinion there are a few things that need to be changed or done before we can merge this into main:
- Docs: we definitely need to explain the abstractions and concepts in the docs. The 3 major concepts introduced here that need to be explained, in my opinion, are 1) the agent_os_facade, especially with a focus on the different coordinate systems, 2) the tags that can/should be used for the tools. I saw "android", "computer", "universal" (which would make sense) and "agent_os_facade" (which makes less sense to me), and 3) the tool_store, with a focus on how people can import tools from there and what it is for.
- Benchmarking: We need to ensure that overall performance does not get worse when switching from a "one-tool-architecture" to a "multi-tool-architecture" and removing the beta flags.
- Testing: I think there should be new (unit and e2e) tests for the new Computer/Android tools, and especially for the tools in the new tools_store.
mlikasam-askui
left a comment
.
Force-pushed from 68e22d8 to a97517b
Force-pushed from a97517b to 20bffd9
Refactor: Tools Auto-Inject Agent OS
Overview
This PR refactors the tool system to automatically inject `agent_os` instances into tools through a centralized `ToolCollection`, eliminating the need to manually pass `agent_os` to each tool during initialization. The refactoring also splits the monolithic `computer.py` tool file into individual, focused tool files for better maintainability.

Key Changes
1. Tool Architecture Refactoring
Split Monolithic Computer Tool
- Previously: a single `Computer20250124Tool` class handling all computer interactions
- Now: 16 individual tool classes:
  - `ComputerScreenshotTool`
  - `ComputerMouseClickTool`
  - `ComputerMoveMouseTool`
  - `ComputerKeyboardTapTool`
  - `ComputerTypeTool`
  - `ComputerListDisplaysTool`
  - `ComputerSetActiveDisplayTool`
  - `ComputerRetrieveActiveDisplayTool`
  - `ComputerMouseHoldDownTool`
  - `ComputerMouseReleaseTool`
  - `ComputerMouseScrollTool`
  - `ComputerKeyboardPressedTool`
  - `ComputerKeyboardReleaseTool`
  - `ComputerGetMousePositionTool`
  - `ComputerReconnectTool`
  - `ComputerDisconnectTool`

All tools are now located in the `src/askui/tools/computer/` directory with a dedicated file per tool.

Base Tool Classes
- `ComputerBaseTool` (`src/askui/models/shared/computer_base_tool.py`) - base class for all computer tools
- `AndroidBaseTool` (`src/askui/models/shared/android_base_tool.py`) - base class for all Android tools
- Both extend `ToolWithAgentOS` and provide type-safe access to their respective `agent_os` implementations

2. Automatic Agent OS Injection
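To make the injection mechanism concrete, here is a minimal, self-contained sketch of tag-based matching. `MiniTool`, `MiniToolCollection`, and `StubAgentOs` are hypothetical stand-ins, not the PR's classes; only the `add_agent_os()` method name, the `required_tags` attribute, and the rule that all required tags must match are taken from the PR. The real `ToolCollection` may resolve instances at a different point in its lifecycle.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StubAgentOs:
    """Hypothetical stand-in for an Agent OS; only its tags matter here."""

    tags: list[str] = field(default_factory=list)


@dataclass
class MiniTool:
    """Hypothetical tool that declares which tags its Agent OS must carry."""

    required_tags: list[str] = field(default_factory=list)
    agent_os: Optional[StubAgentOs] = None


class MiniToolCollection:
    """Hypothetical collection that injects a matching Agent OS into tools."""

    def __init__(self, tools: list[MiniTool]) -> None:
        self._tools = list(tools)
        self._agent_os_list: list[StubAgentOs] = []

    def add_agent_os(self, agent_os: StubAgentOs) -> None:
        self._agent_os_list.append(agent_os)
        for tool in self._tools:
            # Inject only into tools that are still unbound and whose
            # required tags are all present on this Agent OS.
            if tool.agent_os is None and all(
                tag in agent_os.tags for tag in tool.required_tags
            ):
                tool.agent_os = agent_os


computer_os = StubAgentOs(tags=["computer", "agent_os_facade"])
android_os = StubAgentOs(tags=["android"])
click_tool = MiniTool(required_tags=["computer", "agent_os_facade"])
tap_tool = MiniTool(required_tags=["android"])

# Tools are constructed without an Agent OS and bound afterwards.
collection = MiniToolCollection([click_tool, tap_tool])
collection.add_agent_os(computer_os)
collection.add_agent_os(android_os)
```

In this sketch a tool with an empty `required_tags` list matches the first Agent OS that is added, which may or may not be the desired behaviour in the real implementation.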
ToolCollection Enhancement
- `ToolCollection` now maintains a list of `agent_os` instances via `add_agent_os()`
- Tools receive an `agent_os` based on matching `required_tags` during initialization
- Tools declare `required_tags` (e.g., `["computer", "agent_os_facade"]`) to match the appropriate `agent_os`

Agent OS Facade
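As a rough illustration of what coordinate scaling through a facade can look like, the sketch below maps points from a fixed model-facing resolution onto the real screen before delegating. Everything here is an assumption made for illustration (`ScalingFacade`, `RecordingAgentOs`, `mouse_move`, the example resolutions, and the plain per-axis linear scaling); the PR's `ComputerAgentOsFacade` may scale differently, for example by preserving the aspect ratio.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingAgentOs:
    """Hypothetical stand-in that records the coordinates it receives."""

    moves: list[tuple[int, int]] = field(default_factory=list)

    def mouse_move(self, x: int, y: int) -> None:
        self.moves.append((x, y))


class ScalingFacade:
    """Hypothetical facade: scales model coordinates to screen coordinates."""

    def __init__(
        self,
        agent_os: RecordingAgentOs,
        real_size: tuple[int, int],
        model_size: tuple[int, int],
    ) -> None:
        self._agent_os = agent_os
        # Independent scale factor per axis (assumed; no letterboxing).
        self._scale_x = real_size[0] / model_size[0]
        self._scale_y = real_size[1] / model_size[1]

    def to_real(self, x: int, y: int) -> tuple[int, int]:
        """Convert a point in model coordinates to real screen coordinates."""
        return round(x * self._scale_x), round(y * self._scale_y)

    def mouse_move(self, x: int, y: int) -> None:
        """Scale the point, then delegate to the wrapped Agent OS."""
        self._agent_os.mouse_move(*self.to_real(x, y))


agent_os = RecordingAgentOs()
facade = ScalingFacade(agent_os, real_size=(1920, 1080), model_size=(1280, 720))
facade.mouse_move(640, 360)  # centre of the model-facing image
```

Because tools talk only to the facade, the scaling rule lives in one place instead of being repeated in each tool.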
- `ComputerAgentOsFacade` (`src/askui/tools/computer_agent_os_facade.py`)
- Wraps `AgentOs` to provide coordinate scaling functionality

3. Agent Base Refactoring
Simplified Tool Management
- Removed the `_get_default_tools_for_act()`, `_get_default_settings_for_act()`, and `_get_default_caching_settings_for_act()` methods
- `self.act_tool_collection` - pre-initialized tool collection
- `self.act_settings` - default act settings
- `self.caching_settings` - default caching settings
- Agent OS instances are registered via `add_agent_os()` after initialization

Updated Tool Building Logic
- `_build_tools()` now works with `ToolCollection` instances directly

4. Vision Agent Changes
Tool Initialization
- Removed `Computer20250124Tool` and beta flags
- Creates a `ComputerAgentOsFacade` and adds it to `act_tool_collection` via `add_agent_os()`
- Sets `act_settings` with system prompt and thinking configuration

Removed Beta Flag Logic
- Removed the `_get_default_settings_for_act()` method that handled different beta flags for different Claude model versions

5. Android Agent Changes
Tool Initialization
- Tools take no `agent_os` parameter in the constructor and receive `agent_os` automatically via `ToolCollection`
- `act_agent_os_facade` is added to `act_tool_collection` after initialization
- `act_settings` are set with an Android-specific system prompt

6. MCP Server Refactoring
Computer MCP Server (`src/askui/chat/api/mcp_servers/computer.py`)
- Replaced the single `computer()` tool with individual tool functions (e.g., `computer_screenshot`, `computer_mouse_click`)
- Tools are registered via `mcp.add_tool()` with proper tags
- Removed the `active_display` variable and resolution constants
- Uses a `ComputerAgentOsFacade` instance for all tools

Assistant Seeds Update
- Updated the `COMPUTER_AGENT_V1` assistant configuration to use the new individual tool names
- Replaced `["computer", "list_displays", "set_active_display", "retrieve_active_display"]` with the full list of 16 individual computer tools

7. Type System Reorganization
Geometry Types
- Moved `Point` and `PointList` from `askui.models.models` to `askui.models.types.geometry`

8. File Structure Changes
New Files
- `src/askui/tools/computer/__init__.py` - exports all computer tools
- `src/askui/tools/computer/*_tool.py` - individual tool implementations (16 files)
- `src/askui/tools/computer_agent_os_facade.py` - coordinate scaling facade
- `src/askui/models/shared/computer_base_tool.py` - base class for computer tools
- `src/askui/models/shared/android_base_tool.py` - base class for Android tools
- `src/askui/models/types/geometry.py` - geometry type definitions

Removed Files
- `src/askui/tools/computer.py` - replaced by individual tool files
- `src/askui/tools/list_displays_tool.py` - moved to `computer/list_displays_tool.py`
- `src/askui/tools/retrieve_active_display_tool.py` - moved to `computer/retrieve_active_display_tool.py`

Moved Files
- `src/askui/tools/set_active_display_tool.py` → `src/askui/tools/computer/set_active_display_tool.py`

9. Minor Changes
- Added `timeout_graceful_shutdown=5` to the FastAPI uvicorn configuration in `chat/__main__.py`
- Removed the `agent_os` parameter from constructors

Benefits
- Tools receive `agent_os` automatically based on tags, reducing boilerplate
- Base classes provide type-safe access to `agent_os` implementations
- `ComputerAgentOsFacade` ensures all computer tools handle coordinate scaling consistently

Migration Notes
- Tools that previously required `agent_os` in the constructor can now be initialized without it
- `agent_os` will be automatically injected by `ToolCollection` based on `required_tags`
- Custom tools should extend `ComputerBaseTool` or `AndroidBaseTool` and specify appropriate `required_tags`
- Import paths for `Point` and `PointList` have changed to `askui.models.types.geometry`

Testing