This project implements an intelligent web navigation and automation system using a multi-model Large Language Model (LLM) architecture. It can understand user goals expressed in natural language, plan and execute web browsing tasks, interact with web elements, analyze network requests, and learn from its interactions.
The system employs a modular, multi-LLM approach inspired by the concepts outlined in plan.md. Each LLM specializes in a specific task:
- Task Planner (
Hermes-3- configured viaGENERAL_TEXT_MODELenv var): Receives the user's high-level goal and generates a detailed, step-by-step plan for web navigation and interaction. It considers the current browser state (URL, page content) and past API interactions. - Code Assistant (
Granite-Code- configured viaCODING_MODELenv var): Takes individual steps from the plan and generates the corresponding Python code (using Selenium) to execute that action in the browser (e.g., clicking buttons, filling forms). It also assists with fixing code errors and generating API replay scripts. - API Reverse Engineer (
DeepSeek-R1- configured viaREASONING_MODELenv var): Analyzes captured HTTP requests and responses during navigation to understand API endpoints, authentication, parameters, and potential automation opportunities. - Semantic Indexer (
mxbai-embed-large- configured viaEMBEDDING_MODELenv var): Creates vector embeddings of plans, actions, UI analyses, and captured API requests. This allows for semantic search over the system's memory to find relevant past interactions or similar patterns.
These models are coordinated by the Navigator Orchestrator (navigator/orchestrator.py), which manages the overall flow: Plan -> Execute (via Code Generation & Selenium) -> Analyze -> Learn (via Indexing).
Core Components:
navigator/: Contains the main orchestration and AI logic.orchestrator.py: The central coordinator.web_navigator.py: Manages browser interaction via Selenium and coordinates code generation/execution for plan steps.brain/: Houses the specialized LLM agents (Planner, Code Assistant, API Reverse Engineer).memory/: Handles semantic indexing and retrieval of learned information.
core/: Provides shared utilities and plugins.plugins/: Contains wrappers for external tools/APIs (Selenium Manager, Ollama Client).database/: Manages the SQLite database for storing learned information and API requests.
main.py: A FastAPI application providing an API interface to the Navigator.real_testing.py: A command-line script for testing the Navigator with specific prompts.
-
Clone the repository:
git clone https://github.com/Likhithsai2580/ai-navigator cd ai-navigator -
Install dependencies: This project uses
uvfor package management.# Install uv if you haven't already pip install uv # Create a virtual environment and install dependencies uv venv uv pip install -r requirements.txt # Activate the environment (example for bash/zsh) source .venv/bin/activate # (or .venv\Scripts\activate on Windows)
-
Set up Ollama:
- Ensure you have Ollama installed and running. See Ollama Website.
- Pull the required models (names can be configured via environment variables, defaults shown):
ollama pull hermes-3 # Or your GENERAL_TEXT_MODEL ollama pull granite-code:8b # Or your CODING_MODEL ollama pull deepseek-r1:8b # Or your REASONING_MODEL ollama pull mxbai-embed-large # Or your EMBEDDING_MODEL
- Verify the Ollama server is accessible from the project environment. The default URL used is
http://<ollama-host>:11434/api(checkcore/plugins/apis/ollama_client.py). If running Ollama in WSL and the project on Windows, ensure the correct IP address is used.
-
Environment Variables: Create a
.envfile in the project root or set environment variables directly. These configure the Ollama models used:# .env file # Replace with the exact names from 'ollama list' if different GENERAL_TEXT_MODEL=hermes-3 CODING_MODEL=granite-code:8b REASONING_MODEL=deepseek-r1:8b EMBEDDING_MODEL=mxbai-embed-large
-
Initialize Database: The project uses a SQLite database (
ultimate_ai.db). It should be created automatically, but you can ensure it's set up by running:uv run python init_db.py
There are two main ways to interact with the system:
-
Command-Line Testing (
real_testing.py): Use this script for quick tests with a specific goal.uv run python real_testing.py --prompt "Your web goal here" # Example: uv run python real_testing.py --prompt "Search DuckDuckGo for recent AI news"
This will run the full orchestration process for the given prompt.
-
FastAPI Application (
main.py): Provides a web API for more structured interaction.# Start the FastAPI server uv run uvicorn main:app --reloadAccess the API documentation at
http://127.0.0.1:8000/docs. Key endpoints include:/navigate(POST): Starts a background task to achieve a web goal./generate(POST): Simple text generation via Ollama./api/search(GET): Search the memory for stored API patterns./memory/search(GET): Perform a semantic search over the learned memory.
See future_plans.md for planned improvements and features.