A Python framework that simulates and evaluates a multi-agent routing system (router + specialized agents) with prompt iterations, metrics, and visual flow artifacts.

ReWar1311/AgentFlow

AgentFlow

AgentFlow is a Python-based framework for simulating and evaluating a multi-agent customer-support flow. It compares flawed (v0) and improved (v1) routing/agent prompts, logs conversations, computes quality metrics, and provides artifacts (analysis + visualization) to demonstrate improvements.

Key Features

  • 🎯 Router + specialized agents (Search, Policy, Complaint, Booking, Closer) with v0/v1 prompt sets
  • 🧪 Conversation simulation against predefined scenarios (5 conversations, 4–6 turns each)
  • 📊 Metrics for routing accuracy, flow adherence, tool-call correctness, latency, specialization, and more
  • 🔍 Failure analysis highlighting v0 issues and v1 fixes
  • 🖼️ Visual flow diagram (HTML) and prompt analysis report

Repository Structure

  • main.py — runs simulations for v0 vs v1 prompts, logs turns, computes metrics, prints summaries.
  • Documentation/ANALYSIS.md — detailed failure patterns, improved prompts, and expected improvements.
  • Documentation/index.html — visual flow/architecture of the router + sub-agents.
  • requirements.txt — Python dependencies.
  • visualizer.py — optional graph visualizer (mermaid PNG) using chatbot.workflow (requires v4 and src.graph_compiler which are not included in this snapshot).
  • utils/cli.py — simple colored CLI print helpers.
  • readme.md — original minimal README (superseded by this document).

Prerequisites

  • Python 3.10+ recommended
  • pip (or uv/poetry) for dependency management

Installation

git clone https://github.com/ReWar1311/AgentFlow.git
cd AgentFlow
python -m venv .venv && source .venv/bin/activate   # or use your preferred venv
pip install -r requirements.txt

Configuration

The sample code in main.py currently uses a hardcoded API key and endpoint:

API_ENDPOINT = "https://research-interns.openai.azure.com/openai/deployments/gpt-4o-mini/chat/completions?api-version=2025-01-01-preview"
API_KEY = "cc3b2035419042a381b6d95df5585085"

You should replace these with your own secure values (e.g., via environment variables) before running:

export OPENAI_API_KEY="your-key"
export OPENAI_API_ENDPOINT="https://your-endpoint/..."

Then update main.py to read from environment variables instead of the hardcoded values.
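One way to make that change is sketched below. It assumes the two variable names exported above; the helper name `load_api_config` is hypothetical and does not exist in main.py.

```python
import os


def load_api_config(env=None):
    """Return (endpoint, key) read from environment variables.

    `load_api_config` is a hypothetical helper, not part of main.py.
    `env` defaults to os.environ; passing a plain dict makes testing easier.
    """
    env = os.environ if env is None else env
    required = ("OPENAI_API_ENDPOINT", "OPENAI_API_KEY")
    # Fail fast with a clear message instead of sending requests with empty credentials.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["OPENAI_API_ENDPOINT"], env["OPENAI_API_KEY"]
```

In main.py, the two hardcoded assignments could then be replaced with `API_ENDPOINT, API_KEY = load_api_config()`.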

Running the Evaluation

python main.py

What happens:

  • Simulates 5 conversation scenarios for v0 (flawed) and v1 (improved) prompts.
  • Prints routed agent, responses, tool calls, and per-turn latencies.
  • Computes and prints 10 metrics (7 required + 3 creative).
  • Keeps the log and metrics objects in memory and prints them to the console.
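To illustrate the kind of metric computed in this step, here is a minimal sketch of routing accuracy and mean per-turn latency. The turn-record fields (`routed_agent`, `expected_agent`, `latency_s`) and both function names are assumptions made for this example; the actual log structure and metric definitions in main.py may differ.

```python
from statistics import mean


def routing_accuracy(turns):
    """Fraction of turns whose routed agent matches the expected agent."""
    if not turns:
        return 0.0
    correct = sum(1 for t in turns if t["routed_agent"] == t["expected_agent"])
    return correct / len(turns)


def mean_latency(turns):
    """Average per-turn latency in seconds (0.0 for an empty log)."""
    return mean(t["latency_s"] for t in turns) if turns else 0.0


# Made-up turns for illustration only; field names are hypothetical.
sample_turns = [
    {"routed_agent": "Search", "expected_agent": "Search", "latency_s": 0.8},
    {"routed_agent": "Policy", "expected_agent": "Booking", "latency_s": 1.2},
    {"routed_agent": "Complaint", "expected_agent": "Complaint", "latency_s": 0.9},
    {"routed_agent": "Closer", "expected_agent": "Closer", "latency_s": 0.5},
]

print(routing_accuracy(sample_turns))  # 3 of 4 turns routed correctly -> 0.75
```

Running the same functions over the v0 and v1 logs would give a like-for-like comparison of the two prompt sets.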

Artifacts & Documentation

  • Prompt and failure analysis: see Documentation/ANALYSIS.md.
  • Flow visualization (static HTML): open Documentation/index.html in a browser.
  • Mermaid PNG visualization (optional):
    python visualizer.py graph        # may require missing modules (v4, src.graph_compiler) to be present

Notes & Limitations

  • The visualizer references v4.chatbot and src.graph_compiler, which are not included in this snapshot; ensure those modules exist or adjust imports accordingly.
  • The hardcoded API key in main.py should be removed/secured before sharing or deploying.
  • No automated tests are included; add unit tests if you extend the routing or agent logic.

Roadmap Ideas

  • Externalize configuration (API keys/endpoints, model params) via .env or CLI flags.
  • Add test coverage for routing accuracy and tool-call discipline.
  • Parameterize scenarios and metrics export (JSON/CSV) for downstream analysis.
  • Bundle a CLI/Streamlit demo for interactive runs.
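As a starting point for the metrics-export idea, the sketch below writes a metrics dictionary to both JSON and CSV using only the standard library. The `{version: {metric: value}}` shape, the output file names, and the function name `export_metrics` are assumptions, not part of the current code.

```python
import csv
import json
from pathlib import Path


def export_metrics(metrics, out_dir):
    """Write metrics to metrics.json and metrics.csv inside out_dir.

    `metrics` is assumed to look like {"v0": {"routing_accuracy": 0.6}, ...};
    this shape is hypothetical and may not match the objects main.py builds.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # JSON keeps the nested structure as-is.
    json_path = out / "metrics.json"
    json_path.write_text(json.dumps(metrics, indent=2), encoding="utf-8")

    # CSV flattens to one row per (version, metric) pair.
    csv_path = out / "metrics.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["version", "metric", "value"])
        for version, values in metrics.items():
            for name, value in values.items():
                writer.writerow([version, name, value])
    return json_path, csv_path
```

The long-format CSV (one row per metric) loads directly into a spreadsheet or a dataframe for comparing v0 against v1.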

License

(Please add a LICENSE file to clarify usage.)
