AgentFlow is a Python-based framework for simulating and evaluating a multi-agent customer-support flow. It compares flawed (v0) and improved (v1) routing/agent prompts, logs conversations, computes quality metrics, and provides artifacts (analysis + visualization) to demonstrate improvements.
- 🎯 Router + specialized agents (Search, Policy, Complaint, Booking, Closer) with v0/v1 prompt sets
- 🧪 Conversation simulation against predefined scenarios (5 conversations, 4–6 turns each)
- 📊 Metrics for routing accuracy, flow adherence, tool-call correctness, latency, specialization, and more
- 🔍 Failure analysis highlighting v0 issues and v1 fixes
- 🖼️ Visual flow diagram (HTML) and prompt analysis report
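One way to keep the v0 and v1 prompt sets swappable is to key them by version and agent name, so a simulation run only needs a version string to select its prompts. The sketch below is a minimal illustration under that assumption. `PROMPT_SETS`, `get_prompt`, and the placeholder prompt strings are hypothetical and not taken from `main.py`, and the Router is treated as just another prompt holder.

```python
from typing import Literal

Version = Literal["v0", "v1"]
AGENTS = ("Router", "Search", "Policy", "Complaint", "Booking", "Closer")

# Hypothetical layout: one system prompt per agent, per prompt version.
# The prompt text is a placeholder; the real prompts live in the project itself.
PROMPT_SETS: dict[str, dict[str, str]] = {
    version: {agent: f"<{version} {agent} system prompt>" for agent in AGENTS}
    for version in ("v0", "v1")
}


def get_prompt(version: Version, agent: str) -> str:
    """Look up the system prompt for one agent under a given prompt version."""
    try:
        return PROMPT_SETS[version][agent]
    except KeyError:
        raise ValueError(f"No prompt for agent {agent!r} in version {version!r}") from None
```

With this layout, adding another prompt version is a data change rather than a change to the simulation loop.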
- `main.py`: runs simulations for v0 vs v1 prompts, logs turns, computes metrics, and prints summaries.
- `Documentation/ANALYSIS.md`: detailed failure patterns, improved prompts, and expected improvements.
- `Documentation/index.html`: visual flow/architecture of the router + sub-agents.
- `requirements.txt`: Python dependencies.
- `visualizer.py`: optional graph visualizer (Mermaid PNG) using `chatbot.workflow` (requires `v4` and `src.graph_compiler`, which are not included in this snapshot).
- `utils/cli.py`: simple colored CLI print helpers.
- `readme.md`: original minimal README (superseded by this document).
- Python 3.10+ recommended
- pip (or uv/poetry) for dependency management
```bash
git clone https://github.com/ReWar1311/AgentFlow.git
cd AgentFlow
python -m venv .venv && source .venv/bin/activate  # or use your preferred venv
pip install -r requirements.txt
```

The sample code in `main.py` currently uses a hardcoded API key and endpoint:
```python
API_ENDPOINT = "https://research-interns.openai.azure.com/openai/deployments/gpt-4o-mini/chat/completions?api-version=2025-01-01-preview"
API_KEY = "..."  # hardcoded key (value redacted here)
```

Replace these with your own values, loaded securely (e.g., from environment variables), before running:
```bash
export OPENAI_API_KEY="your-key"
export OPENAI_API_ENDPOINT="https://your-endpoint/..."
```

Then update `main.py` to read these environment variables instead of the hardcoded values.
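A minimal sketch of that change, assuming the two variable names used in the `export` commands for this step. `load_api_config` is a hypothetical helper, not something `main.py` already defines. It fails fast with a clear message instead of letting the simulation send requests with an empty key.

```python
import os


def load_api_config() -> tuple[str, str]:
    """Read the API endpoint and key from the environment.

    Raises RuntimeError naming every missing variable, so a misconfigured
    shell is caught before any conversation is simulated.
    """
    endpoint = os.environ.get("OPENAI_API_ENDPOINT", "")
    api_key = os.environ.get("OPENAI_API_KEY", "")
    missing = [
        name
        for name, value in (("OPENAI_API_ENDPOINT", endpoint), ("OPENAI_API_KEY", api_key))
        if not value
    ]
    if missing:
        raise RuntimeError(f"Missing required environment variable(s): {', '.join(missing)}")
    return endpoint, api_key
```

In `main.py`, the two hardcoded assignments could then become `API_ENDPOINT, API_KEY = load_api_config()`.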
```bash
python main.py
```

What happens:
- Simulates 5 conversation scenarios for v0 (flawed) and v1 (improved) prompts.
- Prints routed agent, responses, tool calls, and per-turn latencies.
- Computes and prints 10 metrics (7 required + 3 creative).
- Keeps the logs and metrics as in-memory objects and prints them to the console.
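As an illustration of how two of these metrics can be derived from per-turn logs, the sketch below computes routing accuracy (the share of turns routed to the expected agent) and mean per-turn latency. It assumes a hypothetical `TurnLog` record with `expected_agent`, `routed_agent`, and `latency_s` fields; the actual log structure and metric definitions in `main.py` may differ.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TurnLog:
    """Hypothetical per-turn record; main.py's real log fields may differ."""
    expected_agent: str  # agent the scenario says should handle this turn
    routed_agent: str    # agent the router actually picked
    latency_s: float     # wall-clock time for the turn, in seconds


def routing_accuracy(turns: list[TurnLog]) -> float:
    """Fraction of turns routed to the expected agent (0.0 for no turns)."""
    if not turns:
        return 0.0
    correct = sum(t.routed_agent == t.expected_agent for t in turns)
    return correct / len(turns)


def mean_latency(turns: list[TurnLog]) -> float:
    """Average per-turn latency in seconds (0.0 for no turns)."""
    return mean(t.latency_s for t in turns) if turns else 0.0


# Toy example with made-up turns (not results from this project).
turns = [
    TurnLog("Search", "Search", 1.2),
    TurnLog("Policy", "Complaint", 0.8),
    TurnLog("Booking", "Booking", 1.0),
]
print(f"routing accuracy: {routing_accuracy(turns):.2f}")  # routing accuracy: 0.67
print(f"mean latency: {mean_latency(turns):.2f}s")  # mean latency: 1.00s
```

Computing v0 and v1 metrics with the same functions over the same scenarios keeps the comparison like-for-like.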
- Prompt and failure analysis: see `Documentation/ANALYSIS.md`.
- Flow visualization (static HTML): open `Documentation/index.html` in a browser.
- Mermaid PNG visualization (optional):

```bash
python visualizer.py graph  # may require missing modules (v4, src.graph_compiler) to be present
```
- The visualizer references `v4.chatbot` and `src.graph_compiler`, which are not included in this snapshot; ensure those modules exist or adjust the imports accordingly.
- The hardcoded API key in `main.py` should be removed or secured before sharing or deploying.
- No automated tests are included; add unit tests if you extend the routing or agent logic.
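A possible starting point for such tests is a tool-call discipline check: each agent may only call the tools assigned to it. The sketch below uses the standard-library `unittest` module. The `ALLOWED_TOOLS` mapping, its tool names, and `tool_call_is_allowed` are hypothetical placeholders, not the tools defined in this repository.

```python
import unittest

# Hypothetical allowlist of tools per agent. The tool names are placeholders,
# not the ones used in main.py.
ALLOWED_TOOLS: dict[str, set[str]] = {
    "Search": {"search_catalog"},
    "Policy": {"lookup_policy"},
    "Complaint": {"create_ticket"},
    "Booking": {"create_booking", "search_catalog"},
    "Closer": set(),  # the closer only wraps up; it calls no tools
}


def tool_call_is_allowed(agent: str, tool: str) -> bool:
    """Return True if `agent` is permitted to call `tool`; unknown agents get nothing."""
    return tool in ALLOWED_TOOLS.get(agent, set())


class ToolCallDisciplineTest(unittest.TestCase):
    def test_agent_uses_own_tool(self):
        self.assertTrue(tool_call_is_allowed("Booking", "create_booking"))

    def test_agent_cannot_use_foreign_tool(self):
        self.assertFalse(tool_call_is_allowed("Closer", "create_ticket"))

    def test_unknown_agent_is_rejected(self):
        self.assertFalse(tool_call_is_allowed("Billing", "lookup_policy"))
```

Saved as, for example, `tests/test_tool_calls.py`, it can be run with `python -m unittest discover tests`. The same pattern extends to checking the tool calls recorded in each simulated turn.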
- Externalize configuration (API keys/endpoints, model params) via `.env` or CLI flags.
- Add test coverage for routing accuracy and tool-call discipline.
- Parameterize scenarios and add metrics export (JSON/CSV) for downstream analysis.
- Bundle a CLI/Streamlit demo for interactive runs.
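For the metrics-export item, a minimal sketch using only the standard library (`json`, `csv`, `pathlib`) follows. It assumes the metrics can be collected into a nested dict of the form `{version: {metric_name: value}}`, which is an assumption about how `main.py` would expose them; `export_metrics` and the output file names are hypothetical.

```python
import csv
import json
from pathlib import Path


def export_metrics(metrics: dict[str, dict[str, float]], out_dir: Path) -> tuple[Path, Path]:
    """Write {version: {metric: value}} to metrics.json and a long-format metrics.csv.

    Returns the paths of the two files written.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    json_path = out_dir / "metrics.json"
    csv_path = out_dir / "metrics.csv"

    # JSON keeps the nested structure as-is.
    json_path.write_text(json.dumps(metrics, indent=2), encoding="utf-8")

    # CSV is "long" format: one row per (version, metric) pair.
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["version", "metric", "value"])
        for version, values in metrics.items():
            for name, value in values.items():
                writer.writerow([version, name, value])

    return json_path, csv_path
```

The long-format CSV loads directly into a spreadsheet or pandas for a side-by-side v0/v1 comparison, and new metrics need no schema change.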
(Please add a LICENSE file to clarify usage.)