A local, Rust-based LLM proxy with low-latency bidirectional streaming, full logging, and live context injection.
- Multi-Provider Support: OpenAI, Anthropic, Google, Mistral, Groq, Cerebras, xAI, OpenRouter, Azure, AWS Bedrock, and any OpenAI-compatible API (Ollama, vLLM, LM Studio)
- Transparent Traffic Capture: Automatically intercept LLM API calls from any app via mitmproxy integration
- Virtual API Keys: Issue keys with rate limits, budgets, and model restrictions
- Cost Tracking: Automatic token counting and cost calculation per key
- Transparent Proxy: Forwards requests with minimal added latency
- Live Logging: Multiple backends (stdout, file, webhook, OpenTelemetry)
- Context Injection: Pre-request injection of system or user messages
- Conversation State: TTL-based state management with automatic cleanup
- Control API: Manage injections, conversations, and stream logs in real-time
# Set your API key
export OPENAI_API_KEY=your_key_here
# Run the server (foreground)
eavs serve
# Or run as a background service
eavs service start
# Test with curl
curl http://localhost:3000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'EAVS uses TOML configuration. It looks for config files in:
1. `--config PATH` / `EAVS_CONFIG` (explicit config file)
2. `EAVS_*` environment variables (e.g. `EAVS_SERVER__PORT=3001`)
3. `./eavs.toml` (current directory, overrides global)
4. `$XDG_CONFIG_HOME/eavs/config.toml` (or `~/.config/eavs/config.toml`, auto-created on first run)
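Any file setting can also be overridden per run through the environment; a minimal sketch using the `EAVS_SERVER__PORT` variable from the list above:
# Override the listen port for a single run, without touching the config file
EAVS_SERVER__PORT=3001 eavs serve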
See config/config.example.toml for a fully documented example configuration. A JSON schema is available at config/config.schema.json for editor validation and autocompletion.
Configure multiple providers and select one at runtime via the `X-Provider` header:
[providers.default]
type = "openai"
api_key = "env:OPENAI_API_KEY"
[providers.anthropic]
type = "anthropic"
api_key = "env:ANTHROPIC_API_KEY"
[providers.local]
type = "ollama"
base_url = "http://localhost:11434/v1"Supported providers:
openai- OpenAI APIopenai-responses- OpenAI Responses API (same keys, /v1/responses)openai-codex- OpenAI Codex/ChatGPT backend via OAuthanthropic- Anthropic Claudegoogle- Google Geminimistral- Mistral AIgroq- Groq (fast inference)cerebras- Cerebrasxai- xAI (Grok)openrouter- OpenRouterazure- Azure OpenAIbedrock- AWS Bedrock (with SigV4 signing)ollama,vllm,openai-compatible- Local/compatible APIsmock- Mock provider for testing (no network calls)
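Any other OpenAI-compatible server follows the same pattern as the Ollama entry above; a sketch for a self-hosted vLLM instance (the section name and port are illustrative assumptions, not defaults documented here):
[providers.vllm-local]
type = "vllm"
base_url = "http://localhost:8000/v1" # vLLM's usual OpenAI-compatible port; adjust for your deployment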
When to use which OpenAI provider:
- `openai` for the Chat Completions API (`/v1/chat/completions`).
- `openai-responses` for the Responses API (`/v1/responses`).
- `openai-codex` for the Codex/ChatGPT backend with OAuth tokens.
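Because `X-Provider` selects providers by their config-section name, both OpenAI APIs can be configured side by side; a sketch (the section names `chat` and `responses` are illustrative assumptions):
[providers.chat]
type = "openai"
api_key = "env:OPENAI_API_KEY"

[providers.responses]
type = "openai-responses"
api_key = "env:OPENAI_API_KEY"

Clients then pick one per request with `-H "X-Provider: responses"` on any `/v1/*` call.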
Provider shortcuts (no client changes required):
- `eavs provider use <name>` sets a runtime default for the auto endpoint.
- `eavs provider clear` resets to the config default.
- State is stored in XDG state at `~/.local/state/eavs/state.toml` (or `$XDG_STATE_HOME`).
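For instance, to route traffic through the `local` Ollama provider defined above and later undo it:
# Make the "local" provider the runtime default
eavs provider use local

# Revert to whatever the config file specifies
eavs provider clear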
Configure multiple logging backends:
[logging]
default = "stdout"
[[logging.backends]]
type = "stdout"
format = "json" # or "pretty"
[[logging.backends]]
type = "file"
path = "./logs/eavs.jsonl"
rotate = "daily"
[[logging.backends]]
type = "webhook"
url = "https://your-service.com/logs"
headers = { Authorization = "env:LOG_API_KEY" }
batch_size = 100
flush_interval_secs = 5

Conversation state (TTL-based, with automatic cleanup) is configured in the `[state]` section:
[state]
enabled = true
ttl_secs = 3600 # 1 hour TTL
cleanup_interval_secs = 60 # Cleanup every minute
max_conversations = 10000 # Max concurrent conversations
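State is keyed by the conversation ID supplied by the client, so a tracked session can be inspected before its TTL lapses; a sketch using the `X-Conversation-ID` header and the conversations endpoint shown later:
# Route a request through a tracked conversation
curl http://localhost:3000/v1/chat/completions \
-H "X-Conversation-ID: my-session" ...

# Inspect the tracked state before the one-hour TTL expires
curl http://localhost:3000/conversations/my-session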
Issue API keys with rate limits, budgets, and model restrictions:
[keys]
enabled = true
database_path = "~/.eavs/keys.db"
master_key = "env:EAVS_MASTER_KEY" # Required for admin operations
default_rpm_limit = 60 # Requests per minute
default_budget_usd = 10.0 # Budget in USD

Manage keys via the CLI:
# Create a key with limits
eavs key create --name "dev-key" --rpm 100 --budget 50.0
# Create a key with model restrictions
eavs key create --name "gpt4-only" --models "gpt-4,gpt-4-turbo"
# List all keys
eavs key list
# Check usage
eavs key usage <key-id>
# Revoke a key
eavs key revoke <key-id>
# Bind a key to an OAuth user (Claude Code / Codex)
eavs key bind <key-id> --oauth-user "<user-id>"
# Clear the OAuth binding
eavs key bind <key-id> --clear
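Once issued, a key is presented to the proxy in place of the upstream provider key; a sketch, assuming virtual keys travel in the standard Authorization header (the key value is a placeholder):
# Call the proxy with a virtual key instead of the provider key
curl http://localhost:3000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <virtual-key>" \
-d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'

AWS Bedrock is configured with explicit credentials; SigV4 signing is handled by the proxy:
[providers.bedrock]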
type = "bedrock"
aws_region = "env:AWS_REGION"
aws_access_key_id = "env:AWS_ACCESS_KEY_ID"
aws_secret_access_key = "env:AWS_SECRET_ACCESS_KEY"
# aws_session_token = "env:AWS_SESSION_TOKEN" # Optional

Automatically intercept LLM API calls from any application (including desktop apps like ChatGPT and Claude) without changing client configuration. This feature uses mitmproxy for cross-platform traffic interception.
Prerequisites:
- mitmproxy 10.1.5+ (`brew install mitmproxy` on macOS, `pip install mitmproxy` elsewhere)
- On first run, trust mitmproxy's CA certificate (see mitmproxy docs)
Option 1: Auto-start via config (recommended)
[capture]
enabled = true
mode = "local" # Capture all local traffic
# mode = "local:ChatGPT" # Capture specific app only
# verbose = true # Enable verbose logging
# api_only = true # Skip desktop app domains

Then just run `eavs serve` - mitmproxy starts automatically.
Option 2: Manual mitmproxy start
# Terminal 1: Start Eaves
eavs serve
# Terminal 2: Start mitmproxy with capture addon
mitmproxy --mode local -s scripts/eavs_capture.py
# Capture specific app only
mitmproxy --mode local:ChatGPT -s scripts/eavs_capture.py

Captured domains:
- API endpoints: api.openai.com, api.anthropic.com, generativelanguage.googleapis.com, api.mistral.ai, api.groq.com, etc.
- Desktop apps: chat.openai.com, claude.ai, gemini.google.com, perplexity.ai, poe.com, etc.
How it works:
- mitmproxy intercepts outgoing HTTPS traffic using its local capture mode
- The `eavs_capture.py` addon detects LLM-related domains and redirects them to Eaves
- Eaves auto-detects the provider from the original host and proxies the request
- All traffic is logged and can be analyzed, rate-limited, or modified
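A quick way to confirm capture is working is to watch the live log stream (documented below) while the intercepted app makes a request:
# Watch proxied traffic live
curl http://localhost:3000/logs/stream

# Or, when running as a background service
eavs service logs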
# Start server in foreground
eavs serve --host 0.0.0.0 --port 8080
# Background service
eavs service start [--port 3000]
eavs service stop
eavs service restart
eavs service status
eavs service logs

All /v1/* requests are forwarded to the configured upstream provider.
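Because the proxy forwards the upstream wire format untouched, streaming works with the standard OpenAI `stream` flag; a sketch (assuming the upstream provider supports streaming):
# Stream tokens through the proxy
curl http://localhost:3000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{"model": "gpt-4", "stream": true, "messages": [{"role": "user", "content": "Hello!"}]}'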
# Use default provider
curl http://localhost:3000/v1/chat/completions ...
# Use specific provider
curl http://localhost:3000/v1/chat/completions \
-H "X-Provider: anthropic" ...
# Track conversation
curl http://localhost:3000/v1/chat/completions \
-H "X-Conversation-ID: my-session" ...curl http://localhost:3000/healthcurl http://localhost:3000/providerscurl -X POST http://localhost:3000/inject/my-conversation \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "system", "content": "You are a pirate."}]}'curl -X POST http://localhost:3000/clear/my-conversationcurl http://localhost:3000/conversationscurl http://localhost:3000/conversations/statscurl http://localhost:3000/conversations/my-conversationcurl -X PATCH http://localhost:3000/conversations/my-conversation \
-H "Content-Type: application/json" \
-d '{"provider": "anthropic", "tags": ["test"]}'curl http://localhost:3000/logs/stream# Run tests
cargo test
# Quick chat test
eavs test chat "Hello" --provider openai
# Sequential benchmark (mock provider = no API costs)
eavs test bench --provider mock --count 50
# Concurrent benchmark
eavs test bench --provider mock --concurrent 10 --count 100
# Duration-based load test
eavs test bench --provider mock --concurrent 50 --duration 30s

| Flag | Description |
|---|---|
| `--provider <name>` | Provider to test (default: "default") |
| `--count <n>` | Number of requests (default: 10) |
| `--concurrent <n>` | Parallel requests (default: 1) |
| `--duration <time>` | Run for duration (e.g., "30s", "1m") |
| `--model <model>` | Model to use (optional) |
The mock provider returns synthetic responses without network calls, ideal for measuring proxy overhead.
MIT