Mixture-of-Experts VQA, streaming-ready, and MCP-native.
ViperMCP is a mixture-of-experts (MoE) visual question‑answering (VQA) server that exposes streamable MCP tools for:
- 🔎 Visual grounding
- 🧩 Compositional image QA
- 🌐 External knowledge‑dependent image QA
It’s built on the shoulders of 🐍 ViperGPT and delivered as a FastMCP HTTP server, so it works with all FastMCP client tooling.
- ⚡ MCP-native JSON‑RPC 2.0 endpoint (`/mcp/`) with streaming
- 🧠 MoE routing across classic and modern VLMs/LLMs
- 🧰 Two tools out of the box: `viper_query(text)` & `viper_task(crops/masks)`
- 🐳 One‑command Docker or pure‑Python install
- 🔐 Secure key handling via env var or secret mount
An OpenAI API key is required. Provide it via one of the following:
- `OPENAI_API_KEY` (environment variable)
- `OPENAI_API_KEY_PATH` (path to a file containing the key)
- `?apiKey=...` HTTP query parameter (for quick local testing)
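For reference, here's a minimal sketch of that resolution on the client side. The precedence shown is an assumption (not documented server behavior), and `resolve_openai_key` is a hypothetical helper:

```python
import os
from pathlib import Path

def resolve_openai_key() -> str | None:
    """Hypothetical helper mirroring the three accepted key sources.
    The precedence here is an assumption, not documented behavior."""
    if key := os.environ.get("OPENAI_API_KEY"):
        return key
    if key_path := os.environ.get("OPENAI_API_KEY_PATH"):
        return Path(key_path).read_text().strip()
    return None  # fall back to passing ?apiKey=... on the /mcp URL
```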
Use ngrok to expose your local server:
```bash
pip install ngrok
ngrok http 8000
```
Use the ngrok URL anywhere you see `http://0.0.0.0:8000` below.
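The `ngrok` package on PyPI also exposes a Python API, so you can open the tunnel programmatically instead of via the CLI. A sketch, assuming `NGROK_AUTHTOKEN` is set in your environment:

```python
import ngrok

# Forward local port 8000 through an ngrok tunnel and print the public URL.
listener = ngrok.forward(8000, authtoken_from_env=True)
print(listener.url())  # use this URL in place of http://0.0.0.0:8000
```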
Save your key to `api.key`, then run:
```bash
docker run -i --rm \
  --mount type=bind,source=/path/to/api.key,target=/run/secrets/openai_api.key,readonly \
  -e OPENAI_API_KEY_PATH=/run/secrets/openai_api.key \
  -p 8000:8000 \
  rsherby/vipermcp:latest
```
This starts a CUDA‑enabled container serving MCP at `http://0.0.0.0:8000/mcp/`.
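A quick smoke test against the running container, using only the standard library (the `/health` and `/device` endpoints are documented under the endpoints listing below):

```python
import urllib.request

# Expect: 200 OK
with urllib.request.urlopen("http://0.0.0.0:8000/health") as resp:
    print(resp.status, resp.read().decode())

# Expect something like: {"device": "cuda"}
with urllib.request.urlopen("http://0.0.0.0:8000/device") as resp:
    print(resp.read().decode())
```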
💡 Prefer building from source? Use the included `docker-compose.yaml`. By default it reads `api.key` from the project root. If your platform injects env vars, you can also set `OPENAI_API_KEY` directly.
```bash
git clone --recurse-submodules https://github.com/ryansherby/ViperMCP.git
cd ViperMCP
bash download-models.sh

# Store your key for local dev
echo YOUR_OPENAI_API_KEY > api.key

# (recommended) activate a virtualenv / conda env
pip install -r requirements.txt
pip install -e .

# run the server
python run_server.py
```
Your server should be live at `http://0.0.0.0:8000/mcp/`.
To use OpenAI‑backed models via query param:
`http://0.0.0.0:8000/mcp?apiKey=sk-proj-XXXXXXXXXXXXXXXXXXXX`
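To avoid hardcoding the key in the URL, you can build it from the environment. A small illustrative helper:

```python
import os
from urllib.parse import urlencode

# Keep the key out of source files by reading it from the environment.
params = urlencode({"apiKey": os.environ["OPENAI_API_KEY"]})
mcp_url = f"http://0.0.0.0:8000/mcp?{params}"
```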
Pass images as base64 (shown) or as URLs:
```python
import asyncio
import base64
import io

from fastmcp import Client
from PIL import Image

# Encode the image as a base64 PNG data URI
image = Image.open('./your_image.png')
img_byte_arr = io.BytesIO()
image.save(img_byte_arr, format='PNG')
img_b64_string = base64.b64encode(img_byte_arr.getvalue()).decode('utf-8')

async def main():
    client = Client("http://0.0.0.0:8000/mcp/")
    async with client:
        await client.ping()
        tools = await client.list_tools()  # optional

        # All tool arguments go in a single dict per call
        query = await client.call_tool(
            "viper_query",
            {"query": "how many muffins can each kid have for it to be fair?",
             "image": f"data:image/png;base64,{img_b64_string}"},
        )

        task = await client.call_tool(
            "viper_task",
            {"task": "return a mask of all the people in the image",
             "image": f"data:image/png;base64,{img_b64_string}"},
        )

asyncio.run(main())
```
The OpenAI MCP integration currently accepts image URLs (not raw base64). Send the URL as `type: "input_text"`.
```python
from openai import OpenAI

client = OpenAI()
server_url = "https://your-public-host"  # placeholder: e.g., your ngrok URL
img_url = "https://example.com/your_image.png"  # placeholder image URL

response = client.responses.create(
    model="gpt-4o",
    tools=[
        {
            "type": "mcp",
            "server_label": "ViperMCP",
            "server_url": f"{server_url}/mcp/",
            "require_approval": "never",
        },
    ],
    input=[
        {"role": "system", "content": "Forward any queries or tasks relating to an image directly to the ViperMCP server."},
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "based on this image, how many muffins can each kid have for it to be fair?"},
                {"type": "input_text", "text": img_url},
            ],
        },
    ],
)
```

HTTP endpoints:
```
GET  /health          => 'OK' (200)
GET  /device          => {"device": "cuda"|"mps"|"cpu"}
GET  /mcp?apiKey=...  => 'Query parameters set successfully.'
POST /mcp/            => MCP JSON-RPC 2.0 endpoint
```
MCP tools:
```python
viper_query(query, image) -> str
# Returns a text answer to your query.

viper_task(task, image) -> list[Image]
# Returns a list of images (e.g., masks) satisfying the task.
```
- 🐊 Grounding DINO
- ✂️ Segment Anything (SAM)
- 🤖 GPT‑4o‑mini (LLM)
- 👀 GPT‑4o‑mini (VLM)
- 🧠 GPT‑4.1
- 🔭 X‑VLM
- 🌊 MiDaS (depth)
- 🐝 BERT
🧭 The MoE router picks from these based on the tool & prompt.
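The names below are illustrative, not ViperMCP internals, but as a rough mental model the router maps each decomposed sub-step of a query to the expert best suited for it:

```python
# Toy router: map a decomposed sub-step to an expert (illustrative only).
EXPERTS = {
    "locate":  "Grounding DINO",
    "segment": "Segment Anything (SAM)",
    "depth":   "MiDaS",
    "match":   "X-VLM",
    "caption": "GPT-4o-mini (VLM)",
    "reason":  "GPT-4.1",
}

def route(step: str) -> str:
    """Fall back to the general-purpose LLM for anything unrecognized."""
    return EXPERTS.get(step, "GPT-4o-mini (LLM)")

print(route("segment"))  # -> Segment Anything (SAM)
```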
This package may generate and execute code on the host. We include basic injection guards, but you must harden for production. A recommended architecture separates concerns:
```
MCP Server (Query + Image)
  => Client Server (Generate Code Request)
  => Backend Server (Generates Code)
  => Client Server (Executes Wrapper Functions)
  => Backend Server (Executes Underlying Functions)
  => Client Server (Return Result)
  => MCP Server (Respond)
```
- 🧱 Isolate codegen & execution.
- 🔒 Lock down secrets & file access.
- 🧪 Add unit/integration tests around wrappers.
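One way to approximate that isolation on a single host is to run generated code in a short-lived, time-limited subprocess. This is a sketch under those assumptions, not ViperMCP's actual guard; production deployments should add container/OS-level sandboxing on top:

```python
import subprocess
import sys
import tempfile

def run_generated_code(code: str, timeout_s: int = 10) -> str:
    """Execute untrusted generated code in a throwaway subprocess.
    Illustrative only: combine with containers, seccomp, no network,
    and an allow-list of wrapper functions in production."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run(
        [sys.executable, "-I", path],  # -I: isolated mode (ignores env/site)
        capture_output=True, text=True, timeout=timeout_s,
    )
    return proc.stdout if proc.returncode == 0 else proc.stderr
```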
Huge thanks to the ViperGPT team:
```bibtex
@article{surismenon2023vipergpt,
  title={ViperGPT: Visual Inference via Python Execution for Reasoning},
  author={Dídac Surís and Sachit Menon and Carl Vondrick},
  journal={arXiv preprint arXiv:2303.08128},
  year={2023}
}
```
PRs welcome! Please:
- ✅ Ensure all tests in `/tests` pass
- 🧪 Add coverage for new features
- 📦 Keep docs & examples up to date
```bash
# Run with Docker (mount key file)
docker run -i --rm \
  --mount type=bind,source=$(pwd)/api.key,target=/run/secrets/openai_api.key,readonly \
  -e OPENAI_API_KEY_PATH=/run/secrets/openai_api.key \
  -p 8000:8000 rsherby/vipermcp:latest

# From source (after setup)
python run_server.py

# Hit health
curl http://0.0.0.0:8000/health

# List device
curl http://0.0.0.0:8000/device

# Use query param key (local only)
curl "http://0.0.0.0:8000/mcp?apiKey=sk-proj-XXXX..."
```

Open an issue or start a discussion. We ❤️ feedback and ambitious ideas!