A minimum viable edge inference fabric that demonstrates distributed inference routing, simulated on a single machine. This proof-of-concept shows how inference requests can be routed to available workers based on load and latency.
- Control Plane Server: Routes inference requests to available workers
- Device Agents: Execute inference workloads (multiple agents can be simulated by running them on different ports)
- Client API: Simple REST interface for submitting inference requests
- Web Dashboard: Visualize routing decisions and performance metrics
- Real ML Model: Uses ONNX Runtime with MobileNetV2
```bash
pip install -r requirements.txt
python run_demo.py
```

This will:
- Start the control plane on port 8000
- Start 3 device agents on ports 8001, 8002, 8003
- Download the MobileNetV2 model if needed
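
Under the hood, the runner only needs to spawn the control plane and the agents as subprocesses, using the same commands shown in the manual-start examples later in this README. The sketch below is a simplified illustration (the device IDs are placeholders, and model downloading is omitted):

```python
# Simplified runner sketch: one control plane plus three device agents.
import subprocess
import sys

procs = [subprocess.Popen(
    [sys.executable, "-m", "uvicorn", "control_plane.server:app", "--port", "8000"]
)]
for i, port in enumerate((8001, 8002, 8003), start=1):
    procs.append(subprocess.Popen(
        [sys.executable, "-m", "device_agent.agent",
         "--port", str(port), "--device-id", f"device-{i}"]
    ))

try:
    for p in procs:
        p.wait()  # block until the processes exit; Ctrl+C tears everything down
finally:
    for p in procs:
        p.terminate()
```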
Open http://localhost:8000 in your browser to see:
- Active devices with CPU/memory stats
- Real-time metrics
- Test inference button
```bash
python test_fabric.py
```

```
┌─────────────────┐     ┌─────────────────┐
│     Client      │────▶│  Control Plane  │
│  (API/Browser)  │     │   (Port 8000)   │
└─────────────────┘     └────────┬────────┘
                                 │
                    ┌────────────┼────────────┐
                    │            │            │
                    ▼            ▼            ▼
              ┌──────────┐ ┌──────────┐ ┌──────────┐
              │ Device 1 │ │ Device 2 │ │ Device 3 │
              │  (8001)  │ │  (8002)  │ │  (8003)  │
              └──────────┘ └──────────┘ └──────────┘
```
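
At a high level, the control plane accepts POST /api/infer, picks an agent according to the active routing strategy, and forwards the work to that agent's /execute endpoint. The sketch below shows that flow in miniature; the in-memory registry, field names, and httpx forwarding are illustrative assumptions, not the actual contents of `control_plane/server.py`:

```python
# Minimal sketch of the routing path, assuming an in-memory device registry
# populated by /api/devices/register (least_loaded selection shown).
import httpx
from fastapi import FastAPI

app = FastAPI()
devices: dict[str, dict] = {}  # device_id -> {"url": ..., "cpu": ..., "memory": ...}

@app.post("/api/infer")
async def infer(request: dict) -> dict:
    # Choose the least-loaded device, then proxy the request to its /execute endpoint.
    device_id, device = min(devices.items(), key=lambda d: d[1]["cpu"] + d[1]["memory"])
    async with httpx.AsyncClient() as client:
        response = await client.post(f"{device['url']}/execute", json=request, timeout=30.0)
    result = response.json()
    result["device_id"] = device_id
    return result
```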
Control Plane (http://localhost:8000)
| Method | Endpoint | Description |
|---|---|---|
| GET | / | Dashboard |
| GET | /api/devices | List all devices |
| POST | /api/devices/register | Register a device |
| POST | /api/infer | Submit inference request |
| GET | /api/metrics | System metrics |
| GET | /api/routing/strategy | Get routing strategy |
| POST | /api/routing/strategy | Set routing strategy |
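
For a quick check of the inference path without the bundled client, the /api/infer endpoint can be called directly; the payload below mirrors the curl example later in this README.

```python
import requests

# Submit one inference request to the control plane and print the routed result.
payload = {
    "model_name": "mobilenet",
    "input_data": [0.5],
    "input_shape": [1, 3, 224, 224],
}
response = requests.post("http://localhost:8000/api/infer", json=payload, timeout=30)
response.raise_for_status()
print(response.json())
```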
Device Agent (http://localhost:800X)
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check |
| GET | /info | Device and model info |
| POST | /execute | Execute inference |
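
Each agent can also be queried directly, which is handy when debugging a single worker. A minimal check follows; the response fields depend on the agent implementation, so the JSON is printed as-is:

```python
import requests

# Default demo agents listen on ports 8001-8003.
agent_url = "http://localhost:8001"
print(requests.get(f"{agent_url}/health", timeout=5).json())  # liveness
print(requests.get(f"{agent_url}/info", timeout=5).json())    # device and model info
```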
```
edge-fabric-mvp/
├── control_plane/
│ ├── __init__.py
│ ├── server.py # FastAPI app for routing
│ ├── router.py # Routing logic
│ └── models.py # Data models
├── device_agent/
│ ├── __init__.py
│ ├── agent.py # Device agent
│ └── inference.py # ONNX inference wrapper
├── client/
│ ├── __init__.py
│ └── api.py # Client library
├── static/
│ ├── index.html # Dashboard
│ └── style.css # Styling
├── models/
│ ├── download_model.py # Model downloader
│ └── mobilenetv2-7.onnx # Model file
├── requirements.txt
├── run_demo.py # Demo runner
├── test_fabric.py # Test suite
└── README.md
```
- least_loaded (default): Routes to the device with the lowest CPU/memory usage (sketched after this list)
- round_robin: Distributes requests evenly across devices
- lowest_latency: Routes to device with best historical latency
- random: Random device selection
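
As a rough illustration, the default least_loaded strategy only needs to compare per-device load snapshots. The actual implementation lives in `control_plane/router.py`, and the field names below are assumptions rather than the real `DeviceInfo` attributes:

```python
from dataclasses import dataclass

@dataclass
class Device:  # illustrative stand-in for control_plane.models.DeviceInfo
    device_id: str
    cpu_percent: float
    memory_percent: float

def least_loaded_select(devices: list[Device]) -> Device:
    # Pick the device with the lowest combined CPU + memory utilization.
    return min(devices, key=lambda d: d.cpu_percent + d.memory_percent)

# Example: "edge-2" wins with the lowest combined load.
fleet = [Device("edge-1", 80, 60), Device("edge-2", 10, 20), Device("edge-3", 50, 40)]
print(least_loaded_select(fleet).device_id)
```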
```python
from client.api import EdgeInferenceClient
import numpy as np
client = EdgeInferenceClient("http://localhost:8000")
# Run inference
input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)
result = client.infer("mobilenet", input_data)
print(f"Latency: {result['latency_ms']}ms")
print(f"Device: {result['device_id']}")
# Get devices
devices = client.get_devices()
print(f"Active devices: {len(devices)}")
# Run benchmark
benchmark = client.benchmark(num_requests=100)
print(f"P95 latency: {benchmark['p95_latency_ms']}ms")# List devices
curl http://localhost:8000/api/devices
# Submit inference
curl -X POST http://localhost:8000/api/infer \
-H "Content-Type: application/json" \
-d '{"model_name": "mobilenet", "input_data": [0.5], "input_shape": [1, 3, 224, 224]}'
# Get metrics
curl http://localhost:8000/api/metrics
```

```bash
# Start with 5 agents
python run_demo.py --agents 5
# Start only control plane
python run_demo.py --no-agents
# Custom ports
python run_demo.py --control-port 9000 --agent-start-port 9001
```

```bash
# Control plane
python -m uvicorn control_plane.server:app --port 8000
# Device agent
python -m device_agent.agent --port 8001 --device-id my-device
```

Typical performance on modern hardware:
- Single inference: 20-50ms (MobileNetV2)
- Routing overhead: <1ms
- Throughput: 20-50 req/s per device
- Download the ONNX model to `models/`
- Update `device_agent/inference.py` to support the new model (see the sketch below)
- Agents will automatically load models on startup
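
The layout of `device_agent/inference.py` is not shown here, so the snippet below is purely illustrative of one way a wrapper could map model names to ONNX files and lazily create ONNX Runtime sessions:

```python
# Illustrative only; adapt to the actual structure of device_agent/inference.py.
import numpy as np
import onnxruntime as ort

MODEL_PATHS = {
    "mobilenet": "models/mobilenetv2-7.onnx",
    # "my_new_model": "models/my_new_model.onnx",  # hypothetical new entry
}
_sessions: dict[str, ort.InferenceSession] = {}

def run(model_name: str, input_data: np.ndarray) -> np.ndarray:
    # Create one session per model on first use, then execute the graph.
    session = _sessions.setdefault(model_name, ort.InferenceSession(MODEL_PATHS[model_name]))
    input_name = session.get_inputs()[0].name
    return session.run(None, {input_name: input_data})[0]
```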
Add to `control_plane/router.py`:

```python
def _my_custom_select(self, devices: list[DeviceInfo]) -> DeviceInfo:
    # Your logic here; as a trivial placeholder, pick the first registered device
    selected_device = devices[0]
    return selected_device
```
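
After wiring the method into the router's strategy selection (the registration mechanism depends on how `router.py` dispatches strategies), the new strategy could be activated at runtime through the routing endpoint. The `"strategy"` payload key below is an assumption about the request schema:

```python
import requests

# Switch the control plane to the custom strategy, then read the setting back.
requests.post(
    "http://localhost:8000/api/routing/strategy",
    json={"strategy": "my_custom"},  # assumed payload shape
    timeout=5,
)
print(requests.get("http://localhost:8000/api/routing/strategy", timeout=5).json())
```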