A minimum viable edge inference fabric that demonstrates distributed inference routing, simulated on a single machine. This proof-of-concept shows how inference requests can be routed to available workers based on load and latency.
- Control Plane Server: Routes inference requests to available workers
- Device Agents: Execute inference workloads (multiple agents can be simulated by running them on different ports)
- Client API: Simple REST interface for submitting inference requests
- Web Dashboard: Visualize routing decisions and performance metrics
- Real ML Model: Uses ONNX Runtime with MobileNetV2
```bash
pip install -r requirements.txt
python run_demo.py
```

This will:
- Start the control plane on port 8000
- Start 3 device agents on ports 8001, 8002, 8003
- Download the MobileNetV2 model if needed
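
Under the hood, the runner only needs to spawn the control plane and the agents as subprocesses, using the same commands shown in the manual-start examples later in this README. The sketch below is a simplified illustration (the device IDs are placeholders, and model downloading is omitted):

```python
# Simplified runner sketch: one control plane plus three device agents.
import subprocess
import sys

procs = [subprocess.Popen(
    [sys.executable, "-m", "uvicorn", "control_plane.server:app", "--port", "8000"]
)]
for i, port in enumerate((8001, 8002, 8003), start=1):
    procs.append(subprocess.Popen(
        [sys.executable, "-m", "device_agent.agent",
         "--port", str(port), "--device-id", f"device-{i}"]
    ))

try:
    for p in procs:
        p.wait()  # block until the processes exit; Ctrl+C tears everything down
finally:
    for p in procs:
        p.terminate()
```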
Open http://localhost:8000 in your browser to see:
- Active devices with CPU/memory stats
- Real-time metrics
- Test inference button
```bash
python test_fabric.py
```

```
┌─────────────────┐     ┌─────────────────┐
│     Client      │────▶│  Control Plane  │
│  (API/Browser)  │     │   (Port 8000)   │
└─────────────────┘     └────────┬────────┘
                                 │
                    ┌────────────┼────────────┐
                    │            │            │
                    ▼            ▼            ▼
              ┌──────────┐ ┌──────────┐ ┌──────────┐
              │ Device 1 │ │ Device 2 │ │ Device 3 │
              │  (8001)  │ │  (8002)  │ │  (8003)  │
              └──────────┘ └──────────┘ └──────────┘
```
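
At a high level, the control plane accepts POST /api/infer, picks an agent according to the active routing strategy, and forwards the work to that agent's /execute endpoint. The sketch below shows that flow in miniature; the in-memory registry, field names, and httpx forwarding are illustrative assumptions, not the actual contents of `control_plane/server.py`:

```python
# Minimal sketch of the routing path, assuming an in-memory device registry
# populated by /api/devices/register (least_loaded selection shown).
import httpx
from fastapi import FastAPI

app = FastAPI()
devices: dict[str, dict] = {}  # device_id -> {"url": ..., "cpu": ..., "memory": ...}

@app.post("/api/infer")
async def infer(request: dict) -> dict:
    # Choose the least-loaded device, then proxy the request to its /execute endpoint.
    device_id, device = min(devices.items(), key=lambda d: d[1]["cpu"] + d[1]["memory"])
    async with httpx.AsyncClient() as client:
        response = await client.post(f"{device['url']}/execute", json=request, timeout=30.0)
    result = response.json()
    result["device_id"] = device_id
    return result
```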
Control Plane (http://localhost:8000)
| Method | Endpoint | Description |
|---|---|---|
| GET | / | Dashboard |
| GET | /api/devices | List all devices |
| POST | /api/devices/register | Register a device |
| POST | /api/infer | Submit inference request |
| GET | /api/metrics | System metrics |
| GET | /api/routing/strategy | Get routing strategy |
| POST | /api/routing/strategy | Set routing strategy |
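
For a quick check of the inference path without the bundled client, the /api/infer endpoint can be called directly; the payload below mirrors the curl example later in this README.

```python
import requests

# Submit one inference request to the control plane and print the routed result.
payload = {
    "model_name": "mobilenet",
    "input_data": [0.5],
    "input_shape": [1, 3, 224, 224],
}
response = requests.post("http://localhost:8000/api/infer", json=payload, timeout=30)
response.raise_for_status()
print(response.json())
```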
Device Agent (http://localhost:800X)
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check |
| GET | /info | Device and model info |
| POST | /execute | Execute inference |
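
Each agent can also be queried directly, which is handy when debugging a single worker. A minimal check follows; the response fields depend on the agent implementation, so the JSON is printed as-is:

```python
import requests

# Default demo agents listen on ports 8001-8003.
agent_url = "http://localhost:8001"
print(requests.get(f"{agent_url}/health", timeout=5).json())  # liveness
print(requests.get(f"{agent_url}/info", timeout=5).json())    # device and model info
```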
```
edge-fabric-mvp/
├── control_plane/
│ ├── __init__.py
│ ├── server.py # FastAPI app for routing
│ ├── router.py # Routing logic
│ └── models.py # Data models
├── device_agent/
│ ├── __init__.py
│ ├── agent.py # Device agent
│ └── inference.py # ONNX inference wrapper
├── client/
│ ├── __init__.py
│ └── api.py # Client library
├── static/
│ ├── index.html # Dashboard
│ └── style.css # Styling
├── models/
│ ├── download_model.py # Model downloader
│ └── mobilenetv2-7.onnx # Model file
├── requirements.txt
├── run_demo.py # Demo runner
├── test_fabric.py # Test suite
└── README.md
```
- least_loaded (default): Routes to the device with the lowest CPU/memory usage (sketched after this list)
- round_robin: Distributes requests evenly across devices
- lowest_latency: Routes to device with best historical latency
- random: Random device selection
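
As a rough illustration, the default least_loaded strategy only needs to compare per-device load snapshots. The actual implementation lives in `control_plane/router.py`, and the field names below are assumptions rather than the real `DeviceInfo` attributes:

```python
from dataclasses import dataclass

@dataclass
class Device:  # illustrative stand-in for control_plane.models.DeviceInfo
    device_id: str
    cpu_percent: float
    memory_percent: float

def least_loaded_select(devices: list[Device]) -> Device:
    # Pick the device with the lowest combined CPU + memory utilization.
    return min(devices, key=lambda d: d.cpu_percent + d.memory_percent)

# Example: "edge-2" wins with the lowest combined load.
fleet = [Device("edge-1", 80, 60), Device("edge-2", 10, 20), Device("edge-3", 50, 40)]
print(least_loaded_select(fleet).device_id)
```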
```python
from client.api import EdgeInferenceClient
import numpy as np
client = EdgeInferenceClient("http://localhost:8000")
# Run inference
input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)
result = client.infer("mobilenet", input_data)
print(f"Latency: {result['latency_ms']}ms")
print(f"Device: {result['device_id']}")
# Get devices
devices = client.get_devices()
print(f"Active devices: {len(devices)}")
# Run benchmark
benchmark = client.benchmark(num_requests=100)
print(f"P95 latency: {benchmark['p95_latency_ms']}ms")# List devices
curl http://localhost:8000/api/devices
# Submit inference
curl -X POST http://localhost:8000/api/infer \
-H "Content-Type: application/json" \
-d '{"model_name": "mobilenet", "input_data": [0.5], "input_shape": [1, 3, 224, 224]}'
# Get metrics
curl http://localhost:8000/api/metrics
```

```bash
# Start with 5 agents
python run_demo.py --agents 5
# Start only control plane
python run_demo.py --no-agents
# Custom ports
python run_demo.py --control-port 9000 --agent-start-port 9001
```

```bash
# Control plane
python -m uvicorn control_plane.server:app --port 8000
# Device agent
python -m device_agent.agent --port 8001 --device-id my-device
```

Typical performance on modern hardware:
- Single inference: 20-50ms (MobileNetV2)
- Routing overhead: <1ms
- Throughput: 20-50 req/s per device
- Download the ONNX model to `models/`
- Update `device_agent/inference.py` to support the new model (see the sketch below)
- Agents will automatically load models on startup
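
The layout of `device_agent/inference.py` is not shown here, so the snippet below is purely illustrative of one way a wrapper could map model names to ONNX files and lazily create ONNX Runtime sessions:

```python
# Illustrative only; adapt to the actual structure of device_agent/inference.py.
import numpy as np
import onnxruntime as ort

MODEL_PATHS = {
    "mobilenet": "models/mobilenetv2-7.onnx",
    # "my_new_model": "models/my_new_model.onnx",  # hypothetical new entry
}
_sessions: dict[str, ort.InferenceSession] = {}

def run(model_name: str, input_data: np.ndarray) -> np.ndarray:
    # Create one session per model on first use, then execute the graph.
    session = _sessions.setdefault(model_name, ort.InferenceSession(MODEL_PATHS[model_name]))
    input_name = session.get_inputs()[0].name
    return session.run(None, {input_name: input_data})[0]
```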
Add to `control_plane/router.py`:

```python
def _my_custom_select(self, devices: list[DeviceInfo]) -> DeviceInfo:
    # Your logic here; as a trivial placeholder, pick the first registered device
    selected_device = devices[0]
    return selected_device
```
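
After wiring the method into the router's strategy selection (the registration mechanism depends on how `router.py` dispatches strategies), the new strategy could be activated at runtime through the routing endpoint. The `"strategy"` payload key below is an assumption about the request schema:

```python
import requests

# Switch the control plane to the custom strategy, then read the setting back.
requests.post(
    "http://localhost:8000/api/routing/strategy",
    json={"strategy": "my_custom"},  # assumed payload shape
    timeout=5,
)
print(requests.get("http://localhost:8000/api/routing/strategy", timeout=5).json())
```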