Changes from all commits
53 commits:
47e5460 added mtoken (LucasArmandVast, Nov 4, 2025)
f5134d4 Fix spelling mistake (LucasArmandVast, Nov 5, 2025)
106067d bump version to 0.1.1 (LucasArmandVast, Nov 5, 2025)
8ae7b74 bump version to 0.2.0 (LucasArmandVast, Nov 5, 2025)
b7fe4eb Obfuscate mtoken in logs (LucasArmandVast, Nov 7, 2025)
c6521cb add ... (LucasArmandVast, Nov 7, 2025)
d63a060 Merge pull request #56 from vast-ai/obfuscate-mtoken (LucasArmandVast, Nov 10, 2025)
7db54f3 Merge pull request #55 from vast-ai/use-mtoken (LucasArmandVast, Nov 10, 2025)
b55bfa9 Updated clients, include vastai-sdk, handle non-UTF-8 (LucasArmandVast, Nov 12, 2025)
3adec18 minor changes (LucasArmandVast, Nov 12, 2025)
eedf81c Updated readme and .gitignore (LucasArmandVast, Nov 12, 2025)
a12523b Added bad code to tgi server to test (LucasArmandVast, Nov 12, 2025)
c510801 fix (LucasArmandVast, Nov 12, 2025)
de9b50a use set +e (LucasArmandVast, Nov 12, 2025)
0b14562 dont exit on pyworker fail (LucasArmandVast, Nov 12, 2025)
a47c9d1 remove test bugs (LucasArmandVast, Nov 12, 2025)
d3727d4 Merge pull request #58 from vast-ai/update-client-scripts (LucasArmandVast, Nov 12, 2025)
2b26e5e hotfix: remove g (LucasArmandVast, Nov 13, 2025)
a4339bd hotfix: add f (LucasArmandVast, Nov 13, 2025)
e0449cb add llama log (LucasArmandVast, Nov 21, 2025)
7a792fd Merge pull request #64 from vast-ai/add-llama-log (LucasArmandVast, Nov 21, 2025)
45e0c7d Move model log rotate to top (LucasArmandVast, Nov 24, 2025)
9c6ab78 Move model log line (LucasArmandVast, Nov 24, 2025)
7986e51 early errors (LucasArmandVast, Nov 24, 2025)
e143162 bumpy pyworker version (LucasArmandVast, Nov 26, 2025)
0339b47 Merge pull request #66 from vast-ai/synthesis (LucasArmandVast, Nov 26, 2025)
0bcd221 Increase model wait time for vLLM (LucasArmandVast, Dec 3, 2025)
2f543c0 Merge pull request #68 from vast-ai/fix-vllm-concurrency (LucasArmandVast, Dec 3, 2025)
adedb8b defaults to ENDPOINT_NAME and DEFAULT_MODEL but uses the flag first i… (Colter-Downing, Dec 4, 2025)
8be92c0 Merge pull request #69 from vast-ai/AUTO-874--fix-openai-worker-client (Colter-Downing, Dec 4, 2025)
6b5b134 update tgi client (Colter-Downing, Dec 4, 2025)
de3aa87 Merge pull request #70 from vast-ai/AUTO-tgi-client-edits (Colter-Downing, Dec 4, 2025)
f04138e update to be able to get images (Colter-Downing, Dec 4, 2025)
e839cfc include view in API wrapper (Colter-Downing, Dec 4, 2025)
d4d36bf done with comfy updates (Colter-Downing, Dec 4, 2025)
40aed9b adding s3 as an option (Colter-Downing, Dec 4, 2025)
222ac2a default endpoint name (Colter-Downing, Dec 4, 2025)
138fc3a Merge pull request #71 from vast-ai/AUTO-comfyui-updates (Colter-Downing, Dec 4, 2025)
7be8aa6 pin pycares (LucasArmandVast, Dec 11, 2025)
70f8a8f Merge pull request #72 from vast-ai/hotfix-pin-pycares (LucasArmandVast, Dec 11, 2025)
df61e6e correct version pin for aiohttp (#73) (edgaratvast, Dec 11, 2025)
4ecc070 Mark pyworkers as "Error" if startup script fails. to avoid silent fa… (abiola-vastai, Dec 11, 2025)
2ce741a Merge pull request #74 from vast-ai/AUTO-912 (abiola-vastai, Dec 12, 2025)
4380d98 Use PyWorker SDK (#67) (LucasArmandVast, Dec 16, 2025)
29f836e Backwards compatible vLLM payload (#75) (LucasArmandVast, Dec 16, 2025)
9daf171 Increase queue limits for vLLM and TGI (LucasArmandVast, Dec 17, 2025)
bcb04b9 add missing comma (LucasArmandVast, Dec 17, 2025)
e02f4bc Lowered concurrency of vLLM and TGI benchmarks (LucasArmandVast, Dec 17, 2025)
bd3e003 Add SDK version checking (#76) (LucasArmandVast, Dec 18, 2025)
4d786b4 SDK Versioning Improvements (#77) (LucasArmandVast, Jan 2, 2026)
f319db6 flag for model log rotate (#78) (LucasArmandVast, Jan 13, 2026)
aaca1c9 Updated requirements to only require vastai-sdk (LucasArmandVast, Jan 14, 2026)
eba9c48 Merge pull request #79 from vast-ai/update-requirements (skiddar, Jan 14, 2026)
3 changes: 2 additions & 1 deletion .gitignore
```diff
@@ -2,4 +2,5 @@
 .envrc
 __pycache__
 bin/
-lib64
+lib64
+.venv
```
191 changes: 127 additions & 64 deletions README.md
# Vast PyWorker Examples

This repository contains **example PyWorkers** used by Vast.ai’s default Serverless templates (e.g., vLLM, TGI, ComfyUI, Wan, ACE). A PyWorker is a lightweight Python HTTP proxy that runs alongside your model server and:

- Exposes one or more HTTP routes (e.g., `/v1/completions`, `/generate/sync`)
- Optionally validates/transforms request payloads
- Computes per-request **workload** for autoscaling
- Forwards requests to the local model server
- Optionally supports FIFO queueing when the backend cannot process concurrent requests
- Detects readiness/failure from model logs and runs a benchmark to estimate throughput

> Important: The **core PyWorker framework** (Worker, WorkerConfig, HandlerConfig, BenchmarkConfig, LogActionConfig) is provided by the **`vastai` / `vastai-sdk`** Python package (https://github.com/vast-ai/vast-sdk). This repo focuses on *worker implementations and examples*, not the framework internals.

## Repository Purpose

Use this repository as:

- A reference for how Vast templates wire up `worker.py`
- A starting point for implementing your own custom Serverless PyWorker
- A collection of working examples for common model backends

If you are looking for the framework code itself, refer to the Vast.ai SDK.

## Project Structure

Typical layout:

- `workers/`
  - Example worker implementations (each worker is usually a self-contained folder)
  - Each example typically includes:
    - `worker.py` (the entrypoint used by Serverless)
    - Optional sample workflows / payloads (for ComfyUI-based workers)
    - Optional local test harness scripts

## How Serverless launches worker.py

On each worker instance, the template’s startup script typically:

1. Clones your repository from `PYWORKER_REPO`
2. Installs dependencies from `requirements.txt`
3. Starts the **model server** (vLLM, TGI, ComfyUI, etc.)
4. Runs:
   ```bash
   python worker.py
   ```

Your `worker.py` builds a `WorkerConfig`, constructs a `Worker`, and starts the PyWorker HTTP server.

## worker.py

A PyWorker is usually a single `worker.py` that uses SDK configuration objects:

```python
from vastai import (
    Worker,
    WorkerConfig,
    HandlerConfig,
    BenchmarkConfig,
    LogActionConfig,
)

worker_config = WorkerConfig(
    model_server_url="http://127.0.0.1",
    model_server_port=18000,
    model_log_file="/var/log/model/server.log",
    handlers=[
        HandlerConfig(
            route="/v1/completions",
            allow_parallel_requests=True,
            max_queue_time=60.0,
            workload_calculator=lambda payload: float(payload.get("max_tokens", 0)),
            benchmark_config=BenchmarkConfig(
                generator=lambda: {"prompt": "hello", "max_tokens": 128},
                runs=8,
                concurrency=10,
            ),
        )
    ],
    log_action_config=LogActionConfig(
        on_load=["Application startup complete."],
        on_error=["Traceback (most recent call last):", "RuntimeError:"],
        on_info=['"message":"Download'],
    ),
)

Worker(worker_config).run()
```
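
In this example, `workload_calculator` reports each request's `max_tokens` as its workload (a simple proxy for compute cost; see the guidance below), `BenchmarkConfig` describes the synthetic request used to estimate throughput at startup, and the `LogActionConfig` patterns tell the PyWorker when the model server has finished loading or hit a fatal error.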

## Included Examples

This repository contains example PyWorkers corresponding to common Vast templates, including:

- **vLLM**: OpenAI-compatible completions/chat endpoints with parallel request support
- **TGI (Text Generation Inference)**: OpenAI-compatible endpoints and log-based readiness
- **ComfyUI (Image / JSON workflows)**: `/generate/sync` for ComfyUI workflow execution
- **ComfyUI Wan 2.2 (T2V)**: ComfyUI workflow execution producing video outputs
- **ComfyUI ACE Step (Text-to-Music)**: ComfyUI workflow execution producing audio outputs

Exact worker paths and naming may vary by template; use the `workers/` directory as the source of truth.

## Getting Started (Local)

1. Install Python dependencies for the examples you plan to run:
   ```bash
   pip install -r requirements.txt
   ```

2. Start your model server locally (vLLM, TGI, ComfyUI, etc.) and ensure:
   - You know the model server URL/port
   - You have a log file path you can tail for readiness/error detection

3. Run the worker:
   ```bash
   python worker.py
   ```
   or, if running an example from a subfolder:
   ```bash
   python workers/<example>/worker.py
   ```

> Note: Many examples assume they are running inside Vast templates (ports, log paths, model locations). You may need to adjust `model_server_port` and `model_log_file` for local usage.
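
One way to handle such adjustments is to read overrides from the environment before building `WorkerConfig`. This is a minimal sketch; `MODEL_SERVER_PORT` is an illustrative variable name, not something the SDK defines (`MODEL_LOG` follows the older examples in this repo):

```python
# Portability sketch for local testing. MODEL_SERVER_PORT is an
# illustrative variable name; MODEL_LOG follows the older examples here.
import os

model_server_port = int(os.environ.get("MODEL_SERVER_PORT", "18000"))
model_log_file = os.environ.get("MODEL_LOG", "/var/log/model/server.log")

# Pass these into WorkerConfig(model_server_port=..., model_log_file=...)
# so the same worker.py runs inside a template and on a local machine.
```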

## Deploying on Vast Serverless

To use a custom PyWorker with Serverless:

1. Create a public Git repository containing:
   - `worker.py`
   - `requirements.txt` (a minimal example follows this list)

2. In your Serverless template / endpoint configuration, set:
   - `PYWORKER_REPO` to your Git repository URL
   - (Optional) `PYWORKER_REF` to a git ref (branch, tag, or commit)

3. The template startup script will clone/install and run your `worker.py`.
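
Per the final commits in this PR, the examples' own `requirements.txt` was trimmed to just the SDK. A minimal file could therefore be a single line, though your worker may need more:

```
vastai-sdk
```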

## Guidance for Custom Workers

When implementing your own worker, keep the following in mind (a minimal sketch follows the list):

- Define one `HandlerConfig` per route you want to expose.
- Choose a workload function that correlates with compute cost:
  - LLMs: prompt tokens + max output tokens (or `max_tokens` as a simpler proxy)
  - Non-LLMs: constant cost per request (e.g., `100.0`) is often sufficient
- Set `allow_parallel_requests=False` for backends that cannot handle concurrency (e.g., many ComfyUI deployments).
- Configure exactly **one** `BenchmarkConfig` across all handlers to enable capacity estimation.
- Use `LogActionConfig` to reliably detect “model loaded” and “fatal error” log lines.
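
The sketch below ties these points together for a backend that cannot handle concurrent requests, reusing the configuration objects from the `worker.py` example above. The route, port, payloads, and log strings are illustrative assumptions, not values taken from any specific template:

```python
# A minimal sketch assuming the SDK objects shown in the worker.py example
# above. Route, port, payloads, and log strings are illustrative.
from vastai import (
    Worker,
    WorkerConfig,
    HandlerConfig,
    BenchmarkConfig,
    LogActionConfig,
)


def llm_workload(payload: dict) -> float:
    # For an LLM route: prompt tokens + max output tokens approximate cost.
    # (Whitespace splitting is a rough stand-in for real tokenization.)
    return float(len(payload.get("prompt", "").split()) + payload.get("max_tokens", 0))


worker_config = WorkerConfig(
    model_server_url="http://127.0.0.1",
    model_server_port=8188,  # hypothetical local port
    model_log_file="/var/log/model/server.log",
    handlers=[
        HandlerConfig(
            route="/generate/sync",
            allow_parallel_requests=False,  # backend runs one job at a time
            max_queue_time=120.0,
            # Non-LLM backend: constant cost per request is often sufficient.
            # An LLM route would pass workload_calculator=llm_workload instead.
            workload_calculator=lambda payload: 100.0,
            # Exactly one BenchmarkConfig across all handlers.
            benchmark_config=BenchmarkConfig(
                generator=lambda: {"steps": 20},  # hypothetical payload
                runs=4,
                concurrency=1,
            ),
        )
    ],
    log_action_config=LogActionConfig(
        on_load=["Server started"],  # hypothetical "model loaded" line
        on_error=["Traceback (most recent call last):"],
    ),
)

Worker(worker_config).run()
```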

## Community & Support

- Vast.ai Discord: https://discord.gg/Pa9M29FFye
- Vast.ai Subreddit: https://reddit.com/r/vastai/