A middleware server that intercepts and modifies sampling parameters for generation requests to OpenAI-compatible backends. It can fill in model-specific parameters that a request leaves unset, or enforce overrides even for parameters the request does set. The server supports both OpenAI-compatible and Anthropic request formats, enabling the use of Claude Code with OpenAI-compatible backends.
- Parameter Override: Automatically applies custom sampling parameters to generation requests
- Model-Specific Settings: Configure different parameters for different models
- Format Conversion: Converts between Anthropic and OpenAI request/response formats
- Streaming Support: Handles both streaming and non-streaming responses
- Enforced Parameters: Option to enforce specific parameters that override all others
- Debug Logging: Comprehensive logging for troubleshooting
- Python 3.8 or higher
- pip (Python package manager)
- Clone or download the project:

  ```bash
  git clone https://github.com/avtc/sampling-proxy.git
  cd sampling-proxy
  ```

- Create a virtual environment:

  ```bash
  python -m venv sampling-proxy
  ```

- Activate the virtual environment:

  On Windows:

  ```bash
  sampling-proxy\Scripts\activate
  ```

  On macOS/Linux:

  ```bash
  source sampling-proxy/bin/activate
  ```

- Make the shell script executable:

  ```bash
  chmod +x ./sampling_proxy.sh
  ```

- Create the configuration file:

  ```bash
  cp config_sample.json config.json
  ```

  Then edit `config.json` to match your specific configuration needs.

- Install the dependencies:

  ```bash
  pip install -r requirements.txt
  ```
To update your existing installation to the latest version from the git repository:
- Navigate to the project directory:

  ```bash
  cd sampling-proxy
  ```

- Activate the virtual environment:

  On Windows:

  ```bash
  sampling-proxy\Scripts\activate
  ```

  On macOS/Linux:

  ```bash
  source sampling-proxy/bin/activate
  ```

- Pull the latest changes:

  ```bash
  git pull origin main
  ```

- Update dependencies (if `requirements.txt` has changed):

  ```bash
  pip install -r requirements.txt --upgrade
  ```

- Restart the proxy server if it is currently running.
Run the proxy server with default settings:

```bash
python sampling_proxy.py
```

This will start the proxy server on http://0.0.0.0:8001 and forward requests to an OpenAI-compatible backend at http://127.0.0.1:8000/v1.
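To verify the proxy is up, you can query its root endpoint, which returns the proxy configuration and status (see the endpoint list below). A minimal check, assuming the default host and port:

```python
import requests  # third-party HTTP client: pip install requests

# The proxy's root endpoint returns its configuration and status.
status = requests.get("http://localhost:8001/")
print(status.status_code)  # expect 200 while the proxy is running
print(status.text)         # configuration and status payload
```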
To see all available options:

```bash
python sampling_proxy.py --help
```

Available options:

- `--config, -c`: Path to configuration JSON file (default: `config.json`)
- `--host`: Host address for the proxy server (overrides config)
- `--port`: Port for the proxy server (overrides config)
- `--base-path`: Base path for the proxy server (overrides config)
- `--target-base-url`: OpenAI-compatible backend base URL (overrides config)
- `--debug-logs, -d`: Enable detailed debug logging (overrides config)
- `--override-logs, -o`: Show when sampling parameters are overridden (overrides config)
- `--enforce-params, -e`: Enforce specific parameters as a JSON string (overrides config)
- Run with a custom target base URL and debug logging:

  ```bash
  python sampling_proxy.py --target-base-url http://127.0.0.1:8000/v1 --debug-logs
  ```

- Run with a custom configuration file:

  ```bash
  python sampling_proxy.py --config my-config.json
  ```

- Run with enforced parameters:

  ```bash
  python sampling_proxy.py --enforce-params '{"temperature": 0.7, "top_p": 0.9}'
  ```

- Run with override logs to see parameter changes:

  ```bash
  python sampling_proxy.py --override-logs
  ```
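If the enforced parameters live in Python, it is safer to build the `--enforce-params` JSON argument programmatically than to hand-quote it. A small sketch:

```python
import json
import shlex

# Serialize the parameters to JSON, then quote the result for the shell.
enforced = {"temperature": 0.7, "top_p": 0.9}
arg = shlex.quote(json.dumps(enforced))
print(f"python sampling_proxy.py --enforce-params {arg}")
```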
The proxy uses an external `config.json` file for configuration. A sample configuration file, `config_sample.json`, is provided; copy it to `config.json` and modify as needed. You can specify a custom config file path with the `--config` command-line argument.
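For illustration only, a configuration could be generated like the following sketch. The key names here are hypothetical guesses based on the command-line flags and parameter categories described in this README; defer to `config_sample.json` for the actual schema.

```python
import json

# Hypothetical configuration; key names are illustrative, not the real schema.
config = {
    "host": "0.0.0.0",
    "port": 8001,
    "target_base_url": "http://127.0.0.1:8000/v1",
    "default_sampling_params": {"temperature": 0.7, "top_p": 0.9},
    "model_sampling_params": {"your-model": {"temperature": 0.6}},
    "enforce_params": {},
}

with open("config.json", "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2)
```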
The proxy applies sampling parameters in the following priority order, from highest to lowest (see the sketch after this list):
- Enforced sampling parameters (always override everything)
- Parameters specified in the request
- Model-specific sampling parameters
- Default sampling parameters (fallback values)
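In dictionary terms, this amounts to layering the four sources and letting higher-priority values win. A minimal sketch of that merge; the function and variable names are illustrative, not the proxy's actual internals:

```python
def resolve_sampling_params(request_params, model_params, default_params, enforced_params):
    merged = dict(default_params)   # 4. fallback values
    merged.update(model_params)     # 3. model-specific settings
    merged.update(request_params)   # 2. values set in the request
    merged.update(enforced_params)  # 1. enforced values always win
    return merged

# Example: the request's temperature survives every layer except enforcement.
print(resolve_sampling_params(
    request_params={"temperature": 1.0},
    model_params={"temperature": 0.6, "top_p": 0.95},
    default_params={"temperature": 0.7, "top_p": 0.9, "max_tokens": 512},
    enforced_params={"temperature": 0.2},
))
# {'temperature': 0.2, 'top_p': 0.95, 'max_tokens': 512}
```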
The proxy handles the following endpoints:
- `/generate` - SGLang generation endpoint
- `/completions` - OpenAI completions
- `/chat/completions` - OpenAI chat completions
- `/messages` - Anthropic messages (converted to OpenAI format)
- `/models` - List available models
- `/` - Returns proxy configuration and status
- All other endpoints are passed through to the backend
Point the OpenAI Python client at the proxy:

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8001",  # Point to the proxy
    api_key="not-required"
)

response = client.chat.completions.create(
    model="your-model",
    messages=[{"role": "user", "content": "Hello!"}]
)
```
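Streaming works through the proxy as well (see Streaming Support above). A minimal sketch of the streamed variant, reusing the client from the previous block:

```python
# Ask the backend to stream; the proxy forwards the streamed response.
stream = client.chat.completions.create(
    model="your-model",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```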
The Anthropic client works the same way, since the proxy converts `/messages` requests to the OpenAI format:

```python
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8001",  # Point to the proxy
    api_key="not-required"
)

response = client.messages.create(
    model="your-model",
    max_tokens=100,
    messages=[{"role": "user", "content": "Hello!"}]
)
```

To troubleshoot, run with full logging:

```bash
python sampling_proxy.py --debug-logs --override-logs
```

Common issues:

- Connection Refused: Ensure your backend server is running and accessible
- 404 Errors: Check if the backend supports the requested endpoints
- Parameter Not Applied: Use `--override-logs` to see when parameters are being overridden
The proxy provides detailed logging including:
- Incoming requests
- Parameter overrides
- Backend communication
- Error details
This project is licensed under the MIT License. See the LICENSE file for details.
For convenience, use the provided scripts to start the proxy with the correct virtual environment.

On macOS/Linux:

```bash
./sampling_proxy.sh
```

On Windows:

```powershell
.\sampling_proxy.ps1
```

Both scripts automatically activate the sampling-proxy virtual environment and start the proxy server.