@yucc-leon
Summary

This PR includes several bug fixes and improvements to enhance training stability and usability.

Changes

  1. Fix: Pin verifiers version and update skyrl-train dependency
  • Pin verifiers==0.1.6.post0 to avoid SDK interface incompatibilities
  • Update skyrl-train to commit 7cd23ea to fix a crash when saving the final checkpoint
  2. Fix: Activate max_input_tokens to prevent context overflow
  • Ensure the existing max_input_tokens configuration is properly applied
  • Prevent context-overflow issues that could cause vLLM to crash and terminate training
  3. Fix: Tool call metrics counting using ActionEvent
  • Correct the tool call counting logic to use ActionEvent instead of ToolCallEvent
  • Improve metrics accuracy for tool usage tracking
  4. Fix: LiteLLM model routing with correct provider prefix
  • Ensure proper OpenAI-compatible routing with the openai/ prefix
  • Support both HuggingFace model IDs and local model paths
  5. Fix: HTTP endpoint connection when host is 0.0.0.0
  • Use 127.0.0.1 for client connections when the server binds to 0.0.0.0
  • Prevent connection errors in distributed training setups

Testing

All changes have been tested with:

  • Local model paths
  • HuggingFace model IDs
  • Distributed training configurations

Impact

  • Improved training stability
  • Reduced risk of runtime crashes
  • Clearer documentation for users

When http_endpoint_host is set to 0.0.0.0 for server binding,
clients cannot connect to 0.0.0.0. This fix automatically uses
127.0.0.1 for client connections when the host is 0.0.0.0 or ::.

This resolves connection failures in distributed training setups
where the server binds to all interfaces but clients need a
specific loopback address to connect.

Technical details:
- Server binding: 0.0.0.0 (listens on all interfaces)
- Client connection: 127.0.0.1 (connects to loopback)
- Also handles IPv6 :: binding
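The bind-to-client address mapping described above can be sketched as a small helper (the function name `client_host` is hypothetical, not the actual identifier used in the fix):

```python
def client_host(bind_host: str) -> str:
    """Map a server bind address to an address a client can dial.

    Servers commonly bind to the wildcard address (0.0.0.0, or :: for
    IPv6) to listen on all interfaces, but clients cannot connect to
    the wildcard address itself, so we substitute the loopback address.
    """
    if bind_host in ("0.0.0.0", "::"):
        return "127.0.0.1"
    return bind_host
```

A concrete (non-wildcard) bind address is passed through unchanged, so the substitution only kicks in for the two wildcard forms.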

Change from litellm_proxy/ to openai/ prefix to properly route
requests to SkyRL's OpenAI-compatible endpoint. The litellm_proxy/
prefix was causing 'LLM Provider NOT provided' errors.

LiteLLM requires a provider prefix to route requests correctly.
Using the openai/ prefix ensures:
1. Proper routing to OpenAI-compatible endpoints
2. Preservation of SkyRL's strict model name checking
3. Correct model identifier extraction after the first slash

Example:
  model_name = "/path/to/qwen3-4b"
  litellm_model_name = "openai//path/to/qwen3-4b"
  SkyRL receives: model="/path/to/qwen3-4b" (matches loaded model)

This fixes LLM initialization failures in training.
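The prefixing rule from the example above can be expressed as a one-line helper (the name `to_litellm_model_name` is illustrative, not the identifier used in the PR):

```python
def to_litellm_model_name(model_name: str) -> str:
    """Prefix a model name so LiteLLM routes it to an OpenAI-compatible endpoint.

    LiteLLM uses the text before the first slash as the provider and
    forwards everything after it as the model identifier, so a local
    path like /path/to/qwen3-4b survives the round trip intact.
    """
    return f"openai/{model_name}"
```

Note the double slash in `openai//path/to/qwen3-4b` is intentional: stripping the provider prefix at the first slash recovers the original absolute path.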

Change tool call counting from assistant messages to ActionEvent
messages, which represent actual tool invocations. The previous
method of counting the tool_calls field in assistant messages was
inaccurate and did not reflect actual tool executions.

Changes:
- Count ActionEvent messages instead of assistant messages
- Extract tool_name directly from ActionEvent.tool_name field
- Simplify logic by removing complex dict/object handling
- Update tests to use ActionEvent-based test data

This provides accurate tool usage statistics for training analysis
and metrics reporting.

Technical details:
- ActionEvent represents actual tool invocations in the system
- Assistant messages may contain tool_calls but don't guarantee execution
- This fix ensures metrics match actual tool usage behavior
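The counting logic described above reduces to a tally over ActionEvent entries. The sketch below uses a minimal stand-in dataclass for `ActionEvent` (only the `tool_name` field from the commit message); the real SDK class carries more fields:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ActionEvent:
    # Minimal stand-in for the SDK's ActionEvent; only the field
    # the metrics code reads is modeled here.
    tool_name: str


def count_tool_calls(events) -> Counter:
    """Tally tool invocations per tool, considering only ActionEvents.

    Assistant messages (or any other event type) are skipped, so the
    counts reflect tools that were actually invoked.
    """
    return Counter(e.tool_name for e in events if isinstance(e, ActionEvent))
```

Because non-ActionEvent entries are filtered out, an assistant message that merely *mentions* tool_calls no longer inflates the metrics.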

Add max_input_tokens configuration to LLM initialization to prevent
context-length overflow errors during training. Without this setting,
the system may crash when input exceeds the model's context window.

Changes:
- Read max_input_length from generator config (default: 38400)
- Calculate effective_max_input by reserving 2000 tokens for overhead
- Pass max_input_tokens to LLM initialization
- Let OpenHands handle context truncation automatically

This prevents training crashes due to context overflow and ensures
stable long-running training sessions.

Technical details:
- Reserves tokens for system prompt, tools, and response generation
- OpenHands will automatically truncate context when needed
- Default 38400 tokens accommodates most training scenarios
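The budget arithmetic from the commit message can be sketched as follows (the constant names and the `effective_max_input` helper are illustrative; the defaults 38400 and 2000 come from the commit message):

```python
DEFAULT_MAX_INPUT_LENGTH = 38400  # default read from the generator config
RESERVED_TOKENS = 2000            # headroom for system prompt, tools, response


def effective_max_input(max_input_length: int = DEFAULT_MAX_INPUT_LENGTH,
                        reserved: int = RESERVED_TOKENS) -> int:
    """Compute the max_input_tokens value passed to LLM initialization.

    Reserving a fixed overhead keeps the prompt plus generation within
    the model's context window; clamped at zero for tiny budgets.
    """
    return max(max_input_length - reserved, 0)
```

With the defaults this yields a 36400-token input budget, leaving the reserved 2000 tokens for overhead before the context window is exhausted.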

Pin verifiers to exact version 0.1.6.post0 for SDK/API compatibility
and reproducibility. Update skyrl-train to rev 7cd23ea for latest fixes.

Changes:
- Pin verifiers: 0.1.6.post0 (was >=0.1.6.post0)
- Update skyrl-train: rev 7cd23ea (was 69ca4d9)
- Add note in README_verifiers.md about version pinning

This ensures:
- Reproducible builds across environments
- SDK/API compatibility
- Latest skyrl-train fixes and improvements

Note: If you hit version mismatch issues, re-run 'uv sync' and avoid
installing verifiers via pip outside the uv environment.

- Update code_search_generator.py comment to clarify LiteLLM routing
  supports both HuggingFace model IDs and local paths
- Add model path format documentation to README_Training.md
- Simplify TESTBED_ROOT environment variable handling
yucc-leon commented Jan 15, 2026

Sorry. For the fourth fix, I initially thought the OpenAIExceptions were caused by the input model prefix, but it now seems that Aditya/Lintang had already identified the problem and submitted a PR: NovaSky-AI/SkyRL#796
