@YanhuiDua YanhuiDua commented Dec 5, 2025

This PR introduces asynchronous RL support to the replay buffer system, enabling partial rollouts and version-based sample management for more efficient training data generation. This is the first part of a multi-part feature implementation.

Key changes:

  1. Added async-related configuration parameters: enable_partial_rollout, tail_batch_candidate_steps, tail_batch_trigger_size, and staleness_threshold
  • staleness_threshold: The maximum allowed proportion of stale (expired) samples in a training batch. Must be between 0.0 and 1.0.
  • enable_partial_rollout: Whether to enable partial rollout for asynchronous data generation.
  • tail_batch_candidate_steps: Number of rollout steps after which a sample becomes a candidate for the tail batch. Set to 0 to disable the tail batch.
  • tail_batch_trigger_size: Number of candidate samples needed in the queue to trigger a tail batch operation. Defaults to global_batch_size when not provided by the user.
  2. Refactored replay buffer storage to support versioned samples with bucketed tracking of completed, aborted, and expired states

  3. Renamed Sampler to DatasetSampler and separated dataset sampling logic from replay buffer sampling
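
To make the parameter constraints above concrete, here is a minimal sketch of how they could be validated. The class name `AsyncRolloutConfig`, the default values, and the `tail_batch_enabled` helper are hypothetical and are not taken from the PR; only the four parameter names, the 0.0 to 1.0 range, the meaning of 0 for tail_batch_candidate_steps, and the fallback to global_batch_size come from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AsyncRolloutConfig:
    """Hypothetical container for the async options; not the PR's actual class."""

    global_batch_size: int
    enable_partial_rollout: bool = False          # assumed default
    staleness_threshold: float = 0.0              # assumed default
    tail_batch_candidate_steps: int = 0           # 0 disables the tail batch
    tail_batch_trigger_size: Optional[int] = None

    def __post_init__(self) -> None:
        # The description requires a value between 0.0 and 1.0.
        if not 0.0 <= self.staleness_threshold <= 1.0:
            raise ValueError("staleness_threshold must be between 0.0 and 1.0")
        # Rejecting negative step counts is an assumption of this sketch.
        if self.tail_batch_candidate_steps < 0:
            raise ValueError("tail_batch_candidate_steps must be >= 0")
        # Fall back to global_batch_size when the user gives no trigger size.
        if self.tail_batch_trigger_size is None:
            self.tail_batch_trigger_size = self.global_batch_size

    @property
    def tail_batch_enabled(self) -> bool:
        # Hypothetical helper: the tail batch is active only for a positive step count.
        return self.tail_batch_candidate_steps > 0
```

For example, `AsyncRolloutConfig(global_batch_size=128)` would leave the tail batch disabled and set the trigger size to 128, while a staleness_threshold of 1.5 would be rejected at construction time.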

Copilot AI left a comment

Pull request overview

This PR introduces asynchronous RL support to the replay buffer system, enabling partial rollouts and version-based sample management for more efficient training data generation. This is the first part of a multi-part feature implementation.

Key changes:

  • Refactored replay buffer storage to support versioned samples with bucketed tracking of completed, aborted, and expired states
  • Renamed Sampler to DatasetSampler and separated dataset sampling logic from replay buffer sampling
  • Added async-related configuration parameters including partial_rollout, tail_batch_candidate_steps, and staleness_threshold

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| xtuner/v1/ray/dataflow/replay_buffer.py | Major refactoring: added version tracking to ReplayMeta, introduced bucketed storage for different sample states, renamed and split Sampler class, updated storage management methods |
| xtuner/v1/ray/dataflow/flow.py | Added async-related config parameters, updated DataFlow initialization to pass async configs to replay buffer, renamed _reset_internal_states to _prepare with prerun state fetching |
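
As a rough illustration of what version tracking with bucketed storage per sample state could look like, here is a minimal sketch. It is not the code in replay_buffer.py: `SampleState`, `SampleMeta`, `BucketedStorage`, and their methods are hypothetical names, and draining the oldest version first is an assumption of this sketch rather than something the PR states.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class SampleState(Enum):
    # The three states named in the PR description.
    COMPLETED = "completed"
    ABORTED = "aborted"
    EXPIRED = "expired"


@dataclass
class SampleMeta:
    """Hypothetical stand-in for a versioned replay entry."""

    sample_id: int
    version: int
    state: SampleState


class BucketedStorage:
    """Hypothetical store: one bucket per state, keyed by sample version."""

    def __init__(self) -> None:
        self._buckets: Dict[SampleState, Dict[int, List[SampleMeta]]] = {
            state: defaultdict(list) for state in SampleState
        }

    def add(self, meta: SampleMeta) -> None:
        self._buckets[meta.state][meta.version].append(meta)

    def count(self, state: SampleState) -> int:
        return sum(len(items) for items in self._buckets[state].values())

    def take(self, state: SampleState, n: int) -> List[SampleMeta]:
        # Assumption: drain the lowest version numbers first.
        taken: List[SampleMeta] = []
        bucket = self._buckets[state]
        for version in sorted(bucket):
            while bucket[version] and len(taken) < n:
                taken.append(bucket[version].pop(0))
            if not bucket[version]:
                del bucket[version]  # drop emptied version buckets
            if len(taken) >= n:
                break
        return taken
```

Keeping one bucket per state makes per-state counts cheap, which is the kind of bookkeeping a check against staleness_threshold or tail_batch_trigger_size would need.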

@YanhuiDua YanhuiDua force-pushed the support_async_rl branch 5 times, most recently from 2191802 to 87c9c14 Compare December 8, 2025 05:39