Conversation
@YanhuiDua commented on Dec 15, 2025

This PR introduces asynchronous RL support to Xtuner, enabling partial rollouts and version-based sample management for more efficient training data generation.

1. Key Concepts:

  • `staleness_threshold`: The maximum allowed fraction of stale (expired) samples in a training batch.
  • `enable_partial_rollout`: Whether to enable partial rollout for asynchronous data generation.
  • `tail_batch_candidate_steps`: Number of rollout steps after which a sample becomes a candidate for the tail batch. Set to 0 to disable the tail batch.
  • `tail_batch_trigger_size`: Number of candidate samples that must accumulate in the queue to trigger a tail-batch operation. Defaults to `global_batch_size` when not provided or set to 0 (see the config sketch below).
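
As a quick illustration, a minimal config sketch, assuming a plain dataclass-style config object; the class name `AsyncRolloutConfig` and the `resolved_trigger_size` helper are illustrative, not the PR's actual API:

```python
from dataclasses import dataclass


@dataclass
class AsyncRolloutConfig:  # hypothetical name, for illustration only
    staleness_threshold: float = 0.0      # max fraction of stale (expired) samples per batch
    enable_partial_rollout: int = 0       # 1 = keep & concatenate responses across pauses
    tail_batch_candidate_steps: int = 0   # 0 disables the tail batch
    tail_batch_trigger_size: int = 0      # 0 / unset -> falls back to global_batch_size

    def resolved_trigger_size(self, global_batch_size: int) -> int:
        # Documented fallback: use global_batch_size when not provided or set to 0.
        return self.tail_batch_trigger_size or global_batch_size
```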

2. Async logic:

| Strategy Type | Settings | Core Features |
| --- | --- | --- |
| Synchronous | `staleness_threshold=0.0`<br>`enable_partial_rollout=0`<br>`tail_batch_candidate_steps=0` | 1. No data oversending |
| Asynchronous 1 | `staleness_threshold=0.2`<br>`enable_partial_rollout=0`<br>`tail_batch_candidate_steps=0` | 1. 20% data oversending<br>2. Responses are not retained when a rollout is paused<br>3. Prioritize sampling data from the abort queue |
| Asynchronous 2 | `staleness_threshold=0.2`<br>`enable_partial_rollout=0`<br>`tail_batch_candidate_steps=1`<br>`tail_batch_trigger_size=0` | 1. 20% data oversending<br>2. Responses are not retained when a rollout is paused<br>3. Prioritize sampling data from the abort queue<br>4. A sample is moved to the candidate pool once its abort count reaches `tail_batch_candidate_steps + 1` |
| Asynchronous 3 | `staleness_threshold=0.2`<br>`enable_partial_rollout=1`<br>`tail_batch_candidate_steps=0`<br>`tail_batch_trigger_size=0` | 1. 20% data oversending<br>2. Responses are retained & concatenated when a rollout is paused<br>3. Prioritize sampling data from the abort queue |
| Asynchronous 4 | `staleness_threshold=0.2`<br>`enable_partial_rollout=1`<br>`tail_batch_candidate_steps=1`<br>`tail_batch_trigger_size=0` | 1. 20% data oversending<br>2. Responses are retained & concatenated when a rollout is paused<br>3. Prioritize sampling data from the abort queue<br>4. A sample is moved to the candidate pool once its abort count reaches `tail_batch_candidate_steps + 1` (`tail_batch_candidate_steps` corresponds to the off-policy step) |
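
As a rough illustration of the table, here is a minimal sketch of how `staleness_threshold` could bound data oversending and how the abort queue could be drained before fresh prompts. This is one plausible reading of the table, not the PR's actual implementation; the function and queue names are assumptions:

```python
from collections import deque


def max_inflight_samples(global_batch_size: int, staleness_threshold: float) -> int:
    # staleness_threshold=0.2 -> up to 20% data oversending on top of the batch size.
    return int(global_batch_size * (1 + staleness_threshold))


def next_prompts(abort_queue: deque, fresh_prompts: deque, budget: int) -> list:
    # Prioritize sampling from the abort queue, then fall back to fresh prompts.
    picked: list = []
    while len(picked) < budget and abort_queue:
        picked.append(abort_queue.popleft())
    while len(picked) < budget and fresh_prompts:
        picked.append(fresh_prompts.popleft())
    return picked
```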

3. Benchmark

4. Related PRs

  • Added async-related configuration parameters, including `partial_rollout`, `tail_batch_candidate_steps`, `tail_batch_trigger_size`, and `staleness_threshold`
  • Refactored replay buffer storage to support versioned samples with bucketed tracking of completed, aborted, and expired states (see the sketch after this list)
  • Renamed `Sampler` to `DatasetSampler` and separated dataset sampling logic from replay buffer sampling
  • Applied `sample_from_expired_storage` in dataflow: when set to True, the dataflow does not oversend data and returns data only after all tasks of the current batch are completed
  • Added task timing log info
  • Added partial rollout functionality with versioned response tracking to accumulate tokens across multiple generation steps
  • Implemented an automatic worker restart mechanism for when all rollout workers become inactive
  • Fixed state handling for aborted rollouts and improved error logging
  • Added TensorBoard logging for training and rollout metrics
  • Refactored the training loop in `fit()` to conditionally execute rollout, training, and weight synchronization based on debug mode
  • Fixed async runtime bugs
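
For the replay-buffer refactor above, a minimal sketch of versioned, bucketed sample tracking, assuming a simple in-memory structure; the class name `VersionedReplayBuffer` and its methods are illustrative, not the PR's actual `ReplayBuffer` API:

```python
from collections import defaultdict


class VersionedReplayBuffer:  # illustrative only
    """Tracks samples per rollout version in completed / aborted / expired buckets."""

    def __init__(self) -> None:
        self.buckets = {name: defaultdict(list) for name in ("completed", "aborted", "expired")}

    def add(self, state: str, version: int, sample) -> None:
        self.buckets[state][version].append(sample)

    def expire_stale(self, current_version: int, max_version_gap: int) -> None:
        # Move completed samples whose version lags too far behind into the expired bucket.
        for version in list(self.buckets["completed"]):
            if current_version - version > max_version_gap:
                self.buckets["expired"][version].extend(self.buckets["completed"].pop(version))
```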

@YanhuiDua force-pushed the support_async_rl_4 branch 2 times, most recently from 5e3f135 to aaa4860 on Dec 19, 2025 04:20
```python
waiting_tasks = set()
dataflow_start_time = time.perf_counter()
task_completion_times = []
with tqdm(total=self.target_batch_size, desc="rollout_controller for training samples") as pbar:
```
Use `tqdm(miniters=10)` (miniters = minimum progress display update interval, in iterations) and call `pbar.update(finished_samples)` inside the loop instead of a manual `pbar.refresh()`. This minimizes pbar operations in the loop.
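
A small sketch of the suggested pattern, assuming a polling helper that reports newly finished samples; `collect_finished_samples` and `target_batch_size` below are illustrative stand-ins:

```python
import random

from tqdm import tqdm

target_batch_size = 512  # illustrative value


def collect_finished_samples() -> int:
    # Stand-in for polling the rollout controller for newly finished samples.
    return random.randint(1, 8)


# miniters=10: tqdm redraws at most every 10 finished samples, so manual
# pbar.refresh() calls inside the loop are unnecessary.
with tqdm(total=target_batch_size, desc="rollout_controller for training samples", miniters=10) as pbar:
    while pbar.n < target_batch_size:
        pbar.update(collect_finished_samples())
```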

```python
data_batches, pack_max_length=self._train_worker_cfg.pack_max_length, rollout_idx=rollout_idx
)
)
```

Nice hierarchical code!

collator="fake_collator",
pack_level="none",
expired_threshold = (
min(remain_size, self.config.tail_batch_trigger_size)
@jayhenry commented on Dec 23, 2025:
use cast(int, xxx) instead
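
For reference, a minimal sketch of the suggested `typing.cast` usage; the wrapper function is illustrative and not part of the PR:

```python
from typing import cast


def compute_expired_threshold(remain_size: int, tail_batch_trigger_size) -> int:
    # cast() is a no-op at runtime; it only tells the type checker the value is an int.
    return min(remain_size, cast(int, tail_batch_trigger_size))
```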

```python
self.finished_samples_count = await self.replay_buffer.get_completed_samples_count.remote()
waiting_tasks = pending_tasks

while len(waiting_tasks) + self.finished_samples_count < max(data_concurrency, self.target_batch_size):
```
Suggested condition:

```python
len(waiting_tasks) + self.finished_samples_count < data_concurrency + init_finished_samples_count
```
