
Conversation

@YanhuiDua (Collaborator) commented Dec 24, 2025

Motivation

The current xtuner data distribution mechanism has a pack allocation issue that leads to an unstable number of training steps and affects training effectiveness.

The data distribution pipeline consists of three stages:

  1. Packing Stage: Split input data_batch by token count, creating one pack per 32K tokens, resulting in N packs
  2. Distribution Stage: Evenly distribute N packs across M workers, each worker receives N/M packs
  3. Step Division: Divide packs per worker into steps based on optimizer_step parameter

When N/M is not divisible by optimizer_step, the actual number of training steps fails to match the expected value.
For example:

N/M = 44                              # packs per worker
optimizer_step = 16                   # expected training steps
packs_per_step = ceil(44 / 16) = 3    # packs allocated per step (rounded up)

# Actual result:
actual_steps = floor(44 / 3) = 14     # complete steps, with 2 packs left over
# Total: 15 steps, and the last step has a smaller, inconsistent batch size
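
The same arithmetic in a few lines of Python (assuming the per-step pack count is rounded up and any leftover packs form one extra, smaller step, which matches the numbers above):

import math

packs_per_worker = 44        # N / M
optimizer_steps = 16         # expected training steps

packs_per_step = math.ceil(packs_per_worker / optimizer_steps)   # 3
complete_steps = packs_per_worker // packs_per_step              # 14
leftover_packs = packs_per_worker % packs_per_step               # 2

actual_steps = complete_steps + (1 if leftover_packs else 0)     # 15, not the expected 16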

Key Changes

1. Token-aware Pre-allocation

In RawTrainingController.fit() (controller.py), samples are evenly distributed across M workers and then further split into optimizer_step buckets per worker, based on token count. This ensures a balanced token distribution across all workers and steps (a sketch follows the call list below):

  • batches_per_dp_group = self._balance_split_batch(data_batches, dp_size)
  • mini_batch_for_steps = self._balance_split_batch(dp_worker_data_batches, optimizer_steps)
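
For illustration only, a minimal greedy sketch of a token-balanced split; the PR's _balance_split_batch reportedly builds on the Karmarkar-Karp based get_seqlen_balanced_partitions in xtuner/v1/rl/utils.py, so the actual algorithm differs:

import heapq

def greedy_balance_split(samples: list[dict], num_buckets: int) -> list[list[dict]]:
    # Assign each sample (longest first) to the bucket with the fewest tokens so far.
    # "num_tokens" is an assumed per-sample field used only for this sketch.
    buckets: list[list[dict]] = [[] for _ in range(num_buckets)]
    heap = [(0, i) for i in range(num_buckets)]   # (tokens_in_bucket, bucket_index)
    heapq.heapify(heap)
    for sample in sorted(samples, key=lambda s: s["num_tokens"], reverse=True):
        tokens, idx = heapq.heappop(heap)
        buckets[idx].append(sample)
        heapq.heappush(heap, (tokens + sample["num_tokens"], idx))
    return buckets

The same helper can be applied twice: once to split data_batches across dp_size workers, and once to split each worker's share into optimizer_steps buckets.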

2. Pack & Pad per Bucket

Within each pre-allocated bucket, data is packed and padded so that no pack exceeds pack_max_length. Padding is applied where necessary, and the number of packs per step is aligned across all workers (a sketch follows the call list below):

  • batch4pack_list = self._rearrange_batch_for_pack(step_mini_batch, pack_max_length)
  • step_pack = self._pad_and_pack_batches(batch4pack, pack_max_length)
  • self._pad_to_max_packs_across_workes(packed_data_batches, step_idx, max_packs, pack_max_length)
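
A rough sketch of the pack-and-pad idea within one bucket, with hypothetical helper names and an assumed num_tokens field (the real _rearrange_batch_for_pack / _pad_and_pack_batches operate on seq_ctx structures):

def pack_bucket(samples: list[dict], pack_max_length: int) -> list[list[dict]]:
    # Greedily group samples into packs whose total token count stays <= pack_max_length.
    packs: list[list[dict]] = []
    current, current_len = [], 0
    for sample in samples:
        n = sample["num_tokens"]          # assumed field; each sample fits into one pack
        if current and current_len + n > pack_max_length:
            packs.append(current)
            current, current_len = [], 0
        current.append(sample)
        current_len += n
    if current:
        packs.append(current)
    return packs

def pad_pack(pack: list[dict], pack_max_length: int) -> list[dict]:
    # Append one padding sample so the pack reaches exactly pack_max_length tokens.
    used = sum(s["num_tokens"] for s in pack)
    if used < pack_max_length:
        pack = pack + [{"num_tokens": pack_max_length - used, "is_padding": True}]
    return pack

After packing, every worker is also padded up to the same number of packs per step (what _pad_to_max_packs_across_workes does), so each optimizer step runs the same number of forward/backward passes on every rank.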

3. Worker-side Training

In TrainingWorker.fit() (worker.py), each worker processes its assigned data, including sequence context resolution, logprobs computation, importance sampling correction, and the actual training step (a skeleton of the per-step loop follows the call list below):

  • seq_ctx = self._resolve_ray_data(data["seq_ctx"], language_cfg)
  • self.compute_actor_logprobs()
  • self._apply_rollout_is_correction()
  • train_step()
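
Schematically, the worker-side loop over the nested list[list[WorkerInputItem]] can be read as follows; this is pseudocode following the calls above, with assumed item keys, not the actual implementation:

for step_items in data_batches:                    # one inner list per optimizer step
    seq_ctx_list, loss_ctx_input_list = [], []
    for item in step_items:                        # one item per pack
        seq_ctx_list.append(self._resolve_ray_data(item["seq_ctx"], language_cfg))
        loss_ctx_input_list.append(item["loss_ctx_input"])   # assumed key
    # old logprobs are updated in place here
    loss_ctx_input_list = self.compute_actor_logprobs(seq_ctx_list, loss_ctx_input_list)
    loss_ctx_input_list, metrics = self._apply_rollout_is_correction(seq_ctx_list, loss_ctx_input_list)
    self.train_step(seq_ctx_list, loss_ctx_input_list)       # grad accumulation + one optimizer update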

Copilot AI (Contributor) left a comment
Pull request overview

This PR refactors the packing logic in the RL training controller and worker components to improve token balancing and code organization. The key changes introduce a Karmarkar-Karp algorithm for balanced partitioning, extract helper methods for better code maintainability, and restructure how data batches are distributed across workers.

Key Changes

  • Introduces sequence-length balanced partitioning using the Karmarkar-Karp differencing algorithm to better distribute workload across devices
  • Refactors worker's fit method to accept nested list structure list[list[WorkerInputItem]] instead of flat list, aligning with the new per-step packing approach
  • Extracts reusable helper methods (_resolve_ray_data, _apply_rollout_is_correction, _create_padding_sample, _pack, _balance_split_batch) to reduce code duplication and improve maintainability

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 12 comments.

Reviewed files:

  • xtuner/v1/rl/utils.py: Adds a Karmarkar-Karp algorithm implementation with the get_seqlen_balanced_partitions function for balanced workload distribution across partitions
  • xtuner/v1/rl/base/worker.py: Refactors the fit method to handle the nested batch structure, extracts ray data resolution and importance sampling logic into separate methods, and adds a get_worker_cfg accessor method
  • xtuner/v1/rl/base/controller.py: Major refactoring of the packing logic with new balanced splitting, padding creation, and improved data distribution across workers with per-step gradient accumulation support


# Adapted from https://github.com/volcengine/verl/blob/main/verl/utils/seqlen_balancing.py
def karmarkar_karp(seqlen_list: list[int], k_partitions: int, equal_size: bool):
# see: https://en.wikipedia.org/wiki/Largest_differencing_method
class Set:
Copilot AI commented Dec 24, 2025

This class implements __lt__, but does not implement __le__ or __ge__.
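
If a total ordering were actually needed (heapq itself only requires __lt__), one option is functools.total_ordering; a minimal sketch loosely following the verl reference linked above, not the PR's code:

import functools

@functools.total_ordering
class Set:
    def __init__(self) -> None:
        self.sum = 0
        self.items: list[tuple[int, int]] = []

    def __eq__(self, other) -> bool:
        return self.sum == other.sum and self.items == other.items

    def __lt__(self, other) -> bool:
        if self.sum != other.sum:
            return self.sum < other.sum
        if len(self.items) != len(other.items):
            return len(self.items) < len(other.items)
        return self.items < other.items

With __eq__ and __lt__ defined, total_ordering fills in __le__, __gt__, and __ge__ automatically.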

return len(self.items) < len(other.items)
return self.items < other.items

class State:
Copilot AI commented Dec 24, 2025

This class implements __lt__, but does not implement __le__ or __ge__.

rollout_logprobs: torch.Tensor | None


class RawTrainingController:
Collaborator commented:

The core functions need unit tests. In the process of adding unit tests, you will also end up adjusting the function interfaces to make them easier to test, which in turn makes the interface design more reasonable.
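
For example, a unit test for the balanced split could look roughly like this (hypothetical interface: it assumes _balance_split_batch takes samples carrying a num_tokens field plus a bucket count, and that a controller fixture exists):

def test_balance_split_batch_is_token_balanced():
    controller = make_test_controller()   # hypothetical fixture, construction omitted
    samples = [{"num_tokens": n} for n in (100, 90, 80, 70, 30, 20, 10, 5)]

    buckets = controller._balance_split_batch(samples, 4)

    assert len(buckets) == 4
    # no sample lost or duplicated
    assert sorted(s["num_tokens"] for b in buckets for s in b) == sorted(s["num_tokens"] for s in samples)
    # per-bucket token totals should be reasonably balanced
    totals = [sum(s["num_tokens"] for s in b) for b in buckets]
    assert max(totals) - min(totals) <= 100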

get_logger().info(f"default split into {dp_size} partitions with tokens: {tokens_in_partition}")

packed_data_batches: list[list[list[dict]]] = [[[] for _ in range(optimizer_steps)] for _ in range(dp_size)]
max_packs_per_card = [0] * optimizer_steps
Collaborator commented:

rename to max_packed_batch_num_per_step

@YanhuiDua (Collaborator, Author) commented Dec 29, 2025

max_packs_per_step is more accurate: the maximum number of packs per step.


# old logprobs are updated in place in compute_actor_logprobs
loss_ctx_input_list = self.compute_actor_logprobs(seq_ctx_list, loss_ctx_input_list)
loss_ctx_input_list, metrics = self._apply_rollout_is_correction(
Collaborator commented:

Great! The previously very long fit function is now well layered and much easier to read.

n_routed_experts=n_routed_experts,
)
padding_samples = [padding_sample for _ in range(num_padding_packs)]
packed_data_batches[dp_rank][step_idx].extend(padding_samples)
Collaborator commented:

Consider adding an overall timing record for the data packing and related processing. The reason: previously, part of the data processing ran on the single-node TrainController and the rest on the multi-node Workers. Now it all runs on the single-node Controller, which could become slower. Data processing is usually simple, so it probably won't slow down by much, but adding monitoring makes it easy to observe and adjust in time later.
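
A lightweight way to add that monitoring could be a perf_counter timer around the packing stage, logged once per fit call (illustrative sketch only; the actual logger/metric plumbing may differ):

import time
from contextlib import contextmanager

@contextmanager
def log_elapsed(name: str):
    start = time.perf_counter()
    yield
    get_logger().info(f"{name} took {time.perf_counter() - start:.3f}s")

# inside RawTrainingController.fit():
# with log_elapsed("balance_split + pack + pad"):
#     ...  # pre-allocation, packing and padding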

@jayhenry (Collaborator) commented Dec 26, 2025

[Quotes the Motivation section of the PR description above.]

Key Changes

This PR refactors the pipeline into Allocate → Pack & Pad, and wraps some methods from TrainController and TrainWorker.

  1. Token-aware pre-allocation: Evenly distribute samples into M workers (optional) × optimizer_step buckets based on token count
  2. Pack & pad per bucket: Apply packing and padding within each pre-allocated bucket

Great PR description!
Maybe you can add the calling chain of the core packing functions corresponding to the workflow in Key Changes, for example:

controller.py:
RawTrainingController.fit()
# 1. Token-aware pre-allocation: evenly distribute samples into M workers (optional) × optimizer_step buckets based on token count
-> batches_per_dp_group = self._balance_split_batch(data_batches, dp_size)
-> mini_batch_for_steps = self._balance_split_batch(dp_worker_data_batches, optimizer_steps)
# 2. Pack & pad per bucket: apply packing and padding within each pre-allocated bucket
-> batch4pack_list = self._rearrange_batch_for_pack(step_mini_batch, pack_max_length)   # the old version: pack_mini_batch = self._pack(step_mini_batch, pack_max_length)
-> self._pack_batches()  # pieces of packing code that would be better wrapped in a new function `_pack_batches()`
worker.py:
-> self._create_padding_sample()  # pieces of padding code that would be better wrapped in a new function `_pad_batches()`
# 3. use the packed and padded data batches
-> TrainingWorker.fit()

Then I can review more easily, following the same order as above :)

Additionally, when you write out the core function calling chain corresponding to your original design in the "Key Changes", you will find that some high-level functions are missing from the implementation, such as _pad_batches(). If you add these high-level functions, you can explain the design more clearly and others can read the code more easily, because readers can reason at the high level and ignore the messy details.

Unit tests can play the same role. For example, if you want to write a unit test for the core padding logic, you first need to abstract the related code pieces into a function such as _pad_batches() and then test it (a sketch of what that could look like follows).
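
For illustration, a sketch of an extracted _pad_batches() and a test for it, with a hypothetical signature; it simply wraps the _create_padding_sample calls so every worker ends up with the same number of packs for a step:

def _pad_batches(self, packed_step: list[dict], max_packs: int, pack_max_length: int) -> list[dict]:
    # Pad one worker's packs for one step up to max_packs with full-length padding samples.
    num_padding_packs = max_packs - len(packed_step)
    padding_sample = self._create_padding_sample(pack_max_length)   # assumed signature
    return packed_step + [padding_sample for _ in range(num_padding_packs)]

def test_pad_batches_aligns_pack_count():
    controller = make_test_controller()              # hypothetical fixture
    packed_step = [{"num_tokens": 32768}] * 3
    padded = controller._pad_batches(packed_step, max_packs=5, pack_max_length=32768)
    assert len(padded) == 5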
