2 changes: 1 addition & 1 deletion .github/workflows/unittest.yaml
@@ -12,7 +12,7 @@ permissions:
jobs:
unittest:
# only run on pull request
if: ${{ github.event.issue.pull_request && (startsWith(github.event.comment.body, '/unittest')) && github.event.comment.author_association == 'COLLABORATOR' }}
if: ${{ github.event.issue.pull_request && (startsWith(github.event.comment.body, '/unittest')) && (github.event.comment.author_association == 'COLLABORATOR' || github.event.comment.author_association == 'MEMBER' || github.event.comment.author_association == 'OWNER') }}
runs-on: self-hosted

steps:
96 changes: 96 additions & 0 deletions docs/sphinx_doc/source/tutorial/example_reasoning_basic.md
@@ -117,6 +117,102 @@ Run the RFT process with the following command:
trinity run --config examples/grpo_gsm8k/gsm8k.yaml
```

## Optional: Convert Checkpoints to Hugging Face Format

During a Trinity-RFT experiment, the system automatically saves training checkpoints to the following path:

```
${checkpoint_root_dir}/${project}/${name}
```

The directory structure is as follows:

```
${checkpoint_root_dir}/${project}/${name}
├── buffer
│ ├── experience_buffer.jsonl # Stores experience data generated during training
│ └── explorer_output.db # Database file output by the Explorer module
├── log # Contains logs from multiple Ray Actors
│ ├── checkpoint_monitor.log
│ ├── explorer.log
│ ├── explorer_experience_pipeline.log
│ ├── explorer_runner_0.log ... explorer_runner_31.log
│ ├── queue_experience_buffer.log
│ └── synchronizer.log
├── monitor # Monitoring-related files (may be empty)
├── global_step_58 # Example: Full checkpoint at step 58
│ └── actor
│ ├── huggingface # (Optional) Hugging Face formatted model files
│ │ ├── added_tokens.json
│ │ ├── chat_template.jinja
│ │ ├── config.json
│ │ ├── generation_config.json
│ │ ├── merges.txt
│ │ ├── model.safetensors # ← Key model weights file
│ │ ├── special_tokens_map.json
│ │ ├── tokenizer.json
│ │ ├── tokenizer_config.json
│ │ └── vocab.json
│ ├── extra_state_world_size_4_rank_0.pt # Additional state (e.g., random seeds)
│ ├── ...
│ ├── fsdp_config.json # FSDP configuration file
│ ├── model_world_size_4_rank_0.pt ... model_world_size_4_rank_3.pt # Sharded model parameters
│ ├── optim_world_size_4_rank_0.pt ... optim_world_size_4_rank_3.pt # Sharded optimizer states
│ └── ...
├── explorer_meta.json # Metadata for the Explorer module
├── trainer_meta.json # Metadata for the Trainer module
├── latest_checkpointed_iteration.txt # Training step of the most recent full checkpoint
└── latest_state_dict_iteration.txt # Training step of the most recent model parameter save (used for checkpoint synchronization)
```

### When Is Conversion Needed?

If you wish to use the model in **Hugging Face format** (e.g., for inference or deployment), but find that the `model.safetensors` file is **missing** from the `global_step_*/actor/huggingface/` directory, you need to manually perform the conversion.
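
If you are unsure which checkpoints still need conversion, a minimal shell check along the following lines (assuming the directory layout shown above) lists the `global_step_*` directories that do not yet contain a converted `model.safetensors`:

```bash
# List checkpoints whose actor has not yet been converted to Hugging Face format.
for step_dir in ${checkpoint_root_dir}/${project}/${name}/global_step_*; do
  if [ ! -f "${step_dir}/actor/huggingface/model.safetensors" ]; then
    echo "needs conversion: ${step_dir}"
  fi
done
```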

### Conversion Tool: `trinity convert`

The `trinity convert` command provides flexible model conversion capabilities and supports the following usage patterns:

#### ✅ Batch Conversion (Recommended)
Point `--checkpoint-dir` at the project root directory (i.e., the path containing multiple `global_step_*` subdirectories). The tool will **automatically and recursively scan for all `global_step_*` directories** and convert each checkpoint.

```bash
trinity convert --checkpoint-dir ${checkpoint_root_dir}/${project}/${name}
```

This command will:
- Automatically detect all subdirectories matching the pattern `global_step_<number>`;
- Convert the `actor` model within each subdirectory;
- Save the resulting Hugging Face–formatted files (including `model.safetensors`, etc.) into the corresponding `actor/huggingface/` subdirectory.

#### ✅ Single-step Conversion
If you only want to convert a model from a specific training step, directly point `--checkpoint-dir` to the corresponding `global_step_XXX` folder:

```bash
trinity convert --checkpoint-dir ${checkpoint_root_dir}/${project}/${name}/global_step_120
```

#### ✅ Path Tolerance
Even if you specify a subpath inside a `global_step_XXX` directory (e.g., `.../global_step_120/actor`), the tool recognizes the correct checkpoint and completes the conversion successfully; there is no need to align the path exactly with the `global_step_XXX` level.
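
For example, both of the following invocations (a sketch; the step number is only an illustration) are expected to resolve to the same checkpoint:

```bash
trinity convert --checkpoint-dir ${checkpoint_root_dir}/${project}/${name}/global_step_120
trinity convert --checkpoint-dir ${checkpoint_root_dir}/${project}/${name}/global_step_120/actor
```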

### Special Case: Missing Base Model Configuration

If a `config.json` file is **missing** from any `global_step_*/actor/huggingface/` directory (typically because the configuration wasn't fully saved during training), the conversion process requires the original base model's configuration. In this case, use `--base-model-dir` to specify the path to your base model:

```bash
trinity convert \
--checkpoint-dir ${checkpoint_root_dir}/${project}/${name} \
--base-model-dir /path/to/your/base/model
```

> 💡 This parameter applies to **all scanned checkpoints**. If any checkpoint lacks `config.json`, you must provide this argument.

### Notes

- **Actor Model Only**: The current `trinity convert` command only processes the model parameters in the `actor` folder and **does not handle `critic` models** (even if they exist); critic models must be converted separately.
- **Automatic Training Format Detection**: `trinity convert` natively supports checkpoints from both **FSDP** and **Megatron** distributed training formats. **No additional parameters are required**: the tool automatically detects the format and correctly merges the sharded weights.
- **Idempotency**: If a `global_step_*` checkpoint already contains a complete set of Hugging Face files (especially `model.safetensors`) in its `huggingface/` directory, the conversion will be skipped to avoid redundant processing.
- **Performance Tip**: The conversion process can be time-consuming, especially when dealing with many checkpoints or large models. It's recommended to run this during off-peak hours.
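
Once conversion has produced a complete `huggingface/` directory, the checkpoint can be loaded like any other Hugging Face model. The sketch below (the checkpoint path, prompt, and generation settings are placeholders, not part of Trinity-RFT) loads the converted actor for inference with `transformers`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: adjust to your ${checkpoint_root_dir}/${project}/${name} and step number.
hf_path = "/path/to/checkpoints/my_project/my_run/global_step_58/actor/huggingface"

tokenizer = AutoTokenizer.from_pretrained(hf_path)
# device_map="auto" requires the accelerate package; drop it for plain CPU loading.
model = AutoModelForCausalLM.from_pretrained(hf_path, torch_dtype="auto", device_map="auto")

prompt = "Natalia sold 48 clips in April and half as many in May. How many clips did she sell in total?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```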


## Optional: RFT with SFT Warmup
9 changes: 7 additions & 2 deletions docs/sphinx_doc/source/tutorial/faq.md
@@ -190,9 +190,14 @@ for exp in exp_list:

**Q:** How to load the checkpoints outside of the Trinity-RFT framework?

**A:** You need to specify model path and checkpoint path. The following code snippet gives an example with transformers.
**A:** Currently, two loading methods are supported:

Here is an example of loading from fsdp trainer checkpoints:
1. **Recommended approach**: Use the `trinity convert` command to convert the original checkpoint into the standard Hugging Face format.
After conversion, you can load and use it directly just like any ordinary Hugging Face model.
For detailed instructions, please refer to the tutorial: [Optional: Convert Checkpoints to Hugging Face Format](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_reasoning_basic.html#optional-convert-checkpoints-to-hugging-face-format)

2. **Direct loading (for actor checkpoints trained with FSDP)**:
If you prefer to load the checkpoint directly without converting its format, you can use the following code example:

```python
import os
97 changes: 97 additions & 0 deletions docs/sphinx_doc/source_zh/tutorial/example_reasoning_basic.md
@@ -118,6 +118,103 @@ trinity run --config examples/grpo_gsm8k/gsm8k.yaml
```


## Advanced Option: Convert Checkpoints to Hugging Face Format

During a Trinity-RFT experiment, the system automatically saves training checkpoints to the following path:

```
${checkpoint_root_dir}/${project}/${name}
```

The directory structure is as follows:

```
${checkpoint_root_dir}/${project}/${name}
├── buffer
│ ├── experience_buffer.jsonl # Stores experience data generated during training
│ └── explorer_output.db # Database file output by the Explorer module
├── log # Contains logs from multiple Ray Actors
│ ├── checkpoint_monitor.log
│ ├── explorer.log
│ ├── explorer_experience_pipeline.log
│ ├── explorer_runner_0.log ... explorer_runner_31.log
│ ├── queue_experience_buffer.log
│ └── synchronizer.log
├── monitor # Monitoring-related files (may be empty)
├── global_step_58 # Example: Full checkpoint at step 58
│ └── actor
│ ├── huggingface # (Optional) Hugging Face formatted model files
│ │ ├── added_tokens.json
│ │ ├── chat_template.jinja
│ │ ├── config.json
│ │ ├── generation_config.json
│ │ ├── merges.txt
│ │ ├── model.safetensors # ← Key model weights file
│ │ ├── special_tokens_map.json
│ │ ├── tokenizer.json
│ │ ├── tokenizer_config.json
│ │ └── vocab.json
│ ├── extra_state_world_size_4_rank_0.pt # Additional state (e.g., random seeds)
│ ├── ...
│ ├── fsdp_config.json # FSDP configuration file
│ ├── model_world_size_4_rank_0.pt ... model_world_size_4_rank_3.pt # Sharded model parameters
│ ├── optim_world_size_4_rank_0.pt ... optim_world_size_4_rank_3.pt # Sharded optimizer states
│ └── ...
├── explorer_meta.json # Metadata for the Explorer module
├── trainer_meta.json # Metadata for the Trainer module
├── latest_checkpointed_iteration.txt # Training step of the most recent full checkpoint
└── latest_state_dict_iteration.txt # Training step of the most recent model parameter save (used for checkpoint synchronization)
```

### When Is Conversion Needed?

If you wish to use the model in **Hugging Face format** (e.g., for inference or deployment) but find that the **`model.safetensors` file is missing** from the `global_step_*/actor/huggingface/` directory, you need to perform the conversion manually.

### Conversion Tool: `trinity convert`

The `trinity convert` command provides flexible model conversion and supports the following usage patterns:

#### ✅ Batch Conversion (Recommended)
Point `--checkpoint-dir` at the project root directory (i.e., the path containing multiple `global_step_*` subdirectories). The tool will **automatically and recursively find all `global_step_*` directories** and convert each checkpoint.

```bash
trinity convert --checkpoint-dir ${checkpoint_root_dir}/${project}/${name}
```

This command will:
- Automatically detect all subdirectories matching the pattern `global_step_<number>`;
- Convert the `actor` model within each subdirectory;
- Save the resulting Hugging Face–formatted files (including `model.safetensors`, etc.) into the corresponding `actor/huggingface/` directory.

#### ✅ Single-step Conversion
If you only want to convert the model from a specific training step, point `--checkpoint-dir` directly at the corresponding `global_step_XXX` folder:

```bash
trinity convert --checkpoint-dir ${checkpoint_root_dir}/${project}/${name}/global_step_120
```

#### ✅ Path Tolerance
Even if you specify a subpath inside a `global_step_XXX` directory (e.g., `.../global_step_120/actor`), the tool recognizes the correct checkpoint and completes the conversion successfully; there is no need to align the path exactly with the `global_step_XXX` level.

### Special Case: Missing Base Model Configuration

If `config.json` is **missing** from a `global_step_*/actor/huggingface/` directory (typically because the configuration was not fully saved during training), the conversion process requires the original base model's configuration. In this case, specify the path to your base model via `--base-model-dir`:

```bash
trinity convert \
--checkpoint-dir ${checkpoint_root_dir}/${project}/${name} \
--base-model-dir /path/to/your/base/model
```

> 💡 This parameter applies to **all scanned checkpoints**. If any checkpoint lacks `config.json`, you must provide this argument.

### Notes

- **Actor Model Only**: The current `trinity convert` command only processes the model parameters in the `actor` folder and **does not handle `critic` models** (even if they exist); critic models must be converted separately.
- **Automatic Training Format Detection**: `trinity convert` natively supports checkpoints from both **FSDP** and **Megatron** distributed training formats. **No additional parameters are required**: the tool automatically detects the format and correctly merges the sharded weights.
- **Idempotency**: If a `global_step_*` checkpoint already contains a complete set of Hugging Face files (especially `model.safetensors`) in its `huggingface/` directory, that checkpoint is skipped to avoid redundant conversion.
- **Performance Tip**: Conversion can be time-consuming, especially with many checkpoints or large models; consider running it during off-peak hours.
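
For deployment, the converted directory can also be served directly by an OpenAI-compatible inference engine. A minimal sketch with vLLM (assuming vLLM is installed; the checkpoint path and served model name are placeholders):

```bash
# Serve the converted Hugging Face checkpoint behind an OpenAI-compatible API.
vllm serve /path/to/checkpoints/my_project/my_run/global_step_58/actor/huggingface \
  --served-model-name my-rft-model \
  --port 8000
```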


## Advanced Option: RFT with SFT Warmup

9 changes: 7 additions & 2 deletions docs/sphinx_doc/source_zh/tutorial/faq.md
@@ -183,9 +183,14 @@ for exp in exp_list:

**Q:** How can checkpoints be loaded outside the Trinity-RFT framework?

**A:** You need to specify the model path and checkpoint path. The following code snippet shows how to load them with the transformers library.
**A:** Two loading methods are currently supported:

Here is an example of loading an FSDP trainer checkpoint:
1. **Recommended approach**: Use the `trinity convert` command to convert the original checkpoint into the standard Hugging Face format.
After conversion, you can load and use it directly just like any ordinary Hugging Face model.
For detailed instructions, please refer to the tutorial: [Optional: Convert Checkpoints to Hugging Face Format](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_reasoning_basic.html#hugging-face)

2. **Direct loading (for actor checkpoints trained with FSDP)**:
If you prefer to load the checkpoint directly without converting its format, you can use the following code example:

```python
import os
45 changes: 2 additions & 43 deletions examples/mix_chord/README.md
@@ -56,49 +56,8 @@ trinity run --config examples/mix_chord/mix_chord_toolace.yaml

It takes around 3 hours to run on 8 H20 GPUs.

After the run, you may also want to convert the checkpoint to a Hugging Face checkpoint.

```python
import os
from transformers import AutoTokenizer, AutoModelForCausalLM
from trinity.common.models.utils import load_fsdp_state_dict_from_verl_checkpoint

# The following variables are assumed to be predefined:
# model_path, checkpoint_root_dir, project, name
model = AutoModelForCausalLM.from_pretrained(model_path)
ckp_path = os.path.join(checkpoint_root_dir, project, name, "global_step_100", "actor")
state_dict = load_fsdp_state_dict_from_verl_checkpoint(ckp_path)
model.load_state_dict(state_dict)
output_dir = os.path.join(ckp_path, "huggingface")

def save_to_huggingface_checkpoint(state_dict: dict, output_dir: str):
"""Convert state dict to Hugging Face format and save it.

Args:
state_dict: The state dict loaded from the Verl checkpoint.
output_dir: The directory to save the Hugging Face checkpoint.
"""
import os
import torch
from transformers import PreTrainedModel

os.makedirs(output_dir, exist_ok=True)

# Convert state dict keys to Hugging Face format if needed
hf_state_dict = {}
for key, value in state_dict.items():
# Add any key mapping logic here if needed
# Example:
# if key.startswith("model."):
# new_key = key.replace("model.", "")
# hf_state_dict[new_key] = value
# else:
# hf_state_dict[key] = value
hf_state_dict[key] = value
torch.save(hf_state_dict, os.path.join(output_dir, "pytorch_model.bin"))

save_to_huggingface_checkpoint(state_dict, output_dir)
```
After the run, you can use the `trinity convert` command to convert the original checkpoint into the standard Hugging Face format, as shown below. For detailed instructions, please refer to the tutorial: [Optional: Convert Checkpoints to Hugging Face Format](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_reasoning_basic.html#optional-convert-checkpoints-to-hugging-face-format)
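
For example (a sketch following the tutorial above; substitute the checkpoint directory configured for this run):

```bash
trinity convert --checkpoint-dir ${checkpoint_root_dir}/${project}/${name}
```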


## Evaluate the Trained Model on BFCL

22 changes: 14 additions & 8 deletions tests/trainer/trainer_test.py
@@ -28,7 +28,7 @@
get_vision_language_model_path,
)
from trinity.buffer import get_buffer_reader
from trinity.cli.launcher import bench, both, explore, run, serve, train
from trinity.cli.launcher import bench, both, convert, explore, run, serve, train
from trinity.common.config import (
AlgorithmConfig,
BufferConfig,
@@ -98,7 +98,7 @@ def test_trainer(self):
eval_tasksets[0].repeat_times = 4
eval_tasksets[1].repeat_times = 4
self.config.trainer.save_interval = 4
self.config.trainer.save_hf_checkpoint = "always"
self.config.trainer.save_hf_checkpoint = "never"
if self.strategy == "megatron":
self.config.trainer.trainer_strategy = "megatron"
self.config.check_and_update()
@@ -144,12 +144,18 @@ def test_trainer(self):
)
self.assertGreater(len(os.listdir(os.path.join(checkpoint_step_4, "actor"))), 0)
self.assertGreater(len(os.listdir(os.path.join(checkpoint_step_8, "actor"))), 0)
self.assertGreater(
len(os.listdir(os.path.join(checkpoint_step_4, "actor", "huggingface"))), 0
)
self.assertGreater(
len(os.listdir(os.path.join(checkpoint_step_8, "actor", "huggingface"))), 0
)
hf_dir_step_4 = os.listdir(os.path.join(checkpoint_step_4, "actor", "huggingface"))
hf_dir_step_8 = os.listdir(os.path.join(checkpoint_step_8, "actor", "huggingface"))
self.assertGreater(len(hf_dir_step_4), 0)
self.assertGreater(len(hf_dir_step_8), 0)
self.assertNotIn("model.safetensors", hf_dir_step_4)
self.assertNotIn("model.safetensors", hf_dir_step_8)
# test checkpoint convert
convert(self.config.checkpoint_job_dir)
hf_dir_step_4 = os.listdir(os.path.join(checkpoint_step_4, "actor", "huggingface"))
hf_dir_step_8 = os.listdir(os.path.join(checkpoint_step_8, "actor", "huggingface"))
self.assertIn("model.safetensors", hf_dir_step_4)
self.assertIn("model.safetensors", hf_dir_step_8)
self.assertEqual(step_num, 8)
ray.init(ignore_reinit_error=True, namespace=self.config.ray_namespace)
# test bench mode