We introduce Large Language Diffusion with Ordered Unmasking (LLaDOU), a model trained by reinforcing a new reasoning paradigm for diffusion language models, the Diffusion Chain of Lateral Thought (DCoLT).
Compared to standard CoT, DCoLT is distinguished by several notable features:
- Bidirectional Reasoning: Allowing global refinement throughout the generation process with bidirectional self-attention masks.
- Format-Free Reasoning: No strict rules on grammatical correctness in its intermediate steps of thought.
- Nonlinear Generation: Generating tokens at arbitrary positions across different steps (illustrated in the sketch below).
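To make these properties concrete, here is a minimal, self-contained sketch of one possible ordered-unmasking loop: the response starts fully masked and, at each step, tokens are revealed at whichever positions a toy scorer is most confident about. The constants, function names, and the confidence-based selection rule are illustrative assumptions for this sketch only; LLaDOU learns and reinforces its own unmasking policy rather than following a fixed heuristic like this.

import torch

MASK_ID = 0          # toy mask token id (assumption; real models define their own)
VOCAB_SIZE = 16      # toy vocabulary size
SEQ_LEN = 12         # length of the response being denoised
NUM_STEPS = 4        # number of unmasking steps

def toy_logits(tokens: torch.Tensor) -> torch.Tensor:
    """Stand-in for a diffusion LM forward pass with bidirectional attention.
    Returns per-position logits over the vocabulary."""
    torch.manual_seed(int(tokens.sum()))  # deterministic toy behavior
    return torch.randn(tokens.shape[0], VOCAB_SIZE)

def ordered_unmasking_sample() -> torch.Tensor:
    # Start from a fully masked response.
    tokens = torch.full((SEQ_LEN,), MASK_ID)
    masked = torch.ones(SEQ_LEN, dtype=torch.bool)
    per_step = SEQ_LEN // NUM_STEPS
    for _ in range(NUM_STEPS):
        logits = toy_logits(tokens)
        probs = logits.softmax(dim=-1)
        conf, pred = probs.max(dim=-1)   # most likely token and its confidence per position
        conf[~masked] = -1.0             # never re-select already revealed positions
        # Nonlinear generation: reveal the most confident positions, wherever they are.
        chosen = conf.topk(per_step).indices
        tokens[chosen] = pred[chosen]
        masked[chosen] = False
    return tokens

print(ordered_unmasking_sample())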
- [Sep 2025] LLaDOU has been accepted by NeurIPS 2025. Congrats!
- [July 2025] Training code is provided!
- [May 2025] Released the LLaDOU v0 Math and LLaDOU v0 Code models, their evaluation code, and the technical report.
import torch
from transformers import AutoTokenizer
from networks.lladou_v0 import LLaDOUModelLM, sample
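# Load the tokenizer and the released LLaDOU-v0-Math checkpoint (bfloat16, placed on GPU).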
tokenizer = AutoTokenizer.from_pretrained("models/LLaDOU-v0-Math")
model = LLaDOUModelLM.from_pretrained(
pretrained_model_name_or_path="models/LLaDOU-v0-Math",
trust_remote_code=True,
torch_dtype=torch.bfloat16,
device_map="cuda",
)
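# Pose a question; sample() runs the step-by-step unmasking and returns the decoded responses.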
problem = "What is the answer of 1+1?"
outputs = sample(
model,
problem,
tokenizer,
device="cuda",
)
response = outputs["responses"][0]
print(response)

We provide an example of training LLaDOU on the GSM8K dataset; feel free to change the configuration file!
accelerate launch --num_processes 8 --config_file configs/accelerate/fsdp.yaml train.py --config configs/gsm8k_64step_example.yaml

Prepare the datasets as follows:
├── datasets
│ ├── gsm8k
│ │ └── ...
│ ├── MATH
│ │ └── ...
│ ├── mbpp.jsonl
│ ├── mbpp_test.jsonl
│ └── HumanEval.jsonl.gz
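For example, the GSM8K data can be fetched from the Hugging Face Hub and written under datasets/gsm8k. This is only a sketch: the dataset identifier and the JSONL export below are assumptions, and the on-disk layout actually expected by the training and evaluation configs may differ.

from datasets import load_dataset  # pip install datasets

# Download the official GSM8K splits and export them as JSONL files.
# Adjust the file names/format to match what your configuration file points at.
gsm8k = load_dataset("openai/gsm8k", "main")
gsm8k["train"].to_json("datasets/gsm8k/train.jsonl")
gsm8k["test"].to_json("datasets/gsm8k/test.jsonl")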
- For GSM8K and MATH evaluation, please run scripts/eval_math.sh.
- For MBPP and HumanEval evaluation, please run scripts/eval_code.sh.
If this repository helps with your work, please consider giving it a star and citing our paper:
@inproceedings{huang2025reinforcing,
title={Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models},
author={Zemin Huang and Zhiyang Chen and Zijun Wang and Tiancheng Li and Guo-Jun Qi},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025}
}

