Describe the bug
Enabling LoRA on the DTensor backend degrades end-to-end training throughput compared to the same setup without LoRA. The regression appears even when the LoRA rank is small and all other settings are unchanged.
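For context, a minimal sketch of what a LoRA-wrapped linear layer typically computes (an illustration only, not the repo's DTensor implementation): even at dim=8, every adapted projection issues two extra small GEMMs plus an add per forward pass, which is one plausible source of per-step overhead when the adapters are not fused into the base GEMM.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper around a frozen nn.Linear (not the repo's code)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base                        # frozen pretrained projection
        self.base.weight.requires_grad_(False)
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)      # adapter starts as a no-op delta
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base path plus low-rank update: two extra small matmuls per call.
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))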
Reproduce
Disable LoRA
NRL_FORCE_REBUILD_VENVS=true uv run examples/run_sft.py \
logger.wandb_enabled=True \
logger.wandb.project=lora \
logger.wandb.name=nano_v3_lora \
policy.model_name=nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \
checkpointing.enabled=False \
policy.dtensor_cfg.enabled=true \
policy.dtensor_cfg._v2=true \
policy.dtensor_cfg.lora_cfg.enabled=False \
policy.dtensor_cfg.lora_cfg.use_triton=False \
policy.dtensor_cfg.lora_cfg.dim=8 \
policy.max_total_sequence_length=2048 \
policy.train_global_batch_size=16 \
policy.train_micro_batch_size=1 \
policy.optimizer.name="torch.optim.Adam" \
~policy.tokenizer.chat_template \
sft.max_num_steps=10 \
cluster.num_nodes=2 \
cluster.gpus_per_node=8

Enable LoRA
NRL_FORCE_REBUILD_VENVS=true uv run examples/run_sft.py \
logger.wandb_enabled=True \
logger.wandb.project=lora \
logger.wandb.name=nano_v3_lora \
policy.model_name=nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \
checkpointing.enabled=False \
policy.dtensor_cfg.enabled=true \
policy.dtensor_cfg._v2=true \
policy.dtensor_cfg.lora_cfg.enabled=True \
policy.dtensor_cfg.lora_cfg.use_triton=False \
policy.dtensor_cfg.lora_cfg.dim=8 \
policy.max_total_sequence_length=2048 \
policy.train_global_batch_size=16 \
policy.train_micro_batch_size=1 \
policy.optimizer.name="torch.optim.Adam" \
~policy.tokenizer.chat_template \
sft.max_num_steps=10 \
cluster.num_nodes=2 \
cluster.gpus_per_node=8

Observed Behavior

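One simple way to quantify the gap is to convert the logged per-step wall-clock times into tokens/sec for the two runs. A hypothetical helper along these lines (the step times below are placeholders, not measurements from the commands above):

def tokens_per_second(step_times_s, global_batch_size=16, seq_len=2048):
    """Convert per-step wall-clock times (seconds) into tokens/sec."""
    tokens_per_step = global_batch_size * seq_len
    return [tokens_per_step / t for t in step_times_s]

if __name__ == "__main__":
    # Placeholder step times; substitute the values logged by the two runs.
    baseline = tokens_per_second([4.0, 4.1, 3.9])
    with_lora = tokens_per_second([5.0, 5.2, 5.1])
    print("baseline tok/s:", baseline)
    print("lora     tok/s:", with_lora)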