[Dtensor] SFT with LoRA is slower than without LoRA #1688

@RayenTian

Describe the bug

With LoRA enabled on the DTensor backend, end-to-end SFT training throughput is lower than in the same setup without LoRA. The slowdown appears even with a small LoRA rank (dim=8) and all other settings unchanged.

Reproduce

Disable LoRA

NRL_FORCE_REBUILD_VENVS=true uv run examples/run_sft.py \
logger.wandb_enabled=True \
logger.wandb.project=lora \
logger.wandb.name=nano_v3_lora \
policy.model_name=nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \
checkpointing.enabled=False \
policy.dtensor_cfg.enabled=true \
policy.dtensor_cfg._v2=true \
policy.dtensor_cfg.lora_cfg.enabled=False \
policy.dtensor_cfg.lora_cfg.use_triton=False \
policy.dtensor_cfg.lora_cfg.dim=8 \
policy.max_total_sequence_length=2048 \
policy.train_global_batch_size=16 \
policy.train_micro_batch_size=1 \
policy.optimizer.name="torch.optim.Adam" \
~policy.tokenizer.chat_template  \
sft.max_num_steps=10 \
cluster.num_nodes=2 \
cluster.gpus_per_node=8

Enable LoRA

NRL_FORCE_REBUILD_VENVS=true uv run examples/run_sft.py \
logger.wandb_enabled=True \
logger.wandb.project=lora \
logger.wandb.name=nano_v3_lora \
policy.model_name=nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \
checkpointing.enabled=False \
policy.dtensor_cfg.enabled=true \
policy.dtensor_cfg._v2=true \
policy.dtensor_cfg.lora_cfg.enabled=True \
policy.dtensor_cfg.lora_cfg.use_triton=False \
policy.dtensor_cfg.lora_cfg.dim=8 \
policy.max_total_sequence_length=2048 \
policy.train_global_batch_size=16 \
policy.train_micro_batch_size=1 \
policy.optimizer.name="torch.optim.Adam" \
~policy.tokenizer.chat_template  \
sft.max_num_steps=10 \
cluster.num_nodes=2 \
cluster.gpus_per_node=8
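
The two commands above differ only in `policy.dtensor_cfg.lora_cfg.enabled`. A minimal sketch of a wrapper that makes this explicit and runs both back to back is shown below. The `run_sft` helper and the `DRY_RUN` variable are hypothetical names introduced here for illustration and are not part of the repository; every override is copied from the commands above. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: run_sft and DRY_RUN are illustrative names, not repo features.
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 actually launches it.

run_sft() {
  lora_enabled="$1"   # "True" or "False": the only setting that changes
  set -- uv run examples/run_sft.py \
    logger.wandb_enabled=True \
    logger.wandb.project=lora \
    logger.wandb.name=nano_v3_lora \
    policy.model_name=nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \
    checkpointing.enabled=False \
    policy.dtensor_cfg.enabled=true \
    policy.dtensor_cfg._v2=true \
    policy.dtensor_cfg.lora_cfg.enabled="$lora_enabled" \
    policy.dtensor_cfg.lora_cfg.use_triton=False \
    policy.dtensor_cfg.lora_cfg.dim=8 \
    policy.max_total_sequence_length=2048 \
    policy.train_global_batch_size=16 \
    policy.train_micro_batch_size=1 \
    policy.optimizer.name="torch.optim.Adam" \
    '~policy.tokenizer.chat_template' \
    sft.max_num_steps=10 \
    cluster.num_nodes=2 \
    cluster.gpus_per_node=8

  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Print the full command line without executing it.
    echo "NRL_FORCE_REBUILD_VENVS=true $*"
  else
    NRL_FORCE_REBUILD_VENVS=true "$@"
  fi
}

# Baseline first (LoRA off), then the LoRA run.
for lora in False True; do
  run_sft "$lora"
done
```

Running the script with `DRY_RUN=0` launches both jobs in sequence. Both runs keep the same `logger.wandb.name` as in the original commands, so they may need to be told apart by start time in the W&B project.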

Observed Behavior

Image

Labels: bug