Conversation

@nil0x9 (Contributor) commented Dec 15, 2025

Currently, when one passes a loss_cfg.loss_reduction other than "token" on an Ascend NPU device, a RuntimeError (device mismatch) is raised at this line:

loss = (loss * loss_weight).sum()

The root cause is that on Ascend NPU devices the cu_seq_lens tensors are required to be on CPU. In build_batches_loss_kwargs, the device of loss_weight is inherited from num_grad_tokens -> boundaries -> cu_seq_lens, hence the mismatch.
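
A minimal sketch of the failing pattern and one possible fix. The names build_batches_loss_kwargs, loss_weight, and cu_seq_lens come from the description above; everything else (the helper function and its signature) is assumed for illustration and is not the actual patch.

```python
import torch

def reduce_weighted_loss(loss: torch.Tensor, loss_weight: torch.Tensor) -> torch.Tensor:
    # On Ascend NPU, cu_seq_lens (and tensors whose device is derived from it,
    # such as loss_weight) may live on CPU while `loss` lives on the NPU, so the
    # element-wise product would raise a device-mismatch RuntimeError.
    # Moving loss_weight onto loss.device before the multiply avoids that.
    loss_weight = loss_weight.to(loss.device)
    return (loss * loss_weight).sum()
```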

@nil0x9 nil0x9 marked this pull request as ready for review December 15, 2025 17:38
@nil0x9 nil0x9 force-pushed the linty/fix-npu-loss-weight-device-mismatch branch from 8613e2d to f1294d3 on December 16, 2025 15:40
@nil0x9 nil0x9 force-pushed the linty/fix-npu-loss-weight-device-mismatch branch from f1294d3 to dacb7ef on December 24, 2025 08:03