fix(minillm): qwen2 teacher student model unmatched model vocab size #331

anjilab · 2025-10-31T21:14:32Z

Problem Statement

Distilling from the teacher model Qwen/Qwen2.5-7B-Instruct to the student model Qwen/Qwen2.5-3B fails because the two models' vocab sizes do not match:

RuntimeError: The size of tensor a (152064) must match the size of tensor b (151936) at non-singleton dimension 2

Description of Change

Although both models share the same tokenizer vocab size of 151665, their model vocab sizes (embedding dimensions) differ: Qwen/Qwen2.5-7B-Instruct has 152064 and Qwen/Qwen2.5-3B has 151936. The entries beyond the tokenizer vocab are padding, and no real token ID points to them, so it is safe to truncate the extra padded entries from the teacher logits.

Added a conditional fix for Qwen2 model types, where this vocab size mismatch can occur. The change truncates the teacher logits to the student's vocabulary size before loss computation.
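For reviewers, the truncation can be sketched roughly as below. This is an illustration under assumptions, not the code in this PR: `align_teacher_logits` is a hypothetical helper name, the shapes are toy values standing in for 152064 and 151936, and a plain forward KL stands in for whatever loss the trainer actually computes.

```python
import torch
import torch.nn.functional as F


def align_teacher_logits(teacher_logits, student_logits):
    """Hypothetical helper: drop trailing teacher vocab columns so the
    last dimension matches the student's."""
    student_vocab = student_logits.size(-1)
    teacher_vocab = teacher_logits.size(-1)
    if teacher_vocab < student_vocab:
        # Truncation only helps when the teacher is the wider model.
        raise ValueError(
            f"teacher vocab ({teacher_vocab}) is smaller than "
            f"student vocab ({student_vocab})"
        )
    # Keep the first `student_vocab` entries; a no-op when sizes already match.
    return teacher_logits[..., :student_vocab]


# Toy shapes (assumed for illustration) in place of the real vocab sizes.
batch, seq_len = 2, 4
student_vocab, teacher_vocab = 10, 12

torch.manual_seed(0)
student_logits = torch.randn(batch, seq_len, student_vocab)
teacher_logits = torch.randn(batch, seq_len, teacher_vocab)

aligned = align_teacher_logits(teacher_logits, student_logits)

# Stand-in loss: with matching shapes the elementwise ops no longer raise.
teacher_probs = F.softmax(aligned, dim=-1)
student_log_probs = F.log_softmax(student_logits, dim=-1)
loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```

Slicing before the softmax means the teacher distribution is renormalized over the shared vocabulary, so any probability mass the teacher assigned to the padded entries is discarded rather than carried into the loss.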

fix: qwen2 teacher student model unmatched model vocab size

49b49c6

anjilab changed the title ~~fix: qwen2 teacher student model unmatched model vocab size~~ fix(minillm): qwen2 teacher student model unmatched model vocab size Oct 31, 2025
