@anjilab anjilab commented Oct 31, 2025

Problem Statement

When distilling from the teacher model Qwen/Qwen2.5-7B-Instruct to the student model Qwen/Qwen2.5-3B, the loss computation fails because the two models have different model vocab sizes:

RuntimeError: The size of tensor a (152064) must match the size of tensor b (151936) at non-singleton dimension 2

Description of Change

Although both models share the same tokenizer vocab size of 151665, their model vocab sizes (embedding sizes) differ: Qwen/Qwen2.5-7B-Instruct has 152064 and Qwen/Qwen2.5-3B has 151936. The extra entries in the teacher are padding, and no real token ID points to them, so it is safe to truncate them from the teacher logits.
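The arithmetic behind that claim can be checked in a few lines. This is only a sketch: the three sizes are the ones quoted above, the variable names are made up for illustration, and it assumes tokenizer IDs are contiguous from 0 to 151664.

```python
# Sizes quoted in this PR description.
TOKENIZER_VOCAB = 151665       # shared tokenizer vocab size
TEACHER_MODEL_VOCAB = 152064   # Qwen/Qwen2.5-7B-Instruct model vocab size
STUDENT_MODEL_VOCAB = 151936   # Qwen/Qwen2.5-3B model vocab size

# Entries in each model vocab that no tokenizer ID maps to.
teacher_padding = TEACHER_MODEL_VOCAB - TOKENIZER_VOCAB
student_padding = STUDENT_MODEL_VOCAB - TOKENIZER_VOCAB

# Trailing teacher logit columns that truncation would drop.
columns_to_drop = TEACHER_MODEL_VOCAB - STUDENT_MODEL_VOCAB

# Assumption: token IDs run contiguously from 0, so this is the largest real ID.
max_token_id = TOKENIZER_VOCAB - 1

# Truncation is safe only if every real token ID survives it.
truncation_keeps_all_real_tokens = max_token_id < STUDENT_MODEL_VOCAB

print(teacher_padding, student_padding, columns_to_drop)
print(truncation_keeps_all_real_tokens)
```

Under those assumptions, all dropped columns lie above the largest real token ID.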

Added a conditional fix for Qwen2 model types, where this vocab size mismatch may occur. The change truncates the teacher logits to the student’s vocabulary size before the loss is computed.
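A minimal sketch of the truncation idea in PyTorch, not the actual diff from this PR: the helper name `truncate_teacher_logits` is hypothetical, the tensors are tiny stand-ins for real model outputs, the Qwen2 model-type check is omitted, and the KL divergence is just a placeholder for whatever loss the training code uses.

```python
import torch
import torch.nn.functional as F


def truncate_teacher_logits(teacher_logits: torch.Tensor,
                            student_logits: torch.Tensor) -> torch.Tensor:
    """Drop trailing teacher vocab columns so the last dimension matches the student.

    Hypothetical helper: assumes the extra teacher columns are padding entries
    that no real token ID maps to.
    """
    student_vocab = student_logits.size(-1)
    if teacher_logits.size(-1) > student_vocab:
        teacher_logits = teacher_logits[..., :student_vocab]
    return teacher_logits


# Toy shapes standing in for (batch, seq_len, 151936) and (batch, seq_len, 152064).
batch, seq_len = 2, 4
student_logits = torch.randn(batch, seq_len, 8)
teacher_logits = torch.randn(batch, seq_len, 12)

aligned = truncate_teacher_logits(teacher_logits, student_logits)

# With matching shapes, an elementwise loss no longer raises a size mismatch.
kl = F.kl_div(
    F.log_softmax(student_logits, dim=-1),  # input: student log-probabilities
    F.softmax(aligned, dim=-1),             # target: teacher probabilities
    reduction="batchmean",
)
print(aligned.shape, kl.item())
```

Truncating before the softmax means the teacher distribution is renormalized over the remaining columns only.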

@anjilab anjilab changed the title from "fix: qwen2 teacher student model unmatched model vocab size" to "fix(minillm): qwen2 teacher student model unmatched model vocab size" Oct 31, 2025