
Conversation

@a4lg (Contributor) commented Dec 13, 2025

What does this PR do?

This commit restores Qwen2/3 MoE + GGUF support in Transformers v5.

In v5, the handling of MoE tensors has changed significantly, so GGUF support for MoE models (in practice, only Qwen2/3 MoE, the only MoE architectures with GGUF support in v4) is now broken.

This commit adopts the new tensor handling and extends `TensorProcessor` so that it can handle not only tensor data but also tensor mappings.
In the process, the Qwen2/3 MoE-specific hack is moved into `Qwen2MoeTensorProcessor`, making the main function look more model-agnostic.
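
To make the idea concrete, here is a minimal, hypothetical sketch of the pattern described above; it is not the actual Transformers v5 API, and the class/method names and the split direction are assumptions chosen for illustration only. The point is that a model-specific processor returns the final `{parameter_name: tensor}` entries for each GGUF tensor, so both the data transform and the name mapping live in one place and the loading loop stays model-agnostic.

```python
import re
from typing import Dict

import torch


class TensorProcessor:
    """Pass-through processor: one GGUF tensor maps to one state-dict entry."""

    def process(self, weights: torch.Tensor, name: str, **kwargs) -> Dict[str, torch.Tensor]:
        return {name: weights}


class Qwen2MoeTensorProcessor(TensorProcessor):
    """Hypothetical MoE-specific processor.

    Splits a stacked (num_experts, out_features, in_features) expert tensor
    into per-expert 2D weights and renames them accordingly, so the generic
    loading loop never needs Qwen-specific branches.
    """

    _EXPERTS_RE = re.compile(r"\bexperts\.")

    def process(self, weights: torch.Tensor, name: str, **kwargs) -> Dict[str, torch.Tensor]:
        if weights.ndim == 3 and self._EXPERTS_RE.search(name):
            return {
                # e.g. "...mlp.experts.gate_proj.weight" -> "...mlp.experts.7.gate_proj.weight"
                name.replace("experts.", f"experts.{i}."): weights[i]
                for i in range(weights.shape[0])
            }
        return super().process(weights, name, **kwargs)
```

With this shape, a model-agnostic loading loop would simply do something like `state_dict.update(processor.process(tensor, mapped_name))` for every GGUF tensor, whatever the architecture.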

This change is fully tested on the Qwen2 MoE model `Qwen1.5-MoE-A2.7B` (14.3B total parameters) and partially tested on the Qwen3 MoE model `Qwen3-30B-A3B-Thinking-2507` (due to memory constraints).

Future Possibilities

Portions of this change are written to be model-agnostic and easy to reuse.
If we decide to add GGUF support for more MoE models, it would be better to have either a mix-in or a utility; in that case, part of `Qwen2MoeTensorProcessor` could be copied there with small modifications. A rough sketch of what such a mix-in might look like follows.
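
As a hedged illustration of the mix-in idea only (the names below are hypothetical and not part of this PR), the shared piece could be as small as the expert-splitting helper; a future processor such as a Qwen3 MoE one would combine it with the `TensorProcessor` base from the sketch above.

```python
import re
from typing import Dict

import torch


class MoeExpertsMixin:
    """Hypothetical mix-in holding expert-splitting logic shared by MoE processors."""

    # Override in a subclass if an architecture names its expert tensors differently.
    experts_pattern = re.compile(r"\bexperts\.")

    def split_experts(self, weights: torch.Tensor, name: str) -> Dict[str, torch.Tensor]:
        # One stacked (num_experts, out_features, in_features) tensor becomes
        # num_experts separate 2D weights with indexed parameter names.
        return {
            name.replace("experts.", f"experts.{i}."): weights[i]
            for i in range(weights.shape[0])
        }
```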

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@Cyrilvallez @SunMarc @MekkCyber

@a4lg force-pushed the gguf-support-v5-qwen23-moe branch from 688c1bf to d82151b on December 13, 2025 at 00:05
