
Conversation

@a4lg (Contributor) commented Dec 13, 2025

What does this PR do?

This commit restores Qwen2/3 MoE + GGUF support in Transformers v5.

In v5, the handling of MoE tensors has changed significantly, so GGUF support for MoE models (in practice, only Qwen2/3 MoE, the only MoE architectures with GGUF support in v4) is now broken.

This commit adopts the new tensor handling and extends `TensorProcessor` so that it can handle not only tensor data but also tensor mappings.
In the process, the Qwen2/3 MoE-specific hack is moved into `Qwen2MoeTensorProcessor`, making the main function look more model-agnostic.
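
To make the idea concrete, here is a minimal, hypothetical sketch of the pattern described above; it is not the actual Transformers v5 API, and the class/method names and the split direction are assumptions chosen for illustration only. The point is that a model-specific processor returns the final `{parameter_name: tensor}` entries for each GGUF tensor, so both the data transform and the name mapping live in one place and the loading loop stays model-agnostic.

```python
import re
from typing import Dict

import torch


class TensorProcessor:
    """Pass-through processor: one GGUF tensor maps to one state-dict entry."""

    def process(self, weights: torch.Tensor, name: str, **kwargs) -> Dict[str, torch.Tensor]:
        return {name: weights}


class Qwen2MoeTensorProcessor(TensorProcessor):
    """Hypothetical MoE-specific processor.

    Splits a stacked (num_experts, out_features, in_features) expert tensor
    into per-expert 2D weights and renames them accordingly, so the generic
    loading loop never needs Qwen-specific branches.
    """

    _EXPERTS_RE = re.compile(r"\bexperts\.")

    def process(self, weights: torch.Tensor, name: str, **kwargs) -> Dict[str, torch.Tensor]:
        if weights.ndim == 3 and self._EXPERTS_RE.search(name):
            return {
                # e.g. "...mlp.experts.gate_proj.weight" -> "...mlp.experts.7.gate_proj.weight"
                name.replace("experts.", f"experts.{i}."): weights[i]
                for i in range(weights.shape[0])
            }
        return super().process(weights, name, **kwargs)
```

With this shape, a model-agnostic loading loop would simply do something like `state_dict.update(processor.process(tensor, mapped_name))` for every GGUF tensor, whatever the architecture.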

This change is fully tested on the Qwen2 MoE model `Qwen1.5-MoE-A2.7B` (14.3B total parameters) and partially tested on the Qwen3 MoE model `Qwen3-30B-A3B-Thinking-2507` (due to memory constraints).

Future Possibilities

Portions of this change are written to be model-agnostic and easy to reuse.
If we decide to add GGUF support for more MoE models, it would be better to have either a mix-in or a utility; in that case, part of `Qwen2MoeTensorProcessor` could be copied there with small modifications. A rough sketch of what such a mix-in might look like follows.
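
As a hedged illustration of the mix-in idea only (the names below are hypothetical and not part of this PR), the shared piece could be as small as the expert-splitting helper; a future processor such as a Qwen3 MoE one would combine it with the `TensorProcessor` base from the sketch above.

```python
import re
from typing import Dict

import torch


class MoeExpertsMixin:
    """Hypothetical mix-in holding expert-splitting logic shared by MoE processors."""

    # Override in a subclass if an architecture names its expert tensors differently.
    experts_pattern = re.compile(r"\bexperts\.")

    def split_experts(self, weights: torch.Tensor, name: str) -> Dict[str, torch.Tensor]:
        # One stacked (num_experts, out_features, in_features) tensor becomes
        # num_experts separate 2D weights with indexed parameter names.
        return {
            name.replace("experts.", f"experts.{i}."): weights[i]
            for i in range(weights.shape[0])
        }
```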

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@Cyrilvallez @SunMarc @MekkCyber

@a4lg force-pushed the gguf-support-v5-qwen23-moe branch from 688c1bf to d82151b on December 13, 2025 at 00:05
