models: fix Qwen3.5 dense/MoE load when MTP block is absent (trunk-only GGUF) #25024
Open
rohithj7 wants to merge 3 commits into
Overview
Fixes loading of Qwen3.5 dense (`Qwen3_5ForCausalLM`) and MoE (`Qwen3_5MoeForCausalLM`) GGUFs that fail at load time with a missing-tensor error for `blk.<N>.attn_norm.weight`, where `<N>` == `num_hidden_layers` (the first index past the trunk).
The converter writes `block_count = num_hidden_layers + mtp_num_hidden_layers` and a `nextn_predict_layers` key whenever `config.json` declares `mtp_num_hidden_layers`, even when the checkpoint contains no `mtp.*` weights. The runtime then derives `n_layer_all = block_count` and unconditionally constructs the trailing MTP/NextN block, marking `blk.<N>.attn_norm.weight` (and the other MTP tensors) as required. A trunk-only GGUF never contains this block, so loading aborts.
`src/models/step35.cpp` already handles this case: it probes for the defining MTP tensor and, when that tensor is absent, marks the MTP block tensors `TENSOR_NOT_REQUIRED` ("trunk-only"). This PR ports the same `trunk_only` handling to `src/models/qwen35.cpp` and `src/models/qwen35moe.cpp`, which previously hardcoded the MTP block tensors as required.
After the change:
- Trunk-only GGUFs load, and the trunk graph is unaffected (`n_layer()` excludes nextn layers).
- GGUFs that do include the MTP block load as before, and the MTP (`graph_mtp`) path keeps working.
Closes #24737.
Closes #24211.
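For readers unfamiliar with the loader, the layer accounting and the `trunk_only` decision can be sketched in a few lines of self-contained C++. This is not the llama.cpp code: `LayerPlan`, `TensorFlag`, `plan_layers`, and the choice of `blk.<N>.attn_norm.weight` as the probe tensor are hypothetical stand-ins, and the GGUF is modelled as a plain set of tensor names. `TensorFlag::NotRequired` only plays the role that `TENSOR_NOT_REQUIRED` plays in the real loader.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>

// Hypothetical stand-in for a loader flag; NotRequired plays the role that
// TENSOR_NOT_REQUIRED plays in llama.cpp (this is not the real API).
enum class TensorFlag { Required, NotRequired };

// Hypothetical summary of how a Qwen3.5-style model splits its layers.
struct LayerPlan {
    uint32_t n_layer_all = 0;  // block_count from GGUF metadata (trunk + MTP)
    uint32_t n_nextn = 0;      // nextn_predict_layers from GGUF metadata
    uint32_t n_trunk = 0;      // layers the main graph runs, i.e. n_layer()
    bool trunk_only = false;   // metadata declares MTP, but its tensors are absent
    TensorFlag mtp_flag = TensorFlag::Required;  // flag for MTP block tensors

    // The MTP graph (graph_mtp in the PR) is only usable when its weights exist.
    bool mtp_available() const { return n_nextn > 0 && !trunk_only; }
};

// tensor_names models the set of tensor names stored in the GGUF file.
LayerPlan plan_layers(uint32_t block_count, uint32_t nextn_predict_layers,
                      const std::set<std::string>& tensor_names) {
    assert(nextn_predict_layers <= block_count);

    LayerPlan p;
    p.n_layer_all = block_count;
    p.n_nextn = nextn_predict_layers;
    // The converter appends MTP layers after the trunk, so the first MTP
    // block index equals num_hidden_layers.
    p.n_trunk = block_count - nextn_predict_layers;

    if (nextn_predict_layers > 0) {
        // Probe one tensor of the first MTP block (assumed probe choice).
        const std::string probe =
            "blk." + std::to_string(p.n_trunk) + ".attn_norm.weight";
        p.trunk_only = tensor_names.count(probe) == 0;
    }
    // Trunk-only files mark MTP tensors optional instead of aborting the load.
    p.mtp_flag = p.trunk_only ? TensorFlag::NotRequired : TensorFlag::Required;
    return p;
}
```

For example, assuming a Qwen3.5-4B GGUF with one MTP layer (`block_count` 33, `nextn_predict_layers` 1) whose file only holds tensors for `blk.0` to `blk.31`, `plan_layers` yields `n_trunk == 32` and `trunk_only == true`, so the MTP tensors become optional rather than aborting the load. Probing a single tensor keeps the check cheap; a file that contains that tensor is treated as carrying the full MTP block.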
Additional information
Same failure family reported in #24737 (Qwen3.5-4B, `blk.32`), #24211 (Nex N2 Pro / Qwen3.5 397B MoE, `blk.60`), and the Qwen3.5-122B MoE GGUF discussion (`blk.48`). The MTP-in-GGUF mapping and runtime were added in #20533 / #22673; the step35 trunk-only fix landed in #24340 but was not ported to the qwen35 loaders.
Requirements
`step35` already had this change, so I replicated it for the Qwen3.5 loaders. I made the code changes myself and used AI to write this PR description.