models: fix Qwen3.5 dense/MoE load when MTP block is absent (trunk-only GGUF) #25024

Open
rohithj7 wants to merge 3 commits into ggml-org:master from rohithj7:master

@rohithj7 rohithj7 commented Jun 26, 2026

Overview

Fixes loading of Qwen3.5 dense (Qwen3_5ForCausalLM) and MoE (Qwen3_5MoeForCausalLM) GGUFs that fail at load time with:

llama_model_load: error loading model: missing tensor 'blk.<N>.attn_norm.weight'

where <N> == num_hidden_layers (the first index past the trunk).

The converter writes block_count = num_hidden_layers + mtp_num_hidden_layers and a nextn_predict_layers key whenever config.json declares mtp_num_hidden_layers, even when the checkpoint contains no mtp.* weights. The runtime then derives n_layer_all = block_count and unconditionally constructs the trailing MTP/NextN block, marking blk.<N>.attn_norm.weight (and the other MTP tensors) as required. For a trunk-only GGUF this block is never present, so load aborts.
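
The arithmetic behind the failing index can be sketched as follows. This is an illustration only, not the converter's or loader's actual code: `hf_config`, `gguf_block_count`, and `first_mtp_tensor` are hypothetical names, and the single MTP layer used in the usage note is inferred from the linked Qwen3.5-4B report (33 blocks expected, 32 present).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the two config.json fields that matter here.
struct hf_config {
    uint32_t num_hidden_layers;     // trunk layers actually present in the checkpoint
    uint32_t mtp_num_hidden_layers; // MTP/NextN layers declared by the config
};

// block_count as the description above says the converter writes it:
// trunk layers plus declared MTP layers, whether or not mtp.* weights exist.
uint32_t gguf_block_count(const hf_config & cfg) {
    return cfg.num_hidden_layers + cfg.mtp_num_hidden_layers;
}

// Name of the first tensor past the trunk, i.e. the one reported as missing.
std::string first_mtp_tensor(const hf_config & cfg) {
    assert(cfg.mtp_num_hidden_layers > 0); // only meaningful when MTP is declared
    return "blk." + std::to_string(cfg.num_hidden_layers) + ".attn_norm.weight";
}
```

With `hf_config{32, 1}`, `gguf_block_count` yields 33 and `first_mtp_tensor` yields `blk.32.attn_norm.weight`, which matches the index in the Qwen3.5-4B report.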

src/models/step35.cpp already handles this: it probes for the defining MTP tensor and, when absent, marks the MTP block tensors TENSOR_NOT_REQUIRED ("trunk-only"). This PR ports that same trunk_only handling to src/models/qwen35.cpp and src/models/qwen35moe.cpp, which previously hardcoded the MTP block tensors as required.
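
A minimal, self-contained sketch of the probe-then-relax idea is shown below. It does not use the real loader API: `tensor_flag`, `tensor_request`, `plan_mtp_block`, and `load_ok` are hypothetical, the file is modelled as a plain set of tensor names, and treating the first suffix as the "defining" MTP tensor is an assumption made for illustration.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical flags; the real loader uses its own (e.g. TENSOR_NOT_REQUIRED).
enum tensor_flag { REQUIRED, NOT_REQUIRED };

struct tensor_request {
    std::string name;
    tensor_flag flag;
};

// Decide how the MTP block tensors should be requested.
// `present` models the tensor names found in the GGUF; `n_trunk` is the index
// of the first block past the trunk; `suffixes` lists the per-block tensors.
std::vector<tensor_request> plan_mtp_block(const std::set<std::string> & present,
                                           int n_trunk,
                                           const std::vector<std::string> & suffixes) {
    assert(!suffixes.empty());
    const std::string prefix = "blk." + std::to_string(n_trunk) + ".";

    // Probe: if the defining tensor is absent, treat the file as trunk-only.
    const bool trunk_only = present.count(prefix + suffixes.front()) == 0;

    std::vector<tensor_request> plan;
    for (const auto & suffix : suffixes) {
        plan.push_back({prefix + suffix, trunk_only ? NOT_REQUIRED : REQUIRED});
    }
    return plan;
}

// A load succeeds when every tensor still marked REQUIRED is present.
bool load_ok(const std::set<std::string> & present,
             const std::vector<tensor_request> & plan) {
    for (const auto & req : plan) {
        if (req.flag == REQUIRED && present.count(req.name) == 0) {
            return false;
        }
    }
    return true;
}
```

In this sketch a trunk-only file produces a plan in which every MTP tensor is optional, so `load_ok` returns true. A file that contains the defining tensor keeps the whole block required, so a partially bundled MTP block still fails loudly.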

After the change:

  • Trunk-only GGUFs load and run normal inference (the MTP block is never executed in the main graph; n_layer() excludes nextn layers).
  • GGUFs that actually bundle the MTP block are unaffected: the tensors are still required and the speculative (graph_mtp) path keeps working.

Closes #24737.
Closes #24211.

Additional information

Same failure family reported in #24737 (Qwen3.5-4B, blk.32), #24211 (Nex N2 Pro / Qwen3.5 397B MoE, blk.60), and the Qwen3.5-122B MoE GGUF discussion (blk.48). The MTP-in-GGUF mapping and runtime were added in #20533 / #22673; the step35 trunk-only fix landed in #24340 but was not ported to the qwen35 loaders.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES - I used an AI assistant to help me understand the issue and identify what needed to change, and to get a more thorough understanding of the relevant code. It helped me realize that step35 already had this change so I had to replicate that for qwen3.5. I made the changes myself. Further, I used AI to write this PR description.

@rohithj7 rohithj7 requested a review from CISC as a code owner June 26, 2026 00:03
@github-actions github-actions Bot added the model Model specific label Jun 26, 2026
Labels

model Model specific

Development

Successfully merging this pull request may close these issues.

  • Eval bug: Qwen3.5-4B: GGUF conversion/load expects 33 blocks, model only has 32
  • Eval bug: Missing layer error when running a quant of Nex N2 Pro

1 participant