models: fix Qwen3.5 dense/MoE load when MTP block is absent (trunk-only GGUF) by rohithj7 · Pull Request #25024 · ggml-org/llama.cpp

rohithj7 · 2026-06-26T00:03:26Z

Overview

Fixes loading of Qwen3.5 dense (Qwen3_5ForCausalLM) and MoE (Qwen3_5MoeForCausalLM) GGUFs that fail at load time with:

llama_model_load: error loading model: missing tensor 'blk.<N>.attn_norm.weight'

where <N> == num_hidden_layers (the first index past the trunk).

The converter writes block_count = num_hidden_layers + mtp_num_hidden_layers and a nextn_predict_layers key whenever config.json declares mtp_num_hidden_layers, even when the checkpoint contains no mtp.* weights. The runtime then derives n_layer_all = block_count and unconditionally constructs the trailing MTP/NextN block, marking blk.<N>.attn_norm.weight (and the other MTP tensors) as required. For a trunk-only GGUF this block is never present, so load aborts.
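The arithmetic behind the failing index can be sketched as follows. This is a toy illustration rather than converter or loader code: `block_count` and `first_mtp_tensor` are hypothetical helper names, and the value `mtp_num_hidden_layers = 1` used in the note below is an assumption chosen to match the blk.32 report.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: the block count as the converter is described to write it,
// i.e. trunk layers plus the MTP layers declared in config.json.
uint32_t block_count(uint32_t num_hidden_layers, uint32_t mtp_num_hidden_layers) {
    assert(num_hidden_layers > 0);
    return num_hidden_layers + mtp_num_hidden_layers;
}

// Hypothetical helper: name of the first tensor requested past the trunk.
// Trunk blocks are blk.0 .. blk.<N-1>, so the first MTP block is blk.<N>.
std::string first_mtp_tensor(uint32_t num_hidden_layers) {
    return "blk." + std::to_string(num_hidden_layers) + ".attn_norm.weight";
}
```

With 32 trunk layers and one declared MTP layer, `block_count(32, 1)` gives 33 and `first_mtp_tensor(32)` gives `blk.32.attn_norm.weight`, which is the tensor a trunk-only file cannot supply.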

src/models/step35.cpp already handles this: it probes for the defining MTP tensor and, when absent, marks the MTP block tensors TENSOR_NOT_REQUIRED ("trunk-only"). This PR ports that same trunk_only handling to src/models/qwen35.cpp and src/models/qwen35moe.cpp, which previously hardcoded the MTP block tensors as required.
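A minimal sketch of the probe-then-relax pattern, under stated assumptions: `ToyGguf`, `load_tensor`, and `load_mtp_block` are hypothetical stand-ins, not the llama.cpp loader API, and `hypothetical_mtp.weight` is an invented tensor name. Using `attn_norm.weight` as the probed tensor is also an assumption; the real loaders may probe a different one.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Toy stand-in for the loader's required / TENSOR_NOT_REQUIRED distinction.
enum TensorFlag { REQUIRED, NOT_REQUIRED };

// Toy model file: just the set of tensor names it contains.
struct ToyGguf {
    std::set<std::string> tensors;
};

// Returns true if the tensor exists; throws only when a required tensor is missing.
bool load_tensor(const ToyGguf & f, const std::string & name, TensorFlag flag) {
    if (f.tensors.count(name) > 0) {
        return true;
    }
    if (flag == NOT_REQUIRED) {
        return false;
    }
    throw std::runtime_error("missing tensor '" + name + "'");
}

// Probe for the defining MTP tensor first; if it is absent, treat the file as
// trunk-only and relax every tensor of the MTP block to NOT_REQUIRED.
// Returns true when the whole MTP block was found.
bool load_mtp_block(const ToyGguf & f, int n_trunk_layers) {
    assert(n_trunk_layers >= 0);
    const std::string prefix = "blk." + std::to_string(n_trunk_layers) + ".";
    const bool trunk_only = f.tensors.count(prefix + "attn_norm.weight") == 0;
    const TensorFlag flag = trunk_only ? NOT_REQUIRED : REQUIRED;

    const std::vector<std::string> suffixes = {"attn_norm.weight", "hypothetical_mtp.weight"};
    bool all_present = true;
    for (const std::string & s : suffixes) {
        all_present = load_tensor(f, prefix + s, flag) && all_present;
    }
    return all_present;
}
```

In this sketch a file with no MTP tensors loads without error and reports the block as absent, while a file that has the probed tensor but lacks another MTP tensor still fails, so a genuinely broken MTP bundle is not silently accepted.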

After the change:

Trunk-only GGUFs load and run normal inference (the MTP block is never executed in the main graph; n_layer() excludes nextn layers).
GGUFs that actually bundle the MTP block are unchanged - the tensors are still required and the speculative (graph_mtp) path keeps working.
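Why a missing MTP block does not affect normal inference can be illustrated with a toy model of the layer counts. `ToyHParams` and `main_graph_layers` are hypothetical names; the only facts taken from the description are that `n_layer_all` equals `block_count` and that `n_layer()` excludes the nextn layers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy hyperparameters (hypothetical struct, not the llama.cpp one).
struct ToyHParams {
    uint32_t n_layer_all;           // block_count read from the GGUF
    uint32_t nextn_predict_layers;  // trailing MTP/NextN layers

    // Trunk layers only: the trailing nextn layers are excluded.
    uint32_t n_layer() const {
        assert(nextn_predict_layers <= n_layer_all);
        return n_layer_all - nextn_predict_layers;
    }
};

// Block indices a main-graph loop bounded by n_layer() would visit.
std::vector<uint32_t> main_graph_layers(const ToyHParams & hp) {
    std::vector<uint32_t> layers;
    for (uint32_t il = 0; il < hp.n_layer(); ++il) {
        layers.push_back(il);
    }
    return layers;
}
```

For `ToyHParams{33, 1}` the loop visits blocks 0 through 31 and never touches block 32, so leaving that block's tensors unloaded is harmless for the main graph in this model.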

Closes #24737.
Closes #24211.

Additional information

Same failure family reported in #24737 (Qwen3.5-4B, blk.32), #24211 (Nex N2 Pro / Qwen3.5 397B MoE, blk.60), and the Qwen3.5-122B MoE GGUF discussion (blk.48). The MTP-in-GGUF mapping and runtime were added in #20533 / #22673; the step35 trunk-only fix landed in #24340 but was not ported to the qwen35 loaders.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES - I used an AI assistant to help me understand the issue and identify what needed to change, and to get a more thorough understanding of the relevant code. It helped me realize that step35 already had this change so I had to replicate that for qwen3.5. I made the changes myself. Further, I used AI to write this PR description.

Rohith Iyengar and others added 3 commits June 25, 2026 13:06

4c0b1ad: fix: update tensor loading logic for MTP layers in qwen35 and qwen35moe models

7d71418: Merge pull request #1 from rohithj7/rohith/missing-tensor-issues (missing tensor issue)

e8bcb77: Merge branch 'ggml-org:master' into master

rohithj7 requested a review from CISC as a code owner June 26, 2026 00:03

github-actions Bot added the model Model specific label Jun 26, 2026

rohithj7 wants to merge 3 commits into ggml-org:master from rohithj7:master
