Conversation

@dcw02 dcw02 commented Oct 10, 2025

Motivation

**Draft PR. This is currently WIP.**

Add eagle3 support for qwen3_vl and qwen3_vl_moe models.

Modifications

Related Issues

Accuracy Test

Benchmark & Profiling

Checklist

@dcw02 dcw02 closed this Oct 13, 2025
@dcw02 dcw02 reopened this Oct 13, 2025
dcw02 (Author) commented Oct 13, 2025

Training finishes for TP size 1 with both the SDPA and flex attention backends. Loss/accuracy curves look OK; I'm going to add qwen3_vl_moe EAGLE3 support to sglang/vllm (so I can eval) before adding TP size > 1 support.

@dcw02 dcw02 changed the title [Feature] Qwen3-VL-30B-A3B-Instruct eagle3 support [Feature] Qwen3 VL eagle3 support Oct 20, 2025
@dcw02 dcw02 force-pushed the qwen3_vl_moe branch 2 times, most recently from 47ac5c9 to 1f08199 Compare October 31, 2025 00:40
dcw02 (Author) commented Oct 31, 2025

Marking this ready for review. This adds qwen3_vl and qwen3_vl_moe EAGLE3 draft-training support for TP size 1. SGLang support is being worked on/cleaned up.

Training graphs:

@dcw02 dcw02 marked this pull request as ready for review October 31, 2025 00:47
kevin19891229 commented

When will verification be supported?

dcw02 (Author) commented Nov 10, 2025

> When will verification be supported?

There's a branch of SGLang here that you can run; it's currently being cleaned up for upstreaming. We were able to confirm an accept length of almost 5 with a spec config of (4, 6, 24) on one of our tasks for qwen3-vl-8b.

FrankLeeeee (Collaborator) commented

Can you rebase this PR to the latest main?

dcw02 (Author) commented Nov 12, 2025

> Can you rebase this PR to the latest main?

Yes, I'll get to it this week.

dcw02 (Author) commented Nov 24, 2025

Sorry for the delay @FrankLeeeee, I finished the rebase.

FrankLeeeee (Collaborator) commented

@KerwinKai this PR seems to overlap with yours, can you take a look?

KerwinKai (Contributor) commented

> @KerwinKai this PR seems to overlap with yours, can you take a look?

Yes, I'll try.


ooolmk commented Nov 27, 2025

```python
elif (
    args.is_vlm
    and draft_model_config.target_model_type == "qwen3_vl"
    and args.tp_size == 1
):
    from transformers import Qwen3VLForConditionalGeneration

    target_model = (
        Qwen3VLForConditionalGeneration.from_pretrained(
            pretrained_model_name_or_path=args.target_model_path,
            dtype=torch.bfloat16,
        )
        .eval()
        .cuda()
    )
```

This initializes the target model using `Qwen3VLForConditionalGeneration` from the Transformers library, but that class does not define `set_aux_hidden_states_layers`, so training fails with:
`AttributeError: 'Qwen3VLForConditionalGeneration' object has no attribute 'set_aux_hidden_states_layers'`

How should I modify `Qwen3VLForConditionalGeneration`? I noticed there is a class `Eagle3TargetModel(ABC)`, but I'm not sure how to use it.

dcw02 (Author) commented Nov 27, 2025

> `AttributeError: 'Qwen3VLForConditionalGeneration' object has no attribute 'set_aux_hidden_states_layers'`
>
> How should I modify `Qwen3VLForConditionalGeneration`? I noticed there is a class `Eagle3TargetModel(ABC)`, but I'm not sure how to use it.

You should specify the backend as `hf` rather than the `sglang` backend.


ooolmk commented Nov 28, 2025

> You should specify the backend as `hf` rather than the `sglang` backend.

Thanks a lot! I finally resolved the `set_aux_hidden_states_layers` issue with
`target_model = HFEagle3TargetModel(target_model)` and by turning off all the FSDP & DP config.
I successfully trained a qwen3-vl-2b EAGLE3 model based on your commits; it can now perform inference in SGLang. I'm preparing to switch to a larger dataset to further validate its performance.
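
For anyone hitting the same `AttributeError`, the fix above boils down to an adapter pattern: wrap the plain Hugging Face model in a class that exposes the EAGLE3 hook the training loop expects. The class and method names below mirror those mentioned in this thread (`HFEagle3TargetModel`, `set_aux_hidden_states_layers`), but the bodies are an illustrative sketch, not SpecForge's actual implementation:

```python
class DummyHFModel:
    """Stand-in for a Hugging Face model such as Qwen3VLForConditionalGeneration,
    which does not define set_aux_hidden_states_layers itself."""

    def __call__(self, x):
        return x


class HFEagle3TargetModelSketch:
    """Hypothetical adapter adding the EAGLE3 hook around a plain HF model."""

    def __init__(self, model):
        self.model = model
        self.aux_hidden_states_layers = None

    def set_aux_hidden_states_layers(self, layers):
        # Record which decoder layers should emit auxiliary hidden states
        # for the EAGLE3 draft model to train against.
        self.aux_hidden_states_layers = tuple(layers)


wrapped = HFEagle3TargetModelSketch(DummyHFModel())
wrapped.set_aux_hidden_states_layers([2, 16, 30])
print(wrapped.aux_hidden_states_layers)  # → (2, 16, 30)
```

Because the training loop only talks to the wrapper, the underlying `transformers` class never needs to be modified.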
