Conversation

@dcw02 dcw02 commented Oct 10, 2025

Motivation

**Draft PR. This is currently WIP.**

Add eagle3 support for qwen3_vl and qwen3_vl_moe models.

Modifications

Related Issues

Accuracy Test

Benchmark & Profiling

Checklist

@dcw02 dcw02 closed this Oct 13, 2025
@dcw02 dcw02 reopened this Oct 13, 2025
dcw02 (Author) commented Oct 13, 2025

Training finishes for TP size 1 with both the SDPA and flex attention backends. Loss/accuracy curves look OK; I'm going to add qwen3_vl_moe EAGLE3 support to sglang/vllm (so I can eval) before adding TP size > 1 support.

@dcw02 dcw02 changed the title [Feature] Qwen3-VL-30B-A3B-Instruct eagle3 support [Feature] Qwen3 VL eagle3 support Oct 20, 2025
@dcw02 dcw02 force-pushed the qwen3_vl_moe branch 2 times, most recently from 47ac5c9 to 1f08199 Compare October 31, 2025 00:40
dcw02 (Author) commented Oct 31, 2025

Marking this ready for review. This adds qwen3_vl and qwen3_vl_moe EAGLE3 draft-training support for TP size 1. SGLang support is being worked on/cleaned up.

Training graphs:

@dcw02 dcw02 marked this pull request as ready for review October 31, 2025 00:47
kevin19891229 commented

When will verification be supported?

dcw02 (Author) commented Nov 10, 2025

> When will verification be supported?

There's a branch of SGLang here that you can run; it's currently being cleaned up for upstreaming. We were able to confirm an accept length of almost 5 with a spec config of (4, 6, 24) on one of our tasks for qwen3-vl-8b.

FrankLeeeee (Collaborator) commented

Can you rebase this PR to the latest main?

dcw02 (Author) commented Nov 12, 2025

> Can you rebase this PR to the latest main?

Yes, I'll get to it this week.

dcw02 (Author) commented Nov 24, 2025

Sorry for the delay @FrankLeeeee, I finished the rebase.

FrankLeeeee (Collaborator) commented

@KerwinKai this PR seems to overlap with yours, can you take a look?

KerwinKai (Contributor) commented

> @KerwinKai this PR seems to overlap with yours, can you take a look?

Yes, I'll try.


ooolmk commented Nov 27, 2025

```python
elif (
    args.is_vlm
    and draft_model_config.target_model_type == "qwen3_vl"
    and args.tp_size == 1
):
    from transformers import Qwen3VLForConditionalGeneration

    target_model = (
        Qwen3VLForConditionalGeneration.from_pretrained(
            pretrained_model_name_or_path=args.target_model_path,
            dtype=torch.bfloat16,
        )
        .eval()
        .cuda()
    )
```

This initializes the target model using `Qwen3VLForConditionalGeneration` from the Transformers library, but that class does not define `set_aux_hidden_states_layers`, so training fails with:
`AttributeError: 'Qwen3VLForConditionalGeneration' object has no attribute 'set_aux_hidden_states_layers'`

How should I modify `Qwen3VLForConditionalGeneration`? I noticed there is a class `Eagle3TargetModel(ABC)`, but I'm not sure how to use it.

dcw02 (Author) commented Nov 27, 2025

> `AttributeError: 'Qwen3VLForConditionalGeneration' object has no attribute 'set_aux_hidden_states_layers'`
>
> How should I modify `Qwen3VLForConditionalGeneration`? I noticed there is a class `Eagle3TargetModel(ABC)`, but I'm not sure how to use it.

You should specify the backend as `hf` rather than the `sglang` backend.


ooolmk commented Nov 28, 2025

> You should specify the backend as `hf` rather than the `sglang` backend.

Thanks a lot! I finally resolved the `set_aux_hidden_states_layers` issue with
`target_model = HFEagle3TargetModel(target_model)` and by turning off all the FSDP & DP config.
I successfully trained a qwen3-vl-2b EAGLE3 model based on your commits; it can now perform inference in SGLang. I'm preparing to switch to a larger dataset to further validate its performance.
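
For anyone hitting the same `AttributeError`, the fix above boils down to an adapter pattern: wrap the plain Hugging Face model in a class that exposes the EAGLE3 hook the training loop expects. The class and method names below mirror those mentioned in this thread (`HFEagle3TargetModel`, `set_aux_hidden_states_layers`), but the bodies are an illustrative sketch, not SpecForge's actual implementation:

```python
class DummyHFModel:
    """Stand-in for a Hugging Face model such as Qwen3VLForConditionalGeneration,
    which does not define set_aux_hidden_states_layers itself."""

    def __call__(self, x):
        return x


class HFEagle3TargetModelSketch:
    """Hypothetical adapter adding the EAGLE3 hook around a plain HF model."""

    def __init__(self, model):
        self.model = model
        self.aux_hidden_states_layers = None

    def set_aux_hidden_states_layers(self, layers):
        # Record which decoder layers should emit auxiliary hidden states
        # for the EAGLE3 draft model to train against.
        self.aux_hidden_states_layers = tuple(layers)


wrapped = HFEagle3TargetModelSketch(DummyHFModel())
wrapped.set_aux_hidden_states_layers([2, 16, 30])
print(wrapped.aux_hidden_states_layers)  # → (2, 16, 30)
```

Because the training loop only talks to the wrapper, the underlying `transformers` class never needs to be modified.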
