support qwen3-next full_decode_only mode. #3949
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request adds support for qwen3-next in full_decode_only mode by handling mixed attention types, specifically linear_attn. The changes in _build_dummy_attn_metadata correctly differentiate between attention builders to generate appropriate metadata for different layers. However, there is a block of redundant code that re-calculates attn_state, which should be removed to improve code clarity and maintainability.
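To make the review comment concrete, here is a minimal sketch of what per-layer metadata dispatch in a dummy-metadata builder could look like. All class, function, and attribute names below are illustrative assumptions, not the actual vLLM Ascend API; the point is only that `linear_attn` layers get different dummy metadata than full-attention layers.

```python
# Hypothetical sketch: names do not match the real vLLM Ascend code.
from dataclasses import dataclass


@dataclass
class DummyAttnMetadata:
    attn_type: str
    num_tokens: int


def build_dummy_attn_metadata(layer_types: dict, num_tokens: int) -> dict:
    """Build per-layer dummy metadata, treating linear_attn specially."""
    metadata = {}
    for name, attn_type in layer_types.items():
        if attn_type == "linear_attn":
            # Linear attention keeps a recurrent state rather than a KV
            # cache, so its dummy metadata differs from full attention.
            metadata[name] = DummyAttnMetadata("linear_attn", num_tokens)
        else:
            metadata[name] = DummyAttnMetadata("full_attn", num_tokens)
    return metadata
```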
```python
attn_state = AscendAttentionState.DecodeOnly
if self.speculative_config and \
        self.speculative_config.method == "deepseek_mtp":
    attn_state = AscendAttentionState.SpecDecoding
```
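One way to address the review comment about the redundant re-calculation of `attn_state` is to compute it once in a small helper. This is a sketch only; the enum values mirror the snippet above, but the helper name and the simplified config handling are assumptions, not the actual codebase.

```python
# Illustrative refactor: compute attn_state in one place instead of
# re-deriving it in multiple code paths. Names are simplified.
from enum import Enum, auto


class AscendAttentionState(Enum):
    DecodeOnly = auto()
    SpecDecoding = auto()


def resolve_decode_attn_state(speculative_config):
    """Pick the decode attention state, honoring deepseek_mtp speculation."""
    if speculative_config and \
            getattr(speculative_config, "method", None) == "deepseek_mtp":
        return AscendAttentionState.SpecDecoding
    return AscendAttentionState.DecodeOnly
```

Callers would then use the returned state instead of repeating the conditional.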
Force-pushed from c69a4f5 to 63db91a
@momo609 Please elaborate on why we do not need to update linear attention params in
Force-pushed from 97bc2b0 to 1cce890
Signed-off-by: wangxiaoxin-sherie <[email protected]>
### What this PR does / why we need it?

support qwen3-next full_decode_only mode.

bs=1, max_token=1024

| branch | tps | e2e time |
| --- | --- | --- |
| piecewise | 3.06 | 8.15 |
| fulldecodeonly | 7.2 | 3.47 |

- vLLM version: v0.11.0
- vLLM main: vllm-project/vllm@83f478b

Signed-off-by: wangxiaoxin-sherie <[email protected]>
Co-authored-by: wangxiaoxin-sherie <[email protected]>
Signed-off-by: Pz1116 <[email protected]>
Signed-off-by: luolun <[email protected]>
Signed-off-by: hwhaokun <[email protected]>
Signed-off-by: nsdie <[email protected]>
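The benchmark table above (bs=1, max_token=1024) implies roughly a 2.35x gain in both throughput and end-to-end latency. A quick arithmetic check:

```python
# Speedup implied by the reported benchmark numbers.
piecewise_tps, piecewise_e2e = 3.06, 8.15
full_tps, full_e2e = 7.2, 3.47

tps_speedup = full_tps / piecewise_tps   # throughput gain
e2e_speedup = piecewise_e2e / full_e2e   # latency reduction factor

print(f"tps speedup: {tps_speedup:.2f}x, e2e speedup: {e2e_speedup:.2f}x")
# prints "tps speedup: 2.35x, e2e speedup: 2.35x"
```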
What this PR does / why we need it?
support qwen3-next full_decode_only mode.
bs=1, max_token=1024

Does this PR introduce any user-facing change?

How was this patch tested?