[Fusion] [Graph] Add qknorm rope fusion operator #4711
base: main
Conversation
Code Review
This pull request introduces a graph fusion pass for qknorm_rope operations on Ascend hardware, which is a great step for performance optimization. The implementation includes a new configuration flag, a pattern matching pass using torch._inductor.pattern_matcher, and a custom Triton kernel for the fused operation. The code is well-structured, but I've identified several areas for improvement regarding code quality, robustness, and maintainability. My review comments focus on removing debug artifacts, improving code clarity and consistency, enhancing robustness by avoiding hardcoded values and unsafe module-level initializations, and addressing significant code duplication.
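For readers unfamiliar with Inductor's pattern matcher, a minimal sketch of how such a fusion pass is typically wired up is shown below. The tensor shapes, the simplified norm/rope math, the epsilon, and the pass name are illustrative assumptions rather than this PR's code; only the `torch.ops.vllm.qkv_rmsnorm_rope` op name comes from the diff, and the sketch assumes that custom op has already been registered.

```python
# Minimal sketch (not the PR's code): register a search pattern for the unfused
# q-norm + rope subgraph and replace it with the fused custom op.
import torch
from torch._inductor.pattern_matcher import (PatternMatcherPass, fwd_only,
                                              register_replacement)

EPS = 1e-6  # assumed epsilon; the real pass would read it from the model config


def pattern(q: torch.Tensor, q_weight: torch.Tensor,
            cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # Simplified stand-in for the unfused graph: RMSNorm on q followed by a
    # rotary-embedding-style elementwise combination with cos/sin.
    variance = q.pow(2).mean(dim=-1, keepdim=True)
    q_norm = q * torch.rsqrt(variance + EPS) * q_weight
    return q_norm * cos + q_norm * sin


def replacement(q: torch.Tensor, q_weight: torch.Tensor,
                cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # Fused custom op registered by this PR (op name taken from the diff;
    # the argument list here is assumed).
    return torch.ops.vllm.qkv_rmsnorm_rope(q, q_weight, cos, sin)


qknorm_rope_pass = PatternMatcherPass(pass_name="qknorm_rope_fusion")
example_inputs = [torch.empty(4, 128), torch.empty(128),
                  torch.empty(4, 128), torch.empty(4, 128)]
register_replacement(pattern, replacement, example_inputs,
                     fwd_only, qknorm_rope_pass)
# Later, the pass is applied to the captured FX graph: qknorm_rope_pass.apply(graph)
```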
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
dtype=self.dtype,
device=self.device)
# For GQA models.
elif not self.vllm_config.model_config.use_mla:
We should be more careful with this condition. As far as I know, GQA models do not always have a rope_dim of 128, and this hardcoded value might cause potential bugs. Perhaps we can limit it to qwen3_moe only?
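A hedged sketch of that suggestion is below; attribute names such as `hf_config.model_type` and `get_head_size()` are assumptions about the surrounding vLLM config objects, not code taken from this PR.

```python
# Illustrative only: gate the fused path on the model family and the actual
# head/rope size read from the config, instead of assuming rope_dim == 128
# for every GQA model.
model_config = self.vllm_config.model_config          # as in the diff above
is_qwen3_moe = getattr(model_config.hf_config, "model_type", "") == "qwen3_moe"
head_size = model_config.get_head_size()               # assumed helper on ModelConfig

if not model_config.use_mla and is_qwen3_moe and head_size == 128:
    # Safe to take the fused qknorm_rope path for this model family.
    ...
else:
    # Fall back to the unfused implementation.
    ...
```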
return q_output, k_output, v_output

direct_register_custom_op(op_name="qkv_rmsnorm_rope",
use import torch_npu._inductor
Inductor's pattern_matcher does not support Triton operators directly. It does support torch.ops.aten (aten operators), torch.ops.npu (custom operators), and plain PyTorch APIs such as torch.add. Therefore, the Triton kernel is wrapped as a custom op.
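A rough sketch of that wrapping, assuming vLLM's `direct_register_custom_op` helper; the argument list, split sizes, and dispatch key below are illustrative assumptions, while the op name itself comes from the diff above.

```python
# Sketch only: hide the Triton launch behind a torch custom op so the Inductor
# pattern matcher sees a single torch.ops.vllm.qkv_rmsnorm_rope node.
import torch
from vllm.utils import direct_register_custom_op


def qkv_rmsnorm_rope_impl(qkv: torch.Tensor, q_weight: torch.Tensor,
                          k_weight: torch.Tensor, cos: torch.Tensor,
                          sin: torch.Tensor, q_size: int, kv_size: int,
                          eps: float):
    # The real implementation launches the fused Triton kernel (omitted here).
    raise NotImplementedError


def qkv_rmsnorm_rope_fake(qkv, q_weight, k_weight, cos, sin, q_size, kv_size, eps):
    # Shape-only "fake" implementation so the op can be traced/compiled
    # without actually running the Triton kernel.
    q, k, v = qkv.split([q_size, kv_size, kv_size], dim=-1)
    return q.contiguous(), k.contiguous(), v.contiguous()


direct_register_custom_op(
    op_name="qkv_rmsnorm_rope",   # op name taken from the diff above
    op_func=qkv_rmsnorm_rope_impl,
    mutates_args=[],
    fake_impl=qkv_rmsnorm_rope_fake,
    dispatch_key="PrivateUse1",   # assumed NPU dispatch key; the PR may differ
)
```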
return driver.active.utils.get_device_properties(device)

num_vectorcore = get_npu_properties()["num_vectorcore"]
this parameter has already been defined in triton/utils.py
Thanks. I have modified it.
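For reference, the de-duplication being discussed could look roughly like the snippet below; the import path is a guess at where triton/utils.py lives in this repo, not a verified module name.

```python
# Illustrative only: reuse the existing helper instead of querying
# driver.active.utils.get_device_properties() again in this file.
from vllm_ascend.triton.utils import get_npu_properties  # hypothetical path

num_vectorcore = get_npu_properties()["num_vectorcore"]
```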
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Force-pushed from bac8b40 to 59f15a7
This PR relies on #4409, because CI has no Triton.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Force-pushed from 5a890c9 to 98a4d21
return q_rope, k_rope, v

def replacement(qkv: torch.Tensor, q_weight: torch.Tensor,
The pattern in `if xxx else: torch.ops.vllm.qkv_rmsnorm_rope` needs support in future releases.
We don't perform any special checks in the pattern. You can add a new pattern match.
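A purely illustrative sketch of adding such a second pattern match, reusing the names (`EPS`, `replacement`, `example_inputs`, `qknorm_rope_pass`) from the pattern-matcher sketch earlier in this conversation; none of these names are taken from the PR itself except `torch.ops.vllm.qkv_rmsnorm_rope`.

```python
# Illustrative only: a second search pattern for the alternative branch,
# registered against the same fused replacement. The body is a stand-in for
# whatever ops that branch actually emits.
def pattern_variant(q: torch.Tensor, q_weight: torch.Tensor,
                    cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    variance = q.pow(2).mean(dim=-1, keepdim=True)
    q_norm = q * torch.rsqrt(variance + EPS) * q_weight
    return q_norm * cos - q_norm * sin  # differs from the first pattern


register_replacement(pattern_variant, replacement, example_inputs,
                     fwd_only, qknorm_rope_pass)
```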
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Force-pushed from 3d8050a to 95718dd
Force-pushed from bb324de to ec6b0df
Signed-off-by: wxsIcey <[email protected]>
What this PR does / why we need it?
This PR adds a `qkv_rmsnorm_rope` operator and introduces a graph fusion pass for `qknorm_rope` operations. The implementation includes a new configuration flag, a pattern-matching pass using `torch._inductor.pattern_matcher`, and a custom Triton kernel for the fused operation.

Co-authored-by: Angazenn [email protected]
Does this PR introduce any user-facing change?
Yes, it adds a new `additional_config` option.
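The exact key added to `additional_config` is not spelled out in this conversation, so the flag name below is a placeholder; the snippet only illustrates how such an option would typically be passed to vLLM.

```python
# Hypothetical usage: "enable_qknorm_rope_fusion" is a placeholder key,
# not the real name introduced by this PR.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen3-30B-A3B",                              # a qwen3_moe model
    additional_config={"enable_qknorm_rope_fusion": True},   # placeholder key
)
```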
How was this patch tested?
todo