[Ops][Triton] Add a triton kernel supporting partial rope. #4413
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces a Triton kernel for Rotary Positional Embedding (RoPE) to support partial RoPE, where the RoPE dimension is not equal to the head dimension. This is a valuable performance optimization as it avoids explicit split and concat operations. The implementation includes a new Triton kernel _triton_rope and a wrapper function rope_forward_triton. The changes in sfa_v1.py correctly use this new kernel when Triton is available, with a fallback to the existing implementation. The Triton kernel itself appears to be well-written, handling both NEOX and non-NEOX styles, and correctly deals with padding and masking for variable dimensions. The upcasting to float32 for intermediate computations is a good practice for maintaining precision. I have one comment regarding a docstring that could be improved for clarity. Overall, the changes are logical and well-structured.

What this PR does / why we need it?
This PR adds a Triton RoPE kernel which supports scenarios where `rope_dim != head_dim`. This removes the split op before RoPE and the concat op after RoPE. Profiling shows an improvement.

Original implementation (2 split + 2 rope + 2 slice + 2 concat): because we currently only support piecewise aclgraph for DS 3.2, there are plenty of free bubbles, so you can see the computing time of all RoPE-related kernels: 35 us.
New Triton RoPE: 12 us.
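To make the saving concrete, here is a pure-PyTorch sketch of the baseline path the op count above refers to: the rope slice of q and k is split off, rotated, and concatenated back onto the pass-through slice. The helper name `neox_rope_ref` and the tensor layout are illustrative assumptions, not code from this repository; the fused Triton kernel replaces this whole sequence with a single in-place launch.

```python
import torch


def neox_rope_ref(x, cos, sin):
    """NEOX-style rotation on the last dim; cos/sin must broadcast against x,
    e.g. [num_tokens, 1, rope_dim // 2]. Upcast to float32 for precision."""
    x1, x2 = x.float().chunk(2, dim=-1)
    cos, sin = cos.float(), sin.float()
    out = torch.cat([x1 * cos - x2 * sin, x2 * cos + x1 * sin], dim=-1)
    return out.to(x.dtype)


def baseline_partial_rope(q, k, cos, sin, rope_dim):
    # 2 splits: separate the rope slice from the pass-through slice of q and k.
    q_rope, q_pass = q[..., :rope_dim], q[..., rope_dim:]
    k_rope, k_pass = k[..., :rope_dim], k[..., rope_dim:]
    # 2 rope calls on the rope slices only.
    q_rope = neox_rope_ref(q_rope, cos, sin)
    k_rope = neox_rope_ref(k_rope, cos, sin)
    # 2 concats to rebuild the full heads; a fused partial-RoPE kernel instead
    # rotates the first rope_dim elements in place and skips all of this.
    q = torch.cat([q_rope, q_pass], dim=-1)
    k = torch.cat([k_rope, k_pass], dim=-1)
    return q, k
```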

Does this PR introduce any user-facing change?
None
How was this patch tested?
I will add related unit tests after CI is integrated with Triton.