
Conversation

@shen-shanshan
Contributor

@shen-shanshan shen-shanshan commented Dec 2, 2025

Purpose

  1. Some modeling files call the apply_rotary_emb function directly with a pre-computed cos/sin cache, e.g. https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/qwen2_5_vl.py#L383-L385. Because this is a plain function rather than an op, plugins (e.g., vllm-ascend) cannot override it. By extracting it into a CustomOp, a plugin can simply subclass it, implement forward_oot(), and register the subclass to replace the op with a device-specific implementation for an out-of-tree (OOT) device (see the sketch after this list).
  2. There are several dispatch functions for apply_rotary_emb, which can confuse users and developers, such as https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/rotary_embedding/common.py#L56-L70 and https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/rotary_embedding/common.py#L73-L97. This PR unifies the separate dispatch logic into one CustomOp to make it clearer.
  3. Some helper functions, such as rotate_half() and apply_rotary_emb_torch(), were defined redundantly across modeling files. This PR removes the duplicated definitions and keeps a single copy in https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/rotary_embedding/common.py.
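
To make point 1 concrete, here is a minimal sketch of the intended extension pattern, assuming vLLM's CustomOp registration decorator. The registration name and the npu_apply_rotary_emb kernel are illustrative placeholders, not code from this PR:

from vllm.model_executor.custom_op import CustomOp
from vllm.model_executor.layers.rotary_embedding.common import (
    apply_rotary_emb_torch,
)


@CustomOp.register("apply_rotary_emb")  # registration name assumed
class ApplyRotaryEmb(CustomOp):
    def forward_native(self, x, cos, sin, is_neox_style: bool = True):
        # Pure-PyTorch fallback shared by all backends.
        return apply_rotary_emb_torch(x, cos, sin, is_neox_style)


# In an out-of-tree plugin such as vllm-ascend, the op can then be
# swapped out by subclassing and implementing forward_oot():
class AscendApplyRotaryEmb(ApplyRotaryEmb):
    def forward_oot(self, x, cos, sin, is_neox_style: bool = True):
        # npu_apply_rotary_emb stands in for a hypothetical NPU kernel.
        return npu_apply_rotary_emb(x, cos, sin, is_neox_style)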

Test Plan

I have tested this PR on Ascend NPU together with vllm-project/vllm-ascend#4667.

Run:

vllm serve /root/.cache/modelscope/hub/models/Qwen/Qwen2.5-VL-7B-Instruct \
--max_model_len 16384 \
--max-num-batched-tokens 16384 \
--tensor-parallel-size 2 \
--enforce-eager

Test Result

Output:

{"id":"chatcmpl-9ab4de23690c85aa","object":"chat.completion","created":1764748509,"model":"/root/.cache/modelscope/hub/models/Qwen/Qwen2.5-VL-7B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"The text in the image reads \"TONGYI Qwen.\" The word \"TONGYI\" is written in blue, and \"Qwen\" is written in gray. The font appears to be modern and clean, with \"TONGYI\" being slightly larger than \"Qwen.\" The design includes a geometric, abstract shape on the left side of the logo, which complements the text.","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null,"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":78,"total_tokens":162,"completion_tokens":84,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the qwen Related to Qwen models label Dec 2, 2025

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request successfully refactors the apply_rotary_emb functionality into a CustomOp class, unifying dispatch logic and removing redundant definitions across several files. This is a positive step towards better modularity and extensibility. However, there are a few critical issues that need to be addressed to ensure correctness and prevent runtime errors.

Collaborator

@ProExpertProg ProExpertProg left a comment


Why is this necessary - there's already a RotaryEmbedding custom op class?

@shen-shanshan
Contributor Author

Why is this necessary - there's already a RotaryEmbedding custom op class?

@ProExpertProg

These are the three reasons from the Purpose section above: apply_rotary_emb is currently a plain function (not an op), so plugins like vllm-ascend cannot override it; its dispatch logic is split across several functions in common.py; and helpers such as rotate_half() and apply_rotary_emb_torch() were defined redundantly across modeling files.

@shen-shanshan
Contributor Author

shen-shanshan commented Dec 3, 2025

Why is this necessary - there's already a RotaryEmbedding custom op class?

I have also tested this PR together with vllm-project/vllm-ascend#4667 on Ascend NPU.
Maybe we need to add the ready label to see whether this PR breaks other backends, such as CUDA, ROCm, ...

@robertgshaw2-redhat
Collaborator

@ProExpertProg - mind following up on this?

Collaborator

@ProExpertProg ProExpertProg left a comment


Nice dispatch cleanup, approving so that it's not blocked while I'm gone for the next 2 weeks but please address comments!!

query_rot = query[..., :rotary_dim]
query_pass = query[..., rotary_dim:]
- query_rot = apply_rotary_emb_torch(query_rot, cos, sin, is_neox_style)
+ query_rot = ApplyRotaryEmb.forward_static(
Collaborator


use self.apply_rotary_emb?

Contributor Author

@shen-shanshan shen-shanshan Dec 8, 2025


I suppose self.apply_rotary_emb cannot be called from this static method.
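
For context, forward_static is a @staticmethod, so no instance is available inside it; a minimal sketch of the constraint (signature assumed from the diff above):

class ApplyRotaryEmb(CustomOp):
    @staticmethod
    def forward_static(x, cos, sin, is_neox_style):
        # A @staticmethod receives no self, so instance attributes
        # such as self.apply_rotary_emb are unreachable here.
        ...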

)


def _apply_rotary_emb_torch(
Collaborator


Can this just live in ApplyRotaryEmb.forward_static?

Contributor Author


Good idea!

# If torch.compile is not enabled, use the rotary embedding kernel from
# the flash_attn package; otherwise, use the naive PyTorch implementation,
# which is faster when torch.compile is enabled.
if not torch.compiler.is_compiling():
Collaborator


For follow-up: can we change this so the dispatch happens inside __init__ (check compilation_config.mode != CompilationMode.NONE)?
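
A rough sketch of that follow-up idea; the import path of get_current_vllm_config is an assumption, and CompilationMode.NONE follows the naming in the comment above:

from vllm.config import CompilationMode, get_current_vllm_config

def __init__(self) -> None:
    super().__init__()
    compilation_config = get_current_vllm_config().compilation_config
    # Choose the implementation once at construction time instead of
    # calling torch.compiler.is_compiling() on every forward pass.
    self.use_flash_attn = compilation_config.mode == CompilationMode.NONE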

Collaborator


Actually, we should just always use FA in forward_hip, as forward_native is used by default anyway when torch Inductor is used.

Contributor Author


Yeah, you are right. When using graph mode and backend = inductor, CustomOp will be disabled by default, and forward_hip won't be called in this case.
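
A hedged sketch of what "always use FA in forward_hip" could look like, assuming flash_attn's rotary kernel and its interleaved flag (not code from this PR):

def forward_hip(self, x, cos, sin, is_neox_style: bool = True):
    from flash_attn.layers.rotary import apply_rotary_emb
    # Use the flash_attn kernel unconditionally on ROCm: when Inductor
    # compilation is enabled, CustomOp dispatch falls back to
    # forward_native anyway, so no is_compiling() check is needed here.
    return apply_rotary_emb(x, cos, sin, interleaved=not is_neox_style)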

@ProExpertProg ProExpertProg added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 5, 2025
@shen-shanshan
Contributor Author

shen-shanshan commented Dec 6, 2025

Nice dispatch cleanup, approving so that it's not blocked while I'm gone for the next 2 weeks but please address comments!!

Thanks a lot for your review. I will address the comments and fix the CI errors soon.

@shen-shanshan shen-shanshan changed the title [CustomOp] Extract apply_rotary_emb as CustomOp and unify the dispatch logic [CustomOp] Extract ApplyRotaryEmb as CustomOp and unify the dispatch logic Dec 9, 2025
