[Refactor] Remove redundant attention operator branches. #4455
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request refactors the attention operator branches by unifying prefill logic into a single _forward_prefill method and removing support for 310P devices. The changes simplify the control flow in the main forward method, making the code cleaner and easier to maintain. My review focuses on the correctness and clarity of this refactoring. I've identified one high-severity issue related to an unused parameter in the new _forward_prefill method, which should be addressed to improve code quality.
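As a rough, hedged illustration of the simplified control flow described above (the enum values, stub methods, and the attn_metadata.attn_state field are assumptions for this sketch, not the code merged in this PR):

```python
# Minimal, hypothetical sketch of the dispatch this PR moves toward:
# everything except the decode-only state goes through one prefill path
# built on torch_npu.npu_fused_infer_attention_score. Names below mirror
# the discussion in this PR but are illustrative only.
from enum import Enum, auto


class AscendAttentionState(Enum):
    DecodeOnly = auto()
    PrefillNoCache = auto()
    ChunkedPrefill = auto()


class AttentionImplSketch:

    def forward(self, query, key, value, kv_cache, attn_metadata, output):
        if attn_metadata.attn_state == AscendAttentionState.DecodeOnly:
            # Decode keeps its dedicated paged-attention path.
            return self._forward_decode_only(query, kv_cache, attn_metadata,
                                             output)
        # All other states share a single prefill path; no pre-allocated
        # output tensor is passed (see the review comment below).
        return self._forward_prefill(query, key, value, kv_cache,
                                     attn_metadata)

    def _forward_decode_only(self, query, kv_cache, attn_metadata, output):
        raise NotImplementedError  # placeholder for the decode kernel call

    def _forward_prefill(self, query, key, value, kv_cache, attn_metadata,
                         num_tokens=0):
        raise NotImplementedError  # placeholder for the fused prefill call
```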
def _forward_prefill(self,
                     query: torch.Tensor,
                     key: torch.Tensor,
                     value: torch.Tensor,
                     kv_cache: Tuple[torch.Tensor],
                     attn_metadata: AscendMetadata,
                     output: torch.Tensor,
                     num_tokens=0):
The `output` parameter of the `_forward_prefill` method is unused. The value passed in is immediately shadowed by the assignment `output, _ = torch_npu.npu_fused_infer_attention_score(...)` on line 357. This is misleading because it suggests an in-place write that never happens, and the pre-allocated tensor is wasted.
To avoid confusion and make the code cleaner, the `output` parameter should be removed from the function signature, and the call site at line 592 should be updated to no longer pass this argument.
def _forward_prefill(self,
                     query: torch.Tensor,
                     key: torch.Tensor,
                     value: torch.Tensor,
                     kv_cache: Tuple[torch.Tensor],
                     attn_metadata: AscendMetadata,
                     num_tokens=0):
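For completeness, a hedged before/after sketch of what the matching call-site update might look like (the surrounding variable names and exact argument list are assumptions; only dropping the output argument reflects the suggestion above):

```python
# Hypothetical call site before the suggestion: a pre-allocated tensor is
# passed in, only to be shadowed inside _forward_prefill.
# output = self._forward_prefill(query, key, value, kv_cache,
#                                attn_metadata, output, num_tokens)

# Hypothetical call site after the suggestion: keep the tensor returned by
# npu_fused_infer_attention_score via _forward_prefill instead.
output = self._forward_prefill(query, key, value, kv_cache,
                               attn_metadata, num_tokens)
```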
[Refactor] Remove redundant attention operator branches. Reason: We replace the other attention ops with fused_infer_attention_score, except for the decode_only state. This cleans up the code and removes 310P support. #4455 - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 --------- Signed-off-by: weijinqian_v1 <[email protected]> Co-authored-by: weijinqian_v1 <[email protected]>
Replaced by #4524
…t#4531) [Refactor] Remove redundant attention operator branches. Reason: We replace the other attention ops with fused_infer_attention_score, except for the decode_only state. This cleans up the code and removes 310P support. vllm-project#4455 - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 --------- Signed-off-by: weijinqian_v1 <[email protected]> Co-authored-by: weijinqian_v1 <[email protected]> Signed-off-by: Che Ruan <[email protected]>
[Refactor] Remove redundant attention operator branches.
Reason: We replace the other attention ops with fused_infer_attention_score, except for the decode_only state. This cleans up the code and removes 310P support.