[Refactor] Remove redundant attention operator branches. #4531
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request refactors the attention mechanism by removing redundant operator branches and support for the 310P device. The changes unify different prefill attention paths into a single _forward_prefill method using npu_fused_infer_attention_score, which simplifies the codebase significantly. The _forward_v1_style method has been removed, and a new _forward_encode method has been introduced to handle encoder-only attention, making the main forward method cleaner and more readable. The logic for creating attention masks has also been simplified. Overall, the changes are well-executed, improve code maintainability, and appear to be correct. I have no major concerns.
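To make the summary above concrete, here is a minimal sketch of what a unified prefill path built on torch_npu.npu_fused_infer_attention_score can look like. The function name echoes the _forward_prefill mentioned above; the keyword arguments (atten_mask, actual_seq_lengths, num_key_value_heads, scale, input_layout) and the TND layout are assumptions based on the public torch_npu kernel documentation, not a copy of the vllm-ascend implementation.

```python
import torch
import torch_npu  # Ascend NPU extension providing the fused kernel


def forward_prefill_sketch(
    query: torch.Tensor,      # [total_tokens, num_heads, head_dim]
    key: torch.Tensor,        # [total_tokens, num_kv_heads, head_dim]
    value: torch.Tensor,      # [total_tokens, num_kv_heads, head_dim]
    attn_mask: torch.Tensor,  # boolean causal mask (True = do not attend)
    seq_lens: list[int],      # per-request sequence lengths
    num_heads: int,
    num_kv_heads: int,
    scale: float,
) -> torch.Tensor:
    # One fused kernel call replaces the previous prefill-specific branches
    # (flash-attention style, v1-style, etc.). Keyword names are assumed
    # from the torch_npu documentation and may differ slightly in practice.
    output, _lse = torch_npu.npu_fused_infer_attention_score(
        query,
        key,
        value,
        atten_mask=attn_mask,
        actual_seq_lengths=seq_lens,
        actual_seq_lengths_kv=seq_lens,
        num_heads=num_heads,
        num_key_value_heads=num_kv_heads,
        scale=scale,
        input_layout="TND",
    )
    return output
```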
This pull request has conflicts; please resolve them before we can evaluate the pull request.
Review thread on the new helper:
    def _make_fia_attention_mask(self) -> torch.Tensor:
fia_mask can also be deleted.
It's used in the pcp branch. It will be removed in the next PR.
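For context, the helper under discussion builds the boolean mask passed to the fused kernel's atten_mask argument. Below is a minimal, hypothetical sketch of such a causal mask builder; the max_seq_len parameter and the "True means masked" convention are assumptions, not the project's actual _make_fia_attention_mask.

```python
import torch


def make_fia_attention_mask_sketch(max_seq_len: int,
                                    device: torch.device) -> torch.Tensor:
    # Upper-triangular boolean mask: True marks the future positions a query
    # token must not attend to, the convention commonly expected by
    # fused-infer-attention style kernels.
    mask = torch.ones(max_seq_len, max_seq_len, dtype=torch.bool, device=device)
    return torch.triu(mask, diagonal=1)
```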
…t#4531) [Refactor] Remove redundant attention operator branches. Reason: We replace the other attention ops with fused_infer_attention_score, except for the decode_only state. Clean up the code and remove 310P support. vllm-project#4455 - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 --------- Signed-off-by: weijinqian_v1 <[email protected]> Co-authored-by: weijinqian_v1 <[email protected]>
@weijinqian0 #4713 The gemma-2-9b-it & gemma-3-4b-it accuracy tests failed after this PR.
[Refactor] Remove redundant attention operator branches.
Reason:
We replace the other attention ops with fused_infer_attention_score, except for the decode_only state.
Clean up the code and remove 310P support.
#4455
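To make the "except for the decode_only state" wording concrete, the sketch below shows the kind of dispatch the description implies: the decode-only state keeps its dedicated path, while every other state is routed through the single fused-infer-attention prefill path. The enum values and the _forward_* helper names are assumptions drawn from the PR text and the review summary, not the actual vllm-ascend source.

```python
# Illustrative dispatch only; names are assumptions, not vllm-ascend code.
from enum import Enum, auto


class AttnState(Enum):
    DECODE_ONLY = auto()
    ENCODE_ONLY = auto()
    PREFILL = auto()  # stands in for all remaining prefill-style states


class AttentionBackendSketch:
    # The three helpers are stubs standing in for the real kernels.
    def _forward_decode(self, q, k, v, meta):   # paged-attention decode path
        raise NotImplementedError

    def _forward_encode(self, q, k, v, meta):   # encoder-only attention path
        raise NotImplementedError

    def _forward_prefill(self, q, k, v, meta):  # fused-infer-attention path
        raise NotImplementedError

    def forward(self, q, k, v, meta):
        if meta.attn_state == AttnState.DECODE_ONLY:
            return self._forward_decode(q, k, v, meta)
        if meta.attn_state == AttnState.ENCODE_ONLY:
            return self._forward_encode(q, k, v, meta)
        # Every other state collapses into the single fused prefill path.
        return self._forward_prefill(q, k, v, meta)
```

The point of the refactor is that only the decode branch remains special-cased; the other prefill variants funnel into one fused call instead of separate operator branches.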