Support DeepSeekV3.2 with MLAPO operator #4753
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request adds support for DeepSeekV3.2 with the MLAPO operator and introduces context parallelism for SFA (SFA CP). The changes are extensive, involving modifications to the attention mechanism, introduction of shared weight layers for memory optimization, and new distributed communication patterns.
My review has identified a few critical issues that will cause NameError exceptions at runtime due to variables being used before they are defined in all code paths. Specifically, cum_query_lens and seq_lens are used without being defined, and actual_seq_lengths_query and actual_seq_lengths_key are not initialized in all execution branches before being passed to a function. I have also found a logical error in a new utility function. Please address these issues to ensure the code is correct and robust.
vllm_ascend/attention/sfa_v1.py
```python
actual_seq_lengths_query = torch.empty_like(cum_query_lens)
actual_seq_lengths_key = torch.empty_like(seq_lens)
```
The variables cum_query_lens and seq_lens are used here but are not defined within this scope. They should be accessed from attn_metadata, e.g., attn_metadata.cum_query_lens. This also applies to their usages on lines 775, 780, and 792 within this if self.enable_sfa_cp: block. This will cause a NameError at runtime.
Suggested change:

```diff
-actual_seq_lengths_query = torch.empty_like(cum_query_lens)
-actual_seq_lengths_key = torch.empty_like(seq_lens)
+actual_seq_lengths_query = torch.empty_like(attn_metadata.cum_query_lens)
+actual_seq_lengths_key = torch.empty_like(attn_metadata.seq_lens)
```
```python
topk_indices = self.indexer_select(
    x=hidden_states,
    qr=q_c,
    kv_cache=kv_cache,
    attn_metadata=attn_metadata,
    cos=cos,
    sin=sin,
    actual_seq_lengths_query=actual_seq_lengths_query,
    actual_seq_lengths_key=actual_seq_lengths_key,
    need_gather_q_kv=need_gather_q_kv)
```
The call to self.indexer_select requires actual_seq_lengths_query and actual_seq_lengths_key as arguments. However, these variables are only defined when self.enable_sfa_cp is true. They are not defined for the else case, nor for the if self.enable_mlapo and not forward_context.with_prefill: path. This will lead to a NameError.
You should define these variables in all code paths leading to this call. For the paths where they are not defined, they should likely be initialized to attn_metadata.cum_query_lens and attn_metadata.seq_lens respectively.
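One way to structure the fix suggested above is to resolve both tensors from `attn_metadata` before any branching, so every path reaching `indexer_select` sees them defined. The helper below is a hypothetical sketch, not the PR's actual code; the field names `cum_query_lens` and `seq_lens` are taken from the review comments, and the real control flow in `sfa_v1.py` may differ.

```python
import torch

def resolve_seq_lengths(attn_metadata, enable_sfa_cp: bool):
    """Hypothetical helper: define both length tensors in every code path."""
    if enable_sfa_cp:
        # SFA CP path: allocate placeholder tensors (filled in later by the
        # CP gather logic, per the surrounding code in the PR).
        actual_seq_lengths_query = torch.empty_like(attn_metadata.cum_query_lens)
        actual_seq_lengths_key = torch.empty_like(attn_metadata.seq_lens)
    else:
        # Non-CP paths (including the MLAPO decode path): fall back to the
        # metadata tensors directly, as the review suggests.
        actual_seq_lengths_query = attn_metadata.cum_query_lens
        actual_seq_lengths_key = attn_metadata.seq_lens
    return actual_seq_lengths_query, actual_seq_lengths_key
```

Because the fallback returns the metadata tensors themselves, the non-CP paths add no allocation.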
```python
def check_diff(a: torch.Tensor, b: torch.Tensor) -> Any:
    if torch.equal(a, b):
        absolute = torch.abs(a - b)
        relative = torch.abs(a - b) / (torch.abs(a) + 1e-9)
        return (torch.max(absolute).item(), torch.max(relative).item())
    return False
```
The logic in this function seems inverted. If torch.equal(a, b) is true, it proceeds to calculate the difference (which will be zero) and returns (0.0, 0.0). If they are not equal, it returns False. A function named check_diff would typically return False or (0.0, 0.0) to indicate no difference, and return the difference metrics when the tensors are not equal.
Suggested change:

```diff
 def check_diff(a: torch.Tensor, b: torch.Tensor) -> Any:
     if torch.equal(a, b):
-        absolute = torch.abs(a - b)
-        relative = torch.abs(a - b) / (torch.abs(a) + 1e-9)
-        return (torch.max(absolute).item(), torch.max(relative).item())
-    return False
+        return False
+    absolute = torch.abs(a - b)
+    relative = torch.abs(a - b) / (torch.abs(a) + 1e-9)
+    return (torch.max(absolute).item(), torch.max(relative).item())
```
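With the branch un-inverted as suggested, the helper returns `False` for identical tensors and the (max absolute, max relative) error pair otherwise. A minimal self-contained check of the corrected version:

```python
import torch

def check_diff(a: torch.Tensor, b: torch.Tensor):
    # Corrected semantics: False means "no difference"; otherwise return
    # (max absolute error, max relative error).
    if torch.equal(a, b):
        return False
    absolute = torch.abs(a - b)
    relative = torch.abs(a - b) / (torch.abs(a) + 1e-9)
    return (torch.max(absolute).item(), torch.max(relative).item())

a = torch.tensor([1.0, 2.0])
same = check_diff(a, a.clone())          # → False
diff = check_diff(a, torch.tensor([1.0, 2.5]))  # → (0.5, 0.25)
```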
This pull request has conflicts, please resolve those before we can evaluate the pull request.
[Feat] enable sfa cp for dsv3.2 (vllm-project#4702)

RFC: vllm-project/vllm#30055

1. Enable FlashComm1: `export VLLM_ASCEND_ENABLE_FLASHCOMM1=1`
2. Enable SFA CP: `--additional-config '{ "enable_sfa_cp": true }'`

- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e

Signed-off-by: ZYang6263 <[email protected]>
Co-authored-by: Yizhou Liu <[email protected]>
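Combining the two switches from the commit message, a launch might look like the sketch below. Only the environment variable and the `--additional-config` payload come from the commit; the model path is a placeholder and other serve flags are omitted.

```shell
# Enable FlashComm1 (from the commit message)
export VLLM_ASCEND_ENABLE_FLASHCOMM1=1

# Enable SFA context parallelism via additional-config; <model> is a
# placeholder for the actual model path.
vllm serve <model> \
    --additional-config '{ "enable_sfa_cp": true }'
```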
What this PR does / why we need it?
This PR adds support for the optimized MLAPO operator in DSV3.2. The operator provides an optimized implementation that avoids redundant q_down recomputation.
The operator implementation and optimizations were introduced in PR #4707.
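The redundancy being removed can be illustrated with a toy contrast: an unfused path derives the q_down projection from the hidden states once per consumer, while a fused path computes it once and shares the result. This is illustrative Python only; the real MLAPO operator is a fused NPU kernel, and the shapes and names below are invented.

```python
import torch

torch.manual_seed(0)
hidden = torch.randn(4, 16)       # toy hidden states
w_q_down = torch.randn(16, 8)     # toy q_down projection weight

def unfused(hidden):
    # q_down is recomputed for each consumer of the projection.
    q_down_for_query = hidden @ w_q_down
    q_down_for_indexer = hidden @ w_q_down   # redundant recomputation
    return q_down_for_query, q_down_for_indexer

def fused(hidden):
    # Compute q_down once and share the result with both consumers,
    # the kind of saving the fused operator is described as providing.
    q_down = hidden @ w_q_down
    return q_down, q_down
```

Both paths produce identical results; the fused one simply does the matmul once.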
Does this PR introduce any user-facing change?
How was this patch tested?