[Feat] Flashcomm2 use o_shared linear #4188
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces support for shared o_proj linear layers for Flashcomm2, which involves changes across configuration, distributed state management, and the attention mechanism. The core logic for shared weights is implemented in vllm_ascend/torchair/ops/shared_weight_layer.py, which has been refactored for better usability.
My review focuses on ensuring the correctness and robustness of the new feature. I've identified a few critical issues:
- Incorrect validation logic for the new flashcomm2_oproj_shared configuration that could lead to silent failures.
- A potential crash in the shared weight layer logic when handling a series with a single layer.
I have provided suggestions to fix these issues. The rest of the changes look good and the refactoring of the shared weight layer API is a nice improvement.
vllm_ascend/ascend_config.py
Outdated
    if self.flashcomm2_oproj_tensor_parallel_size is None:
        raise AssertionError(
            "flashcomm2_oproj_shared must be enabled simultaneously with flashcomm2_oproj_tensor_parallel_size"
        )
The validation if self.flashcomm2_oproj_tensor_parallel_size is None: is incorrect. The value of self.flashcomm2_oproj_tensor_parallel_size is an integer returned from get_flashcomm2_config_and_validate (which gets it from an environment variable with a default of 0), so it will never be None. The check should be against 0, as flashcomm2_oproj_shared requires flashcomm2_oproj_tensor_parallel_size to be greater than 0.
Suggested change:
-    if self.flashcomm2_oproj_tensor_parallel_size is None:
-        raise AssertionError(
-            "flashcomm2_oproj_shared must be enabled simultaneously with flashcomm2_oproj_tensor_parallel_size"
-        )
+    if self.flashcomm2_oproj_tensor_parallel_size == 0:
+        raise AssertionError(
+            "flashcomm2_oproj_shared must be enabled with flashcomm2_oproj_tensor_parallel_size > 0"
+        )
    self.layers.sort(key=lambda x: x.layer_idx)
    self.num_layers = len(self.layers)
    assert self.num_layers > 0, "No layers in the series"
    assert self.prefetch_step >= 0 and self.prefetch_step <= self.num_layers - 2, "prefetch_step must be in [0, num_layers - 2]"
The assertion self.prefetch_step <= self.num_layers - 2 will cause a crash if a shared weight series contains only one layer (self.num_layers == 1), because self.num_layers - 2 would be -1. For a single-layer series, prefetching is not applicable, and prefetch_step should be 0. To prevent this crash, the assertion should be adjusted to handle this edge case.
Suggested change:
-    assert self.prefetch_step >= 0 and self.prefetch_step <= self.num_layers - 2, "prefetch_step must be in [0, num_layers - 2]"
+    assert self.prefetch_step >= 0 and self.prefetch_step <= max(0, self.num_layers - 2), "prefetch_step must be in [0, num_layers - 2]"
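For illustration, a minimal check of the boundary behavior (the helper below is hypothetical and not part of shared_weight_layer.py; it only mirrors the suggested clamp):

```python
def valid_prefetch_steps(num_layers: int) -> range:
    # With the suggested max(0, ...) clamp, a single-layer series permits only
    # prefetch_step == 0 instead of asserting against an upper bound of -1.
    return range(0, max(0, num_layers - 2) + 1)

assert list(valid_prefetch_steps(1)) == [0]        # single layer: no prefetching
assert list(valid_prefetch_steps(4)) == [0, 1, 2]  # prefetch up to num_layers - 2
```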
vllm_ascend/utils.py
Outdated
    if flashcomm2_oproj_shared:
        if flashcomm2_oproj_tp_size is None:
            raise AssertionError(
                "flashcomm2_oproj_shared must be enabled simultaneously with flashcomm2_oproj_tensor_parallel_size"
            )
        logger.info("Enable Flashcomm2 with flashcomm2_oproj_shared")
This validation logic for flashcomm2_oproj_shared is redundant with the logic in vllm_ascend/ascend_config.py. It's better to have validation in one place to avoid inconsistencies. Since ascend_config.py is the configuration entry point, it's a better place for this check. Additionally, the check if flashcomm2_oproj_tp_size is None: is incorrect, as flashcomm2_oproj_tp_size is an integer. I've suggested a fix in ascend_config.py and recommend removing this redundant block.
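A minimal sketch of what a single consolidated check could look like (the option names come from this PR, but the standalone helper below is hypothetical and only illustrates the intent):

```python
def validate_flashcomm2_oproj_shared(oproj_shared: bool, oproj_tp_size: int) -> None:
    # Hypothetical helper: keep the validation in one place (ascend_config.py) so the
    # duplicate check in utils.py can be dropped. oproj_tp_size defaults to 0, not None.
    if oproj_shared and oproj_tp_size == 0:
        raise AssertionError(
            "flashcomm2_oproj_shared requires flashcomm2_oproj_tensor_parallel_size > 0")
```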
This pull request has conflicts, please resolve those before we can evaluate the pull request.
@wangxiyuan this PR is ready, please check again, and if there are no issues, please help merge it.
        kv_no_split.contiguous(), need_gather_q_kv)

    if self.fc2_enable and is_hidden_layer(self.vllm_config, self.o_proj):
        reach_layer_for_shared_weight_series(self.o_proj)
Why is the first broadcast performed here? I think it's not general enough, because other models in the profile-run phase are not aware of the information related to the o_proj layer. Should the first broadcast be performed after post_process_after_loading_for_shared_weight_series instead?
That's already included in post_process_after_loading_for_shared_weight_series. See https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/torchair/ops/shared_weight_layer.py#L73
OK, understood. So why do we broadcast in the profile run if the first broadcast is already included in post_process_after_loading_for_shared_weight_series?
To handle multi-DP cases. When some DP ranks are running dummy_run, they should still broadcast their weights to the DP ranks that are executing the model.
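A minimal sketch of that constraint (the import path and reach_layer_for_shared_weight_series come from this PR; the dummy-run driver below is a hypothetical illustration, not the actual model-runner code):

```python
from vllm_ascend.torchair.ops.shared_weight_layer import (
    reach_layer_for_shared_weight_series)

def dummy_run_shared_weight_sync(o_proj_layers):
    # Even when this DP rank only executes a dummy batch, it must still "reach" every
    # layer in the shared-weight series so that it joins the collective broadcast that
    # the DP ranks running real batches are waiting on; otherwise those ranks would hang.
    for o_proj in o_proj_layers:
        reach_layer_for_shared_weight_series(o_proj)
```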
from typing import Callable, Optional

import torch
import torch.distributed as dist
This is not an op; we should consider moving it to a better place.
@wangxiyuan this PR is ready, please help merge it in.
What this PR does / why we need it?
The Flashcomm2 technical report mentions that FC2 introduces fully redundant storage of the o_proj matrix, which puts pressure on memory. The report therefore proposed a compromise solution, otp2, but it introduces additional reduce-scatter communication.
We propose a shared-linear feature (#2931) that distributes the weights layer by layer across the cards, avoiding the need for TP splitting, which solves the memory issue.
This PR depends on #3232 and #2931.
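A minimal sketch of the idea, assuming a round-robin ownership and on-demand broadcast scheme (the helpers below are hypothetical; the actual implementation lives in vllm_ascend/torchair/ops/shared_weight_layer.py and additionally prefetches a few layers ahead to overlap communication):

```python
import torch
import torch.distributed as dist

def owner_rank(layer_idx: int) -> int:
    # Assumed scheme: each layer's full o_proj weight is stored on exactly one rank,
    # instead of every rank holding a redundant full copy or TP-splitting the matrix.
    return layer_idx % dist.get_world_size()

def fetch_shared_weight(weight: torch.Tensor, layer_idx: int) -> torch.Tensor:
    # Before a layer runs, its full weight is broadcast from the owning rank to the
    # others, so no reduce-scatter is needed after the o_proj matmul. On non-owner
    # ranks `weight` is only a same-shape placeholder used to allocate the buffer.
    src = owner_rank(layer_idx)
    buf = weight if dist.get_rank() == src else torch.empty_like(weight)
    dist.broadcast(buf, src=src)
    return buf
```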
Flashcomm2 flowchart
Does this PR introduce any user-facing change?
Yes: the feature is enabled through environment variables.