[MOE] move weight transpose to wake_up for RL scenarios #4147
Conversation
Signed-off-by: lhp-deep <[email protected]>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request aims to move the weight transposition logic for MoE layers to the wake_up function, enhancing support for Reinforcement Learning (RL) scenarios. The changes span an example file, the MoE operator implementation, and the worker logic. While the overall direction seems correct, I've identified a critical bug in the new code added to vllm_ascend/worker/worker_v1.py. The logic for retrieving a parameter's parent module is flawed and will lead to a runtime error. This needs to be addressed.
vllm_ascend/worker/worker_v1.py
Outdated
```python
parent_module = model
parts = name.split('.')
param_name = parts[-1]
module_path = parts[-1]

for part in module_path:
    parent_module = getattr(parent_module, part)
```
The logic to retrieve the parent module of the w2_weight parameter is incorrect. module_path is assigned the parameter name string (parts[-1]), and the subsequent loop iterates over the characters of this string. This will cause getattr to fail at runtime.
You should iterate over the module path components to correctly traverse the model hierarchy. A cleaner way to achieve this is by using model.get_submodule().
```diff
- parent_module = model
- parts = name.split('.')
- param_name = parts[-1]
- module_path = parts[-1]
- for part in module_path:
-     parent_module = getattr(parent_module, part)
+ parts = name.split('.')
+ param_name = parts[-1]
+ parent_module = model.get_submodule(".".join(parts[:-1]))
```
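For reference, here is a minimal standalone sketch of the suggested `get_submodule` approach. The toy `Sequential` model and the dotted name `"0.weight"` are purely illustrative (not taken from this PR); the point is that `get_submodule` resolves the module path in one call, after which the parameter can be rebound with `setattr`.

```python
import torch

# Toy model standing in for the real one; "0.weight" is an illustrative
# dotted parameter name of the form "<module path>.<param name>".
model = torch.nn.Sequential(torch.nn.Linear(4, 4))
name = "0.weight"

parts = name.split('.')
param_name = parts[-1]                                      # "weight"
parent_module = model.get_submodule(".".join(parts[:-1]))   # the Linear layer

# Rebind the parameter on its direct parent, e.g. with a transposed copy.
old_param = getattr(parent_module, param_name)
new_param = torch.nn.Parameter(old_param.data.t(), requires_grad=False)
setattr(parent_module, param_name, new_param)
```

By contrast, the original loop iterates over the characters of the parameter name string, so `getattr` is called with single characters and raises `AttributeError`.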
vllm_ascend/worker/worker_v1.py
Outdated
```python
parent_module = model
parts = name.split('.')
param_name = parts[-1]
module_path = parts[-1]

for part in module_path:
    parent_module = getattr(parent_module, part)
```
Similar to the w2_weight block, the logic to retrieve the parent module for w13_weight is incorrect. module_path is assigned the parameter name string, and the loop iterates over its characters, which will cause a runtime error. You should use model.get_submodule() to correctly get the parent module.
```diff
- parent_module = model
- parts = name.split('.')
- param_name = parts[-1]
- module_path = parts[-1]
- for part in module_path:
-     parent_module = getattr(parent_module, part)
+ parts = name.split('.')
+ param_name = parts[-1]
+ parent_module = model.get_submodule(".".join(parts[:-1]))
```
Signed-off-by: lhp-deep <[email protected]>
Signed-off-by: lhp-deep <[email protected]>
Signed-off-by: lhp-deep <[email protected]>
Signed-off-by: lhp-deep <[email protected]>
Signed-off-by: lhp-deep <[email protected]>
Signed-off-by: lhp-deep <[email protected]>
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: lhp-deep <[email protected]>
Signed-off-by: lhp-deep <[email protected]>
…4604) To adapt to the vLLM v0.11.2 image, the method for obtaining PCP size and DCP size has been modified. ___ - vLLM version: v0.11.2 --------- Signed-off-by: SlightwindSec <[email protected]>
Signed-off-by: lhp-deep <[email protected]>
### What this PR does / why we need it? In `torchair_mla.py`, the `self.oproj` function includes an additional parameter `is_force_scatter`, while the `AscendRowParallelLinear` function in `linear.py` does not add this parameter. - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 Signed-off-by: zzhx1 <[email protected]>
```python
w13_data = param.transpose(1, 2)
w13_data = torch.nn.Parameter(w13_data, requires_grad=False)
setattr(parent_module, param_name, w13_data)
```
Duplicate code; please extract a helper method.
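One possible shape for the extracted helper, as a hedged sketch (the name `_replace_transposed_weight` and its exact signature are illustrative, not the PR's actual code):

```python
import torch


def _replace_transposed_weight(model: torch.nn.Module, name: str,
                               param: torch.Tensor) -> None:
    """Rebind the named expert weight as a (1, 2)-transposed Parameter.

    Factors out the logic currently duplicated for the `w13_weight` and
    `w2_weight` branches: resolve the parent module from the dotted
    parameter name, then replace the attribute with the transposed tensor.
    """
    parts = name.split('.')
    param_name = parts[-1]
    parent_module = model.get_submodule(".".join(parts[:-1]))
    transposed = torch.nn.Parameter(param.transpose(1, 2),
                                    requires_grad=False)
    setattr(parent_module, param_name, transposed)
```

Both branches could then call it as `_replace_transposed_weight(model, name, param)` from wherever the worker iterates over the model's named parameters.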
Let's drop the ascend scheduler test first to ensure all functions work without it. - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 Signed-off-by: wangxiyuan <[email protected]>
### What this PR does / why we need it? Support shared expert DP for the deepseek_mtp feature. `shared_expert_dp` requires `SP==True`, with corresponding parameter restrictions. Previously, due to the coupling between `shared_expert_dp` and torchair, and the removal of `deepseek_mtp` in vllm_ascend, shared expert DP of deepseek_mtp was temporarily removed. Currently, by performing the `reduce_scatter` on the input of deepseek_mtp in `mtp_proposer.py`, we ensure that it matches the dimensions of `input_embedding`, and then perform the `all_gather` on the output of mtp. ### How was this patch tested? baseline: (benchmark screenshot) enable shared_expert_dp and multistream_overlap_shared_expert: (benchmark screenshot) TPOT: 48ms -> 45.4ms Average TPS per rank: 117.6 -> 126.1 - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 --------- Signed-off-by: chenmenglong <[email protected]> Signed-off-by: zengran <[email protected]> Co-authored-by: zengran <[email protected]>
### What this PR does / why we need it? Fix A3 QwenVL PD smoke test error ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Send a curl request to the proxy port; it responds correctly - vLLM version: v0.11.2 Signed-off-by: 李少鹏 <[email protected]>
### What this PR does / why we need it? Add accuracy nightly test for new models: PaddlePaddle/ERNIE-4.5-21B-A3B-PT LLM-Research/Molmo-7B-D-0924 LLM-Research/gemma-2-9b-it LLM-Research/gemma-3-4b-it Shanghai_AI_Laboratory/internlm-7b llava-hf/llava-1.5-7b-hf - vLLM version: v0.11.2 Signed-off-by: hfadzxy <[email protected]>
### What this PR does / why we need it? Fix DeepSeek-V3.2-Exp doc, add docker command. - vLLM version: v0.11.2 Signed-off-by: menogrey <[email protected]>
### What this PR does / why we need it? Add GLM-4.5 nightly test - vLLM version: v0.11.2 Signed-off-by: hfadzxy <[email protected]>
The M4 quantization method in ModelSlim adds bias to model weights that originally do not have a linear bias. PR vllm-project#4235 supported PD-MIX quantization and M4 quantization, adding bias to `w8a8.py` and `w8a8_dynamic.py`, and implementing adaptations in `ops/linear.py` to prevent it from being reset to `None` by `self.register_parameter("bias", None)`. However, this modification introduced an issue where the bias was still being reset to `None` in certain scenarios, causing errors during service startup. Therefore, support for M4 quantization is temporarily being reverted in this PR. ___ - vLLM version: v0.11.2 Signed-off-by: SlightwindSec <[email protected]>
### What this PR does / why we need it? Fix the issue where the qwen3 moe service cannot be started after upgrading the vLLM version. Error info: AttributeError: 'AscendFusedMoE' object has no attribute 'use_dp_chunking' ### Does this PR introduce _any_ user-facing change? no - vLLM version: v0.11.2 --------- Signed-off-by: Wang Kunpeng <[email protected]>
### What this PR does / why we need it? Following vllm-project#4349, remove Qwen3-VL modeling files. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 --------- Signed-off-by: shen-shanshan <[email protected]> Signed-off-by: Shanshan Shen <[email protected]>
set `enable_chunked_prefill` to True for e2e test by default to keep the same behavior with vLLM - vLLM version: v0.11.2 Signed-off-by: wangxiyuan <[email protected]>
…t#4531) [Refactor] Remove redundant attention operator branches. Reason: We replace other attention ops with fused_infer_attention_score except in the decode_only state. Clean up code and remove 310P support. vllm-project#4455 - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 --------- Signed-off-by: weijinqian_v1 <[email protected]> Co-authored-by: weijinqian_v1 <[email protected]>
### What this PR does / why we need it? Fix Qwen2.5-Omni-7B accuracy test issue: vllm-project#4480 Depends on: vllm-project#4534 - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 Signed-off-by: hfadzxy <[email protected]>
) ### What this PR does / why we need it? Fix EPLB enable when using MTP float weights. This will be removed when EPLB supports MTP and float weights. ### How was this patch tested? Deepseek-V3 + MTP + EPLB in A3. - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 --------- Signed-off-by: offline0806 <[email protected]> Signed-off-by: offline893 <[email protected]> Co-authored-by: offline0806 <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: lhp-deep <[email protected]>
Bumps [actions/checkout](https://github.com/actions/checkout) from 4.3.1 to 6.0.0. - vLLM version: v0.11.2 Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…" (vllm-project#4619) This reverts commit 71e9b37. It breaks vllm-ascend/Qwen3-30B-A3B-W8A8 test
Drop ascend scheduler from test - vLLM version: v0.11.2 Signed-off-by: wangxiyuan <[email protected]>
### What this PR does / why we need it? add hyperlink ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ut - vLLM version: v0.11.2 --------- Signed-off-by: herizhen <[email protected]> Co-authored-by: herizhen <[email protected]>
clean up ascend scheduler config from doc - vLLM version: v0.11.2 Signed-off-by: wangxiyuan <[email protected]>
Signed-off-by: lhp-deep <[email protected]>
Signed-off-by: lhp-deep <[email protected]>
Signed-off-by: lhp-deep <[email protected]>
Signed-off-by: lhp-deep <[email protected]>
Signed-off-by: lhp-deep <[email protected]>
Signed-off-by: lhp-deep <[email protected]>
Signed-off-by: lhp-deep <[email protected]>
Signed-off-by: lhp-deep <[email protected]>
Signed-off-by: lhp-deep <[email protected]>
Signed-off-by: lhp-deep <[email protected]>
Signed-off-by: lhp-deep <[email protected]>
What this PR does / why we need it?
In reinforcement learning scenarios, the current inference path applies a transpose operation to the MoE weights. For a cleaner architecture, this PR moves the weight transpose step into `wake_up`.
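As a rough illustration only (the function name `transpose_moe_weights_on_wake_up` and the suffix matching below are assumptions for this sketch, not the PR's exact code), a wake_up-time pass can walk the model once and transpose the fused-MoE expert weights in place:

```python
import torch


def transpose_moe_weights_on_wake_up(model: torch.nn.Module) -> None:
    # Sketch: after the worker wakes up (e.g. on an RL weight reload),
    # transpose the 3-D fused-MoE expert weights once, instead of
    # transposing them inside the inference path.
    for name, param in list(model.named_parameters()):
        if name.endswith(("w13_weight", "w2_weight")):
            parts = name.split('.')
            parent_module = model.get_submodule(".".join(parts[:-1]))
            setattr(parent_module, parts[-1],
                    torch.nn.Parameter(param.transpose(1, 2),
                                       requires_grad=False))
```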
Does this PR introduce any user-facing change?
How was this patch tested?