[MOE] move weight transpose to wakeup for RL scenarios #4626
Conversation
Code Review
This pull request refactors the weight transposition logic for MoE models, moving it from the weight loading process into the wake_up method. This is intended to support Reinforcement Learning scenarios where weights are updated dynamically. The changes involve modifying how MoE weights are handled in fused_moe.py and worker_v1.py, and updating example and test files accordingly. My review identifies a critical bug in the weight identification logic within the wake_up method and suggests an improvement for correctness and code quality.
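For orientation, here is a minimal sketch of the placement this refactor implies. It is illustrative only: `NPUWorker`, `_transpose_moe_weights`, and `is_moe_model` are assumed names, not the actual identifiers in `fused_moe.py` or `worker_v1.py`, and the helper follows the reviewer's suggested (fixed) form of the loop.

```python
# Illustrative sketch only: class and helper names are assumptions, not the
# PR's actual code.
import torch


def _transpose_moe_weights(model: torch.nn.Module, hidden_size: int) -> None:
    """Swap dims 1 and 2 of fused-MoE weight tensors on the owning modules."""
    for name, param in model.named_parameters():
        if (('w2_weight' in name or 'w13_weight' in name)
                and param.dim() == 3 and param.shape[1] == hidden_size):
            parts = name.split('.')
            parent = model.get_submodule(".".join(parts[:-1]))
            setattr(parent, parts[-1],
                    torch.nn.Parameter(param.transpose(1, 2).contiguous(),
                                       requires_grad=False))


class NPUWorker:  # hypothetical stand-in for the worker class in worker_v1.py
    def __init__(self, model, hidden_size, is_moe_model=True):
        self.model = model
        self.hidden_size = hidden_size
        self.is_moe_model = is_moe_model

    def wake_up(self, tags=None):
        # Transposing here, rather than once at weight-loading time, means
        # weights rewritten by an RL trainer (in the training layout) are
        # re-transposed every time the engine wakes up.
        if self.is_moe_model:
            _transpose_moe_weights(self.model, self.hidden_size)
```

The actual block under review follows.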
```python
for name, param in model.named_parameters():
    if 'w2_weight' in name and param.shape[2] == hidden_size:
        parts = name.split('.')
        param_name = parts[-1]
        parent_module = model.get_submodule(".".join(parts[:-1]))
        w2_data = param.transpose(1, 2)
        w2_data = torch.nn.Parameter(w2_data, requires_grad=False)
        setattr(parent_module, param_name, w2_data)
    elif 'w13_weight' in name and param.shape[1] == hidden_size:
        parts = name.split('.')
        param_name = parts[-1]
        parent_module = model.get_submodule(".".join(parts[:-1]))
        w13_data = param.transpose(1, 2)
        w13_data = torch.nn.Parameter(w13_data, requires_grad=False)
        setattr(parent_module, param_name, w13_data)
```
There are a couple of issues in this block of code:

- [Critical] The condition used to identify the `w2_weight` parameter is incorrect. At this point the shape of `w2_weight` is `(num_experts, hidden_size, intermediate_size)`, so `param.shape[2] == hidden_size` compares `intermediate_size` with `hidden_size`, which is not true in general and will cause this logic to fail for many models. It should be `param.shape[1] == hidden_size`, which identifies the parameter by its hidden dimension.
- [High] After transposing a tensor, it is good practice to call `.contiguous()` to ensure the memory layout is contiguous. This prevents potential errors and performance issues in subsequent operations that expect a contiguous tensor; the `load_weights` method called after this may rely on it. (A short demonstration follows the suggested code below.)
- [Medium] The code paths for transposing `w2_weight` and `w13_weight` are nearly identical. Refactoring them into a single branch or helper function would improve readability and maintainability.
I've provided a suggestion below that fixes the critical bug, adds `.contiguous()`, and refactors the duplicated logic.
```python
for name, param in model.named_parameters():
    # The shape of w2_weight is (num_experts, hidden_size, intermediate_size)
    # The shape of w13_weight is (num_experts, hidden_size, 2 * intermediate_size)
    if ('w2_weight' in name or 'w13_weight' in name) and len(param.shape) == 3 and param.shape[1] == hidden_size:
        parts = name.split('.')
        param_name = parts[-1]
        parent_module = model.get_submodule(".".join(parts[:-1]))
        # Transpose back to training format and ensure contiguity
        new_data = param.transpose(1, 2).contiguous()
        new_param = torch.nn.Parameter(new_data, requires_grad=False)
        setattr(parent_module, param_name, new_param)
```
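To see why the `.contiguous()` call matters, here is a small self-contained check (the shapes are arbitrary example values):

```python
import torch

w2 = torch.randn(8, 4096, 1536)         # (num_experts, hidden, intermediate)
t = w2.transpose(1, 2)                  # a strided view; no data is copied
print(t.shape, t.is_contiguous())       # torch.Size([8, 1536, 4096]) False
print(t.contiguous().is_contiguous())   # True: materializes a compact copy
```

Operations that call `.view()` or hand raw data pointers to kernels generally require contiguous storage, which is why the reviewer flags this.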
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Force-pushed from 2c0ed50 to f157dab
Signed-off-by: lhp-deep <[email protected]>
Force-pushed from 1a158d9 to 2c63755
What this PR does / why we need it?
In reinforcement learning scenarios, the current inference path applies a transpose operation to MoE weights during weight loading. For a cleaner architecture, this PR moves the weight-transpose step into the wake_up method, so weights that are updated dynamically by the trainer are re-transposed when the engine wakes up.
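A hedged sketch of the RL loop this change targets; the model name is a placeholder, and the exact sleep/wake API surface depends on the vLLM version in use:

```python
from vllm import LLM

llm = LLM(model="some/moe-model", enable_sleep_mode=True)  # placeholder model

outputs = llm.generate(["rollout prompt"])  # collect rollouts for the trainer

llm.sleep(level=1)   # release weights/KV cache while training runs
# ... the trainer writes updated MoE weights in the training layout ...
llm.wake_up()        # with this PR, the MoE weight transpose happens here
                     # instead of once at initial weight loading
```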
Does this PR introduce any user-facing change?
How was this patch tested?