[feature] Enable EPLB to support MTP layers #4425
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request adds support for MTP layers in EPLB. The changes are mostly correct, but there are several critical issues related to potential None value access and incorrect logic when MTP is not used or when multiple MTP layers are present. These issues could lead to crashes or incorrect behavior. I've provided suggestions to fix these problems.
# TODO: init self.mtp_expert_weight_names depending on different model types, only deepseek v3 w8a8 and qwen3-moe is supported here
if any("w13_weight_offset" in name for name, _ in self.mtp_instance.named_parameters()):
    self.mtp_expert_weight_names = [
        "w13_weight", "w2_weight", "w13_weight_scale",
        "w13_weight_offset", "w2_weight_scale", "w2_weight_offset"
    ]
else:
    self.mtp_expert_weight_names = ["w13_weight", "w2_weight"]
The code accesses self.mtp_instance.named_parameters() without checking if self.mtp_instance is None. This will cause an AttributeError when mtp_instance is not provided during initialization. The block should be guarded with a check for self.mtp_instance.
Suggested change:

# TODO: init self.mtp_expert_weight_names depending on different model types, only deepseek v3 w8a8 and qwen3-moe is supported here
if self.mtp_instance is not None:
    if any("w13_weight_offset" in name for name, _ in self.mtp_instance.named_parameters()):
        self.mtp_expert_weight_names = [
            "w13_weight", "w2_weight", "w13_weight_scale",
            "w13_weight_offset", "w2_weight_scale", "w2_weight_offset"
        ]
    else:
        self.mtp_expert_weight_names = ["w13_weight", "w2_weight"]
else:
    self.mtp_expert_weight_names = []
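For reference, a minimal standalone reproduction of the failure mode described above (illustrative only, not the project's code):

```python
# Calling .named_parameters() on a missing MTP instance raises AttributeError.
mtp_instance = None

try:
    names = [name for name, _ in mtp_instance.named_parameters()]
except AttributeError as err:
    # 'NoneType' object has no attribute 'named_parameters'
    print(f"EPLB adaptor init would crash here: {err}")
```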
if self.mtp_instance is not None:
    mtp_param_dict = dict(self.mtp_instance.named_parameters())
    self.expert_param_per_layer[self.num_dense_layers + self.num_moe_layers] = list()
    for local_expert_id in range(num_local_expert):
        for mtp_layer_idx in range(self.num_mtp_layers):
            self.expert_param_per_layer[self.num_dense_layers + self.num_moe_layers + mtp_layer_idx].append([
                mtp_param_dict["model.layers." + str(self.num_dense_layers + self.num_moe_layers + mtp_layer_idx) +
                               ".mtp_block.mlp.experts." +
                               name].data[local_expert_id]
                for name in self.mtp_expert_weight_names
            ])
The initialization of self.expert_param_per_layer for MTP layers is incorrect. It only initializes a list for the first MTP layer. If num_mtp_layers > 1, this will raise a KeyError when trying to access subsequent layers. The initialization should be done for all MTP layers.
Suggested change:

if self.mtp_instance is not None:
    mtp_param_dict = dict(self.mtp_instance.named_parameters())
    for mtp_layer_idx in range(self.num_mtp_layers):
        self.expert_param_per_layer[self.num_dense_layers + self.num_moe_layers + mtp_layer_idx] = list()
    for local_expert_id in range(num_local_expert):
        for mtp_layer_idx in range(self.num_mtp_layers):
            self.expert_param_per_layer[self.num_dense_layers + self.num_moe_layers + mtp_layer_idx].append([
                mtp_param_dict["model.layers." + str(self.num_dense_layers + self.num_moe_layers + mtp_layer_idx) +
                               ".mtp_block.mlp.experts." +
                               name].data[local_expert_id]
                for name in self.mtp_expert_weight_names
            ])
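A stripped-down illustration of why initializing only the first MTP layer's entry breaks with num_mtp_layers > 1 (the layer counts here are hypothetical, chosen only to make the indices concrete):

```python
# Hypothetical sizes: 3 dense + 58 MoE layers in the main model, 2 MTP layers.
num_dense_layers, num_moe_layers, num_mtp_layers = 3, 58, 2
expert_param_per_layer = {}

# Original pattern: only the first MTP layer's key (61) is created.
expert_param_per_layer[num_dense_layers + num_moe_layers] = list()

try:
    for mtp_layer_idx in range(num_mtp_layers):
        key = num_dense_layers + num_moe_layers + mtp_layer_idx
        expert_param_per_layer[key].append("expert weights placeholder")
except KeyError as missing_key:
    print(f"no list was created for MTP layer key {missing_key}")  # key 62
```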
    self.expert_map_record_path)

self.adaptor.model.clear_all_moe_loads()
self.adaptor.mtp_instance.model.clear_all_moe_loads()
The code accesses self.adaptor.mtp_instance without checking if it is None. This will cause an AttributeError if no MTP instance is used. This call should be guarded with if self.adaptor.mtp_instance is not None:.
Suggested change:

if self.adaptor.mtp_instance is not None:
    self.adaptor.mtp_instance.model.clear_all_moe_loads()
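As a follow-up, a hedged sketch of the same guard wrapped in a small helper (the helper itself and the use of getattr are my illustration, not code from this PR):

```python
def clear_all_moe_loads(adaptor) -> None:
    """Reset accumulated MoE load counters on the main model and, if an
    MTP draft model is attached to the adaptor, on that model as well."""
    adaptor.model.clear_all_moe_loads()
    # mtp_instance is legitimately None when speculative decoding is off.
    mtp_instance = getattr(adaptor, "mtp_instance", None)
    if mtp_instance is not None:
        mtp_instance.model.clear_all_moe_loads()
```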
vllm_ascend/eplb/utils.py (Outdated)
        return self.layers[str(layer_id)].mtp_block.mlp.experts.get_log2phy_map()

    def get_all_expert_map(self, num_moe_layers):
The function get_all_expert_map is defined to take num_moe_layers as a required argument. However, it is called without arguments for MTP models, which will cause a TypeError. The num_moe_layers argument is not used for MTP models, so it should be made optional.
Suggested change:

    def get_all_expert_map(self, num_moe_layers=None):
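A minimal sketch of how the optional argument could be consumed; num_mtp_layers, get_expert_map, and the torch.stack layout are assumptions for illustration, not the actual body in vllm_ascend/eplb/utils.py:

```python
import torch


def get_all_expert_map(self, num_moe_layers=None):
    # When called for the MTP sub-model the argument is omitted, so fall back
    # to the model's own MTP layer count (attribute name is an assumption).
    if num_moe_layers is None:
        num_moe_layers = self.num_mtp_layers
    # Stack one expert map per layer; get_expert_map is a hypothetical helper.
    return torch.stack(
        [self.get_expert_map(layer_id) for layer_id in range(num_moe_layers)],
        dim=0,
    )
```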
self.eplb_adaptor = VllmEplbAdaptor(
    model=self.model,
    mtp_instance=mtp_instance,
    num_mtp_layers=mtp_instance.model.num_mtp_layers
)
self.eplb_loader.set_adator(self.eplb_adaptor)
self.eplb_updator.set_adaptor(self.eplb_adaptor)
self.eplb_updator.set_adaptor(self.eplb_adaptor, mtp_instance.model.num_mtp_layers)
The code accesses mtp_instance.model without checking if mtp_instance is None. This will raise an AttributeError when speculative decoding with deepseek_mtp is not used. You should conditionally get num_mtp_layers and pass it to the adaptor and updator.
Suggested change:

num_mtp_layers = mtp_instance.model.num_mtp_layers if mtp_instance is not None else 0
self.eplb_adaptor = VllmEplbAdaptor(
    model=self.model,
    mtp_instance=mtp_instance,
    num_mtp_layers=num_mtp_layers
)
self.eplb_loader.set_adator(self.eplb_adaptor)
self.eplb_updator.set_adaptor(self.eplb_adaptor, num_mtp_layers)
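One way to keep older call sites working is to give the updator's setter a default value, as in this sketch (the class skeleton and attribute names are illustrative, not the PR's actual EPLB updator):

```python
class EplbUpdator:
    """Illustrative skeleton; only the setter signature matters here."""

    def set_adaptor(self, adaptor, num_mtp_layers: int = 0):
        # A default of 0 means "no MTP draft layers", so callers that pass
        # only the adaptor keep their current behavior unchanged.
        self.adaptor = adaptor
        self.num_mtp_layers = num_mtp_layers
```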
if self.speculative_config and self.speculative_config.method == 'deepseek_mtp':
    assert isinstance(self.drafter, MtpProposer) and isinstance(self.drafter.model, DeepSeekMTP)
    mtp_instance=self.drafter.model
model_register(mtp_instance.model, self.vllm_config)
The variable mtp_instance is only defined within the if block, but it is used outside of it in the model_register call. This will lead to a NameError if self.speculative_config.method is not 'deepseek_mtp'. The model_register call should be moved inside the if block.
Suggested change:

if self.speculative_config and self.speculative_config.method == 'deepseek_mtp':
    assert isinstance(self.drafter, MtpProposer) and isinstance(self.drafter.model, DeepSeekMTP)
    mtp_instance=self.drafter.model
    model_register(mtp_instance.model, self.vllm_config)
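A closely related sketch (my illustration, not the reviewer's exact suggestion): predefining mtp_instance as None keeps the name bound for the later EPLB adaptor setup, which the earlier suggestion already guards with a None check:

```python
# Default to "no draft model"; downstream code branches on this explicitly.
mtp_instance = None
if self.speculative_config and self.speculative_config.method == 'deepseek_mtp':
    assert isinstance(self.drafter, MtpProposer) and isinstance(self.drafter.model, DeepSeekMTP)
    mtp_instance = self.drafter.model
    # Register the MTP sub-model only when it actually exists.
    model_register(mtp_instance.model, self.vllm_config)
```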
Force-pushed 3e46e82 to 9c2f938
- Implement EPLB supporting MTP layer
- Add support for multiple MTP layers configuration
- Enhance handling of num_speculative_tokens parameter:
  - Support num_speculative_tokens = 1 (single token speculative inference)
  - Support num_speculative_tokens > 1 (multiple tokens speculative inference)

Signed-off-by: chenbaiuan <[email protected]>
Force-pushed 134ccc3 to 962cb9b
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Force-pushed 962cb9b to 1c96296
Signed-off-by: chenbaixuan <[email protected]>
Force-pushed 1c96296 to e61c779
This pull request has conflicts, please resolve those before we can evaluate the pull request.
What this PR does / why we need it?
This PR enhances EPLB to support one or multiple MTP layers. Previously, EPLB only supported the main model. Now, it can handle num_speculative_tokens = 1 or num_speculative_tokens > 1.

Does this PR introduce any user-facing change?
No, this PR does not introduce any user-facing changes.
How was this patch tested?