【EPLB】Eplb Redundant Experts Bugfix #4232
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request refactors the expert load balancing (EPLB) logic by removing the redundant global_redundant_expert_num parameter from determine_default_log2phy_map and its call sites. This change simplifies the code and centralizes the calculation of redundant experts. The modifications touch utility functions, core MoE layer initialization, and the corresponding tests. While the refactoring is generally sound, there is a critical issue in one of the updated function calls that will cause a runtime error.
vllm_ascend/ops/common_fused_moe.py
self.log2phy = determine_default_log2phy_map(
    self.global_num_experts, self.ep_size, self.ep_rank,
    self.global_redundant_expert_num).npu()
This call to determine_default_log2phy_map includes self.global_redundant_expert_num as a fourth argument. However, the function signature for determine_default_log2phy_map in vllm_ascend/eplb/core/eplb_utils.py was changed in this PR to only accept three arguments. This will cause a TypeError at runtime. Please remove the extra argument to match the updated function definition.
self.log2phy = determine_default_log2phy_map(
    self.global_num_experts, self.ep_size, self.ep_rank).npu()
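To make the failure concrete, here is a minimal reproduction under the review's description of the new three-parameter signature (the function body is a placeholder, not the PR's actual implementation):

```python
def determine_default_log2phy_map(global_expert_num, world_size, rank_id):
    """Placeholder standing in for the refactored three-parameter function."""
    ...

# The old four-argument call now fails before any work is done:
#   TypeError: determine_default_log2phy_map() takes 3 positional arguments
#   but 4 were given
determine_default_log2phy_map(64, 8, 0, 2)
```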
MengqingCao left a comment:
Please update your PR message and describe clearly the issue it fixes and how this PR addresses it.
Signed-off-by: shenchuxiaofugui <[email protected]>
What this PR does / why we need it?
Redundant experts bugfix
This PR fixes the calculation logic for redundant experts: the number of redundant experts is now derived directly from the expert map, so there is no longer a need to set the redundant-expert parameter when a map is supplied, as the sketch below illustrates.
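The gist of the fix can be sketched as follows. This is an illustrative helper, not the PR's actual code: it assumes the expert map enumerates the logical expert hosted in each physical slot across all EP ranks, so the redundancy count is simply the surplus of physical slots over logical experts.

```python
import torch

def count_redundant_experts(expert_map: torch.Tensor,
                            num_logical_experts: int) -> int:
    """Hypothetical helper: derive the global redundant-expert count
    from the expert map instead of a user-supplied parameter.

    Assumes every entry of `expert_map` is the logical expert id hosted
    in one physical slot, across all EP ranks.
    """
    total_physical_slots = expert_map.numel()
    return total_physical_slots - num_logical_experts

# 8 logical experts spread over 10 physical slots -> 2 redundant copies.
expert_map = torch.tensor([0, 1, 2, 3, 4, 5, 6, 7, 0, 1])
assert count_redundant_experts(expert_map, 8) == 2
```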
Does this PR introduce any user-facing change?
After configuring the path for the expert map, users no longer need to set init_redundancy_expert.
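For illustration, a launch under this change might look like the sketch below. The additional_config keys (expert_map_path, init_redundancy_expert) mirror the names used in this PR's description and should be treated as assumptions rather than a documented API:

```python
from vllm import LLM

# Hedged sketch: with this fix, pointing at an expert map is enough;
# the redundant-expert count is inferred from the map itself.
llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Lite",  # example MoE model, assumed
    additional_config={
        "expert_map_path": "/path/to/expert_map.json",  # assumed key name
        # "init_redundancy_expert": 2,  # no longer required after this PR
    },
)
```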
How was this patch tested?
The accuracy of EPLB was tested with and without the use of redundant experts.