Conversation

@shenchuxiaofugui (Contributor) commented Dec 11, 2025

What this PR does / why we need it?

The fused alltoall operator was not designed or implemented to handle tensors passed as lists, but the weights used for dynamic load balancing are in list form. We therefore disable this operator when dynamic load balancing is enabled.
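As an illustration of the mismatch, the sketch below shows a fused-style wrapper that only accepts a plain tensor while dynamic load balancing hands the weights over as a list. All names here are made up for the illustration and are not the actual vllm-ascend interface.

```python
# Illustrative only: the fused alltoall path expects plain tensors, while
# dynamic EPLB keeps the expert weights as a Python list of tensors.
# Every name below is hypothetical, not the real vllm-ascend API.
import torch


def fused_alltoall_ffn(hidden_states: torch.Tensor, w13: torch.Tensor) -> None:
    # A fused kernel consumes a single weight tensor; there is no branch for lists.
    if not isinstance(w13, torch.Tensor):
        raise TypeError(f"fused alltoall expects a Tensor, got {type(w13).__name__}")


# With dynamic load balancing enabled, the weights arrive in list form,
# so the fused operator must be skipped in favor of the plain alltoall path.
w13_dynamic = [torch.empty(8, 16), torch.empty(8, 16)]

try:
    fused_alltoall_ffn(torch.empty(4, 16), w13_dynamic)  # type: ignore[arg-type]
except TypeError as err:
    print(err)
```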

Does this PR introduce any user-facing change?

No

How was this patch tested?

After the fix, the service was restarted and a chat conversation completed normally. A sample response:
{"id":"","object":"chat.completion","created":,"model":"dsr1","choices":[{"index":0,"message":{"role":"assistant","content":"\nOkay, the user is asking "What is deep learning?" Hmm, this seems like a fundamental question about AI. They might be a complete beginner or someone with some tech background looking to","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null,"reasoning_content":null},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":8,"total_tokens":48,"completion_tokens":40,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request addresses a bug where the dynamic expert parallelism load balancer (dynamic_eplb) was incorrectly used with the FUSED_ALLTOALL MoE communication method. The FUSED_ALLTOALL method, which relies on the highly optimized dispatch_ffn_combine kernel, does not support the dynamic expert layout changes that dynamic_eplb introduces. The fix correctly disables FUSED_ALLTOALL when dynamic_eplb is enabled, falling back to the compatible ALLTOALL method. The change is implemented cleanly by introducing a fused_all2all_enable variable, which improves code readability. The fix is correct and necessary for the proper functioning of dynamic load balancing.
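A minimal sketch of the fallback described above, assuming a small helper that picks the MoE communication method; only dynamic_eplb, fused_all2all_enable, FUSED_ALLTOALL, and ALLTOALL are taken from the review, while the surrounding structure is hypothetical.

```python
# Sketch of the selection logic, not the actual vllm-ascend code: FUSED_ALLTOALL
# is only chosen when dynamic EPLB is off; otherwise fall back to ALLTOALL.
from enum import Enum


class MoECommMethod(Enum):
    ALLTOALL = "alltoall"
    FUSED_ALLTOALL = "fused_alltoall"


def select_moe_comm_method(fused_supported: bool, dynamic_eplb: bool) -> MoECommMethod:
    # The fused dispatch_ffn_combine kernel cannot consume the list-form weights
    # produced by dynamic load balancing, so it is gated on eplb being disabled.
    fused_all2all_enable = fused_supported and not dynamic_eplb
    return (MoECommMethod.FUSED_ALLTOALL
            if fused_all2all_enable else MoECommMethod.ALLTOALL)


# With dynamic_eplb enabled, the helper always falls back to the plain method.
assert select_moe_comm_method(True, dynamic_eplb=True) is MoECommMethod.ALLTOALL
```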

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Fill in the PR description and write a clear commit message to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@shenchuxiaofugui force-pushed the ffn_fused branch 2 times, most recently from 884d60f to ce3074d, on December 11, 2025 11:17
@MengqingCao added the ready and ready-for-test labels on Dec 12, 2025
@github-actions

This pull request has conflicts; please resolve them before we can evaluate it.
