
Conversation

@whx-sjtu
Collaborator

This PR moves the communication operation of the shared experts out of the extra stream, because I found that running the shared experts in a separate stream together with aclgraph can cause rtMemcpy-related errors.
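
For illustration, here is a minimal sketch (not the actual vllm-ascend code) of the pattern: the shared-expert compute stays on the extra stream, while the collective is issued on the default stream. `shared_expert_forward`, `shared_experts`, and `extra_stream` are hypothetical names; `torch.npu` is assumed to expose the same stream API as `torch.cuda` through torch_npu, and `tensor_model_parallel_all_reduce` is vLLM's tensor-parallel all-reduce helper.

```python
import torch
import torch_npu  # noqa: F401  # registers the torch.npu stream API on Ascend

from vllm.distributed import tensor_model_parallel_all_reduce


def shared_expert_forward(shared_experts, hidden_states, extra_stream):
    # Run only the compute part of the shared experts on the extra stream;
    # the default stream keeps working on the routed experts in parallel.
    extra_stream.wait_stream(torch.npu.current_stream())
    with torch.npu.stream(extra_stream):
        shared_out = shared_experts(hidden_states)
    # Join the streams before touching the result, then issue the
    # communication on the default stream so graph capture never records a
    # copy/collective on the extra stream.
    torch.npu.current_stream().wait_stream(extra_stream)
    shared_out = tensor_model_parallel_all_reduce(shared_out)
    return shared_out
```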

Furthermore, I use a single global variable as the extra stream object, so a new stream is not allocated for every layer in full-graph mode.
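
A minimal sketch of that idea, assuming a lazily created module-level singleton (the names `_SHARED_EXPERT_STREAM` and `get_shared_expert_stream` are hypothetical, not the actual vllm-ascend identifiers):

```python
import torch
import torch_npu  # noqa: F401  # provides torch.npu on Ascend

# One process-wide extra stream, created lazily and reused by every MoE layer,
# instead of allocating a fresh stream per layer during full-graph capture.
_SHARED_EXPERT_STREAM = None


def get_shared_expert_stream():
    global _SHARED_EXPERT_STREAM
    if _SHARED_EXPERT_STREAM is None:
        _SHARED_EXPERT_STREAM = torch.npu.Stream()
    return _SHARED_EXPERT_STREAM
```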

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist bot
Contributor


Code Review

This pull request refactors the multi-stream handling for Mixture-of-Experts (MoE) layers, specifically for shared experts. The changes involve moving communication operations for shared experts out of the extra computation stream to prevent potential errors with aclgraph. Additionally, it introduces a global stream for shared expert calculations to avoid re-creating streams for each layer, which is a good optimization for graph mode. The implementation looks correct and aligns with the stated goals. I have one piece of feedback regarding a misleading comment in a new utility function, which should be corrected for clarity.

…llm-project#3582)

This PR moves the communication operation of the shared experts out of
the extra stream, because I found that running the shared experts in a
separate stream together with aclgraph can cause rtMemcpy-related errors.

Furthermore, I use a single global variable as the extra stream object,
so a new stream is not allocated for every layer in full-graph mode.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: whx-sjtu <[email protected]>
@whx-sjtu added the ready (read for review) label on Oct 25, 2025
@yiz-liu merged commit a58ff9e into vllm-project:v0.11.0-dev on Oct 25, 2025
29 checks passed
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 28, 2025
Clorist33 pushed a commit to Clorist33/vllm-ascend that referenced this pull request Dec 9, 2025