[Bugfix] fix quant_apply_mlp w1_scale type error & fix getting num_local_expert #4632
Conversation
Code Review
This pull request introduces two important bug fixes. The first corrects how the number of local experts is determined in vllm_adaptor.py, making it more robust by using len() on what can be a list. The second fix addresses a type error in moe_mlp.py by correctly indexing into a list of tensors for a scale parameter. Both changes are correct and improve the stability of the code. I've added one suggestion to improve code readability in vllm_adaptor.py.
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.
Signed-off-by: 白永斌 <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: 欧派果奶我还要 <[email protected]>
Force-pushed from c2ba15d to 96f0c6e
[Bugfix] fix quant_apply_mlp w1_scale type error & fix getting num_local_expert (vllm-project#4632)

### What this PR does / why we need it?
Fix bugs introduced by vllm-project@bc67696:
1. Fix getting num_local_experts error in vllm_adaptor.
2. Fix the w1_scale type error in moe_mlp.quant_apply_mlp's npu_dequant_swiglu_quant call in the w4a8 quantized scenario.

- vLLM version: v0.12.0

---------

Signed-off-by: 白永斌 <[email protected]>
Signed-off-by: 欧派果奶我还要 <[email protected]>
Co-authored-by: 白永斌 <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: wangxiyuan <[email protected]>
What this PR does / why we need it?
Fix bugs introduced by bc67696:
1. Fix getting num_local_experts in vllm_adaptor by taking len() of the (possibly list-valued) expert mapping.
2. Fix the w1_scale type error in moe_mlp.quant_apply_mlp's npu_dequant_swiglu_quant call in the w4a8 quantized scenario, as sketched below.
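Below is a minimal, self-contained sketch of the second fix, assuming the dequant kernel expects a single scale tensor while the w4a8 path supplies w1_scale as a list of tensors; the helper name and the choice of index [0] are illustrative, not the exact moe_mlp.py / torch_npu call.

```python
# Simplified stand-in for the w1_scale type fix; the real code passes the
# scale into torch_npu.npu_dequant_swiglu_quant inside quant_apply_mlp.
import torch


def apply_dequant(weight_scale) -> torch.Tensor:
    """Placeholder for the kernel call, which expects a single tensor."""
    if not isinstance(weight_scale, torch.Tensor):
        raise TypeError(f"expected a Tensor, got {type(weight_scale).__name__}")
    return weight_scale * 2.0  # placeholder math, not the real dequant


# In the w4a8 quantized path, w1_scale arrives as a list of scale tensors.
w1_scale = [torch.ones(4), torch.ones(4)]

# Buggy call: passing the whole list raises a type error.
# apply_dequant(w1_scale)      # TypeError

# Fixed call: index into the list so the kernel receives the tensor it
# expects (the exact index used in the PR may differ).
out = apply_dequant(w1_scale[0])
```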
Does this PR introduce any user-facing change?
How was this patch tested?