[main][bugfix] bugfix for qwen3 moe quantization #4599
Conversation
Signed-off-by: Wang Kunpeng <[email protected]>
Code Review
This pull request addresses a bug in the Qwen3 MoE quantization by correctly initializing the base class for AscendFusedMoEMethod. The change ensures that the FusedMoEMethodBase constructor is called with the appropriate moe_config from the layer.
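As a rough sketch of what that change amounts to (FusedMoEMethodBase is stubbed here so the snippet is self-contained; the real base class comes from vLLM, and AscendQuantConfig from vllm-ascend, typed as Any to avoid that import):

```python
# Illustrative sketch only, not the actual vllm-ascend source.
from typing import Any, Dict

import torch


class FusedMoEMethodBase:  # stub standing in for vLLM's FusedMoEMethodBase
    def __init__(self, moe_config: Any):
        self.moe_config = moe_config


class AscendFusedMoEMethod(FusedMoEMethodBase):
    def __init__(self, quant_config: Any, prefix: str,
                 packed_modules_mapping: Dict[str, Any],
                 layer: torch.nn.Module):
        # The core of the fix: forward the layer's moe_config to the
        # base-class constructor instead of leaving the base uninitialized.
        super().__init__(layer.moe_config)
        self.quant_config = quant_config
        self.prefix = prefix
        self.packed_modules_mapping = packed_modules_mapping
```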
    quant_config: AscendQuantConfig,
    prefix: str,
    packed_modules_mapping: Dict[str, Any],
    layer: torch.nn.Module = None):
The layer parameter in the __init__ method is defined with a default value of None, but layer.moe_config is accessed unconditionally on the next line. This creates a potential AttributeError if AscendFusedMoEMethod is ever instantiated without providing a layer argument. Given that layer is always expected to be a FusedMoE instance when this method is called (as seen in AscendQuantConfig.get_quant_method), it should be made a required argument to reflect its actual usage and prevent potential runtime errors.
Suggested change:

    - layer: torch.nn.Module = None):
    + layer: torch.nn.Module):
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Signed-off-by: Wang Kunpeng <[email protected]>
Signed-off-by: Wang Kunpeng <[email protected]>
This pull request has conflicts; please resolve them before we can evaluate the pull request.
…to qwen3-moe-bugfix
# Conflicts:
#   tests/e2e/multicard/test_data_parallel.py
### What this PR does / why we need it?
Fix the issue where the qwen3 MoE service cannot be started after upgrading the vLLM version.

Error info:
AttributeError: 'AscendFusedMoE' object has no attribute 'use_dp_chunking'

### Does this PR introduce _any_ user-facing change?
no

- vLLM version: v0.11.2

---------

Signed-off-by: Wang Kunpeng <[email protected]>
    layer: torch.nn.Module = None):
    def __init__(self, quant_config: AscendQuantConfig, prefix: str,
                 packed_modules_mapping: Dict[str,
                                              Any], layer: torch.nn.Module):
Suggest formatting this signature.
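One way that formatting could look, purely as a sketch (the actual wrapping in the repo may differ; AscendQuantConfig is referenced via a string annotation so the snippet stays self-contained):

```python
from typing import Any, Dict

import torch


class AscendFusedMoEMethod:
    # Body elided; only the signature layout matters for this sketch.
    def __init__(self,
                 quant_config: "AscendQuantConfig",
                 prefix: str,
                 packed_modules_mapping: Dict[str, Any],
                 layer: torch.nn.Module) -> None:
        ...
```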
Signed-off-by: Wang Kunpeng <[email protected]>
Signed-off-by: Che Ruan <[email protected]>
What this PR does / why we need it?
Fix the issue where the qwen3 MoE service cannot be started after upgrading the vLLM version.
Error info:
AttributeError: 'AscendFusedMoE' object has no attribute 'use_dp_chunking'
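For context, a minimal, generic illustration of this failure class (simplified stand-in names, not the actual vLLM code path): when an upstream upgrade starts setting new attributes in a base-class __init__ that a subclass never calls, later accesses to those attributes fail with exactly this kind of AttributeError.

```python
# Simplified stand-ins; not the real vLLM / vllm-ascend classes.
class UpstreamMoELayer:
    def __init__(self) -> None:
        # Newer upstream releases initialize additional attributes here.
        self.use_dp_chunking = False


class AscendMoELayerBroken(UpstreamMoELayer):
    def __init__(self) -> None:
        pass  # base __init__ never runs, so use_dp_chunking is never set


class AscendMoELayerFixed(UpstreamMoELayer):
    def __init__(self) -> None:
        super().__init__()  # attributes added upstream get initialized


print(AscendMoELayerFixed().use_dp_chunking)   # False
# AscendMoELayerBroken().use_dp_chunking       # raises AttributeError, like the report
```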
Does this PR introduce any user-facing change?
no
How was this patch tested?