fix nz for quantization #4943
```diff
@@ -347,7 +347,7 @@ def process_weights_after_loading(self, layer):
         # converting ACL_FORMAT_FRACTAL_NZ.
         # npu_quant_grouped_matmul_dequant in eager mode does not accept
         # ACL_FORMAT_FRACTAL_NZ.
-        if not is_310p() and is_enable_nz():
+        if not is_310p():
             layer.w13_weight.data = torch_npu.npu_format_cast(
                 layer.w13_weight.data, ACL_FORMAT_FRACTAL_NZ).contiguous()
             layer.w2_weight.data = torch_npu.npu_format_cast(
```
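The condition change above can be summarized as a pure boolean predicate. This is an illustrative sketch only: the helper names mirror the diff (`is_310p`, `is_enable_nz`), but the functions below model just the guard logic, not the actual vllm-ascend code path.

```python
# Sketch of the guard change in this hunk (illustrative, not the real code).

def should_cast_to_nz_before(is_310p: bool, nz_enabled: bool) -> bool:
    # Old condition: cast to FRACTAL_NZ only when not on 310P *and*
    # NZ is enabled (via the VLLM_ASCEND_ENABLE_NZ environment variable).
    return (not is_310p) and nz_enabled

def should_cast_to_nz_after(is_310p: bool) -> bool:
    # New condition: cast whenever the SoC is not 310P,
    # regardless of the environment variable.
    return not is_310p
```

The practical effect is that, off 310P, the FRACTAL_NZ weight cast no longer depends on the environment variable.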
```diff
@@ -270,9 +270,8 @@ def process_weights_after_loading(self, layer):
             1, 2).contiguous()
         layer.w2_weight.data = layer.w2_weight.data.transpose(
             1, 2).contiguous()
-        if is_enable_nz():
-            torch_npu.npu_format_cast_(layer.w13_weight, ACL_FORMAT_FRACTAL_NZ)
-            torch_npu.npu_format_cast_(layer.w2_weight, ACL_FORMAT_FRACTAL_NZ)
+        torch_npu.npu_format_cast_(layer.w13_weight, ACL_FORMAT_FRACTAL_NZ)
+        torch_npu.npu_format_cast_(layer.w2_weight, ACL_FORMAT_FRACTAL_NZ)
         layer.w13_weight_scale.data = layer.w13_weight_scale.data.view(
             layer.w13_weight_scale.data.shape[0], -1)
         layer.w13_weight_scale_fp32 = layer.w13_weight_scale.data.to(
```

**Contributor** commented on lines +273 to +274:
> This change makes the `ACL_FORMAT_FRACTAL_NZ` conversion unconditional. However, the removed `is_enable_nz()` check contained specific logic to disable this conversion for the `qwen3_next` model, which could be for compatibility reasons. Applying this format conversion to `qwen3_next` may cause a regression if the model does not support it for W4A8 dynamic quantization. If this model-specific limitation still exists, the check should be preserved, perhaps without the dependency on the `VLLM_ASCEND_ENABLE_NZ` environment variable.
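The reviewer's suggestion could be sketched as a model-specific guard that drops the env-var dependency. Everything here is a hypothetical illustration: the `model_type` plumbing and the injected `format_cast` callable are assumptions, while the real code calls `torch_npu.npu_format_cast_` directly.

```python
# Hedged sketch of the reviewer's suggestion: gate the NZ cast on the
# model type rather than on VLLM_ASCEND_ENABLE_NZ. The cast function is
# injected so the logic is testable without an NPU.

def cast_moe_weights_to_nz(layer, model_type, format_cast, nz_format):
    # Per the review, qwen3_next may not support FRACTAL_NZ for W4A8
    # dynamic quantization, so skip the conversion for that model.
    if model_type == "qwen3_next":
        return
    format_cast(layer.w13_weight, nz_format)
    format_cast(layer.w2_weight, nz_format)
```

In the actual layer, `format_cast` would be `torch_npu.npu_format_cast_` and `nz_format` would be `ACL_FORMAT_FRACTAL_NZ`; how `model_type` reaches `process_weights_after_loading` is left open.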