fix nz for quantization #4943
Conversation
Code Review
This pull request simplifies the logic for applying the ACL_FORMAT_FRACTAL_NZ format during quantization by making it unconditional in several places. While this removes a dependency on the is_enable_nz utility function, it also removes a model-specific exception for qwen3_next. This could introduce a critical regression if that model still has incompatibilities with the NZ format. The review comments highlight these locations and suggest that the model-specific check should be preserved if it is still required.
layer.w13_weight.data = torch_npu.npu_format_cast(
    layer.w13_weight.data, ACL_FORMAT_FRACTAL_NZ)
layer.w2_weight.data = torch_npu.npu_format_cast(
    layer.w2_weight.data, ACL_FORMAT_FRACTAL_NZ)
This change makes the ACL_FORMAT_FRACTAL_NZ conversion unconditional. However, the removed is_enable_nz() check contained specific logic that disabled this conversion for the qwen3_next model, likely for compatibility reasons. Applying the format conversion to qwen3_next may cause a regression if the model does not support it for W4A8 dynamic quantization. If the model-specific limitation still exists, the check should be preserved, perhaps without the dependency on the VLLM_ASCEND_ENABLE_NZ environment variable; one possible shape for such a guard is sketched below.
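A minimal sketch of what the preserved guard could look like, dropping only the environment-variable dependency. The helper name `should_cast_to_nz`, the `vllm_config` attribute path, and the premise that only qwen3_next needs the exception are assumptions for illustration, not the repository's actual API:

```python
# Hypothetical helper: keep the qwen3_next exception while removing the
# dependency on the VLLM_ASCEND_ENABLE_NZ environment variable.
def should_cast_to_nz(vllm_config) -> bool:
    # The attribute path used to read the model type is an assumption
    # for illustration.
    model_type = vllm_config.model_config.hf_config.model_type
    # qwen3_next is assumed to still be incompatible with
    # ACL_FORMAT_FRACTAL_NZ for quantized weights.
    return model_type != "qwen3_next"

# At the call site, the cast would then stay conditional only on the
# model check rather than on the environment variable:
if should_cast_to_nz(vllm_config):
    layer.w13_weight.data = torch_npu.npu_format_cast(
        layer.w13_weight.data, ACL_FORMAT_FRACTAL_NZ)
    layer.w2_weight.data = torch_npu.npu_format_cast(
        layer.w2_weight.data, ACL_FORMAT_FRACTAL_NZ)
```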
  # npu_quant_grouped_matmul_dequant in eager mode does not accept
  # ACL_FORMAT_FRACTAL_NZ.
- if not is_310p() and is_enable_nz():
+ if not is_310p():
Removing the is_enable_nz() check from this condition makes the NZ format conversion unconditional for non-310p devices. The is_enable_nz() function contained an exception to disable NZ format for the qwen3_next model. This change will now apply the conversion to qwen3_next, which could cause a regression if this model does not support NZ format for W8A8 quantization. If the qwen3_next model still has this incompatibility, the check should be restored.
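For this W8A8 site, a restored condition could keep the 310p exclusion and re-add only the model-aware part of the old check; `should_cast_to_nz` is the hypothetical helper sketched in the previous comment, not an existing function in the repository:

```python
# Sketch: apply the NZ cast everywhere except on 310p SoCs and for
# models (assumed: qwen3_next) that do not support the format.
if not is_310p() and should_cast_to_nz(vllm_config):
    layer.weight.data = torch_npu.npu_format_cast(
        layer.weight.data, ACL_FORMAT_FRACTAL_NZ)
```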
torch_npu.npu_format_cast_(layer.w13_weight, ACL_FORMAT_FRACTAL_NZ)
torch_npu.npu_format_cast_(layer.w2_weight, ACL_FORMAT_FRACTAL_NZ)
This change makes the ACL_FORMAT_FRACTAL_NZ conversion unconditional. The removed is_enable_nz() check included logic to disable this conversion for the qwen3_next model. This could lead to a regression for qwen3_next if it does not support the NZ format for W8A8 dynamic quantization. If qwen3_next still has this limitation, the model-specific check should be preserved.
Signed-off-by: wangxiyuan <[email protected]>
Force-pushed from 3d44523 to cb596f5
The quantization ops require the NZ format unconditionally, so we should remove the NZ check for them.