[Feature]: Add AWQ quantization support for vllm-ascend #4378

@menogrey

🚀 The feature, motivation and pitch

motivation

AWQ is a commonly used quantization method, and many AWQ-quantized models, such as Qwen, are available for immediate use. vllm-ascend currently supports models specially quantized with modelslim, but quantizing a model takes a long time, and we cannot cover every model a user might want to run in quantized form.

implementation

Refer to PR: #4316
vllm-ascend: #4316
vllm: v0.11.1rc5 (2918c1b)

validation

Note: The weights can be downloaded from ModelScope using the model names listed below.

  • Every model was tested on a single node; for large models, only tensor parallelism was used.
  • ⚠️WARNING: Set export VLLM_ASCEND_ENABLE_NZ=0 to avoid an error when processing weights.
  • ⚠️WARNING: Expert parallelism is not supported yet.
| Type | Architecture | Models | Model Name | Aclgraph Mode | Accuracy | Performance Compared to W8A8 |
|---|---|---|---|---|---|---|
| Text-only | DeepseekV3ForCausalLM | DeepSeek-V3 | tclf90/DeepSeek-V3.1-AWQ | | ceval:0.896 | |
| Text-only | DeepseekV3ForCausalLM | DeepSeek-R1 | cognitivecomputations/DeepSeek-R1-awq | | ceval:0.8923 | |
| Text-only | Qwen2ForCausalLM | QwQ, Qwen2 | Qwen/Qwen2.5-32B-Instruct-AWQ, Qwen/QwQ-32B-AWQ | | | |
| Text-only | Qwen3ForCausalLM | Qwen3 | Qwen/Qwen3-32B-AWQ | | ceval:0.85 | |
| Text-only | Qwen3MoeForCausalLM | Qwen3MoE | billy800/Qwen3-30B-A3B-Instruct-2507-AWQ, swift/Qwen3-235B-A22B-Instruct-2507-AWQ | ✅, ❌(accuracy issue) | ceval:0.8403 | |
| Multimodal | Qwen2AudioForConditionalGeneration | Qwen2-Audio | No AWQ quantized model provided | | | |
| Multimodal | Qwen2VLForConditionalGeneration | QVQ, Qwen2-VL | Qwen/Qwen2-VL-7B-Instruct-AWQ | | | |
| Multimodal | Qwen2_5_VLForConditionalGeneration | Qwen2.5-VL | Qwen/Qwen2.5-VL-32B-Instruct-AWQ | ❌(accuracy issue) | | |
| Multimodal | Qwen3VLForConditionalGeneration | Qwen3-VL | tclf90/Qwen3-VL-32B-Instruct-AWQ | | | |
| Multimodal | Qwen3VLMoeForConditionalGeneration | Qwen3-VL-MOE | tclf90/Qwen3-VL-30B-A3B-Instruct-AWQ, tclf90/Qwen3-VL-235B-A22B-Instruct-AWQ | ✅, ❌(accuracy issue) | | |
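
For anyone who wants to reproduce a run, here is a rough sketch of how one of the listed models might be launched with the settings from the notes above. It only assembles the environment and the `vllm serve` command line; it does not start the server. The helper names `build_env` and `build_serve_command` are hypothetical, and the tensor-parallel size of 2 is an assumption, not the value used in the validation runs.

```python
import os
import shlex


def build_env(base=None):
    """Return a copy of the environment with the NZ weight format disabled.

    VLLM_ASCEND_ENABLE_NZ=0 is the workaround from the notes above for the
    error raised while processing AWQ weights.
    """
    env = dict(os.environ if base is None else base)
    env["VLLM_ASCEND_ENABLE_NZ"] = "0"
    return env


def build_serve_command(model, tensor_parallel_size=1):
    """Build a `vllm serve` command line for an AWQ checkpoint.

    Only tensor parallelism is configured, because expert parallelism is
    not supported for AWQ yet.
    """
    return [
        "vllm", "serve", model,
        "--quantization", "awq",
        "--tensor-parallel-size", str(tensor_parallel_size),
    ]


if __name__ == "__main__":
    # The tensor-parallel size here is an assumed value for illustration.
    cmd = build_serve_command("Qwen/Qwen3-32B-AWQ", tensor_parallel_size=2)
    print("VLLM_ASCEND_ENABLE_NZ=0", shlex.join(cmd))
    # To actually launch (requires vllm and vllm-ascend to be installed):
    #   import subprocess
    #   subprocess.run(cmd, env=build_env(), check=True)
```

Keeping the environment and the command as plain data makes it easy to swap in any other model name from the table.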

Alternatives

No response

Additional context

No response
