
Commit 724d043

Authored by JeffLee1874 and lijifu
[model] Support PanguUltraMoE (#4615)
### What this PR does / why we need it?

To support the PanguUltraMoE model.

### Test result

#### Start serving using a W8A8 quantized model and ACL graph

Master node:

```
vllm serve $LOCAL_CKPT_DIR \
    --host 0.0.0.0 \
    --port 8000 \
    --data-parallel-size 2 \
    --data-parallel-size-local 1 \
    --data-parallel-address $MASTER_NODE_IP \
    --data-parallel-rpc-port 13389 \
    --tensor-parallel-size 16 \
    --seed 1024 \
    --enable-expert-parallel \
    --served-model-name $NAME \
    --max-model-len 4096 \
    --max-num-batched-tokens 256 \
    --max-num-seqs 18 \
    --trust-remote-code \
    --gpu-memory-utilization 0.90 \
    --quantization ascend \
    --additional-config '{"ascend_scheduler_config":{"enabled":false, "enable_chunked_prefill":true, "chunked_prefill_enabled":true},"torchair_graph_config":{"enabled":false}}' \
    --speculative_config '{"method": "pangu_ultra_moe_mtp", "num_speculative_tokens": 1}'
```

Other nodes:

```
vllm serve $LOCAL_CKPT_DIR \
    --host 0.0.0.0 \
    --port 8000 \
    --headless \
    --data-parallel-size 2 \
    --data-parallel-size-local 1 \
    --data-parallel-start-rank 1 \
    --data-parallel-address $MASTER_NODE_IP \
    --data-parallel-rpc-port 13389 \
    --tensor-parallel-size 16 \
    --seed 1024 \
    --enable-expert-parallel \
    --served-model-name $NAME \
    --max-model-len 4096 \
    --max-num-batched-tokens 256 \
    --max-num-seqs 18 \
    --trust-remote-code \
    --gpu-memory-utilization 0.90 \
    --quantization ascend \
    --additional-config '{"ascend_scheduler_config":{"enabled":false, "enable_chunked_prefill":true, "chunked_prefill_enabled":true},"torchair_graph_config":{"enabled":false}}' \
    --speculative_config '{"method": "pangu_ultra_moe_mtp", "num_speculative_tokens": 1}'
```

#### Request & Response

- Request

```
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "messages": [
            {"role": "system", "content": ""},
            {"role": "user", "content": "你是谁?"}
        ],
        "max_tokens": "64",
        "top_p": "0.95",
        "top_k": "50",
        "temperature": "0.6",
        "add_special_tokens": true
    }'
```

- Response (kept verbatim; the model replies in Chinese, roughly: "OK, the user asks who I am, and I should answer according to the earlier setup. First, my role is Pangu, developed by Huawei, a reasoning model. I should emphasize that my main function is answering questions and providing informational support, especially handling complex tasks through logical reasoning and data analysis. The answer should stay concise, in Chinese, and match the user's...")

```
[unused16]
好的，用户问我是谁，我需要按照之前的设定来回答。首先，我的角色是盘古，由华为开发，属于推理模型。要强调我的主要功能是解答问题和提供信息支持，特别是通过逻辑推理和数据分析处理复杂任务。需要保持回答简洁，用中文，并且符合用户的
```

- vLLM version: v0.12.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.12.0

Signed-off-by: lijifu <[email protected]>
Co-authored-by: lijifu <[email protected]>
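The curl request above can also be issued from Python. A minimal sketch, assuming the server from the serving commands is reachable on `localhost:8000`; the helper name `build_chat_request` is illustrative (not vllm-ascend API), and the sampling parameters are passed as numbers rather than the quoted strings in the curl example (the OpenAI-compatible server coerces both):

```python
# Build the same chat-completions request as the curl command, using
# only the standard library. build_chat_request() is a hypothetical
# helper for this sketch, not part of vLLM or vllm-ascend.
import json
import urllib.request


def build_chat_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Return a POST request mirroring the curl command above."""
    payload = {
        "messages": [
            {"role": "system", "content": ""},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
        "top_p": 0.95,
        "top_k": 50,
        "temperature": 0.6,
        "add_special_tokens": True,
    }
    return urllib.request.Request(
        "http://localhost:8000/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
```

With a server running, sending it is then `urllib.request.urlopen(build_chat_request("你是谁?"))` and the reply text sits under `choices[0].message.content` in the returned JSON.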
1 parent f0060fc

File tree

2 files changed (+14, −0 lines)

- vllm_ascend/quantization/quant_config.py
- vllm_ascend/spec_decode/mtp_proposer.py

vllm_ascend/quantization/quant_config.py (12 additions, 0 deletions)

```diff
@@ -221,6 +221,12 @@ def get_scaled_act_names(self) -> List[str]:
         ["experts.0.gate_proj", "experts.0.up_proj", "experts.0.down_proj"],
         "fused_qkv_a_proj": ["q_a_proj", "kv_a_proj_with_mqa"]
     },
+    "pangu_ultra_moe": {
+        "gate_up_proj": ["gate_proj", "up_proj"],
+        "experts":
+        ["experts.0.gate_proj", "experts.0.up_proj", "experts.0.down_proj"],
+        "fused_qkv_a_proj": ["q_a_proj", "kv_a_proj_with_mqa"]
+    },
     "kimi_k2": {
         "gate_up_proj": ["gate_proj", "up_proj"],
         "experts":
@@ -241,6 +247,12 @@ def get_scaled_act_names(self) -> List[str]:
         "experts":
         ["experts.0.gate_proj", "experts.0.up_proj", "experts.0.down_proj"]
     },
+    "pangu_ultra_moe_mtp": {
+        "gate_up_proj": ["gate_proj", "up_proj"],
+        "experts":
+        ["experts.0.gate_proj", "experts.0.up_proj", "experts.0.down_proj"],
+        "fused_qkv_a_proj": ["q_a_proj", "kv_a_proj_with_mqa"]
+    },
     "qwen3_next": {
         "qkv_proj": [
             "q_proj",
```
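The `"pangu_ultra_moe"` entries added above map each fused (packed) module to the checkpoint sub-modules it is built from, so the quantization config knows which shard descriptions to merge. A minimal sketch of how such a table can be queried; the standalone dict copies the `"pangu_ultra_moe"` entry from the diff, while the helper `packed_shards_for` is hypothetical, not vllm-ascend API:

```python
# Illustrative lookup over a packed-modules table shaped like the one
# the PR adds. packed_shards_for() is a hypothetical helper.
from typing import Dict, List

PACKED_MODULES: Dict[str, Dict[str, List[str]]] = {
    "pangu_ultra_moe": {
        "gate_up_proj": ["gate_proj", "up_proj"],
        "experts":
        ["experts.0.gate_proj", "experts.0.up_proj", "experts.0.down_proj"],
        "fused_qkv_a_proj": ["q_a_proj", "kv_a_proj_with_mqa"],
    },
}


def packed_shards_for(model_type: str, fused_name: str) -> List[str]:
    """Return the original shard names folded into a fused module.

    An unknown model type or an unfused module yields an empty list,
    meaning there is nothing to merge for that weight.
    """
    return PACKED_MODULES.get(model_type, {}).get(fused_name, [])
```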
vllm_ascend/spec_decode/mtp_proposer.py (2 additions, 0 deletions)

```diff
@@ -44,6 +44,8 @@
 _MTP_MODELS = {
     "DeepseekV3ForCausalLM":
     ("vllm.model_executor.models.deepseek_mtp", "DeepSeekMTP"),
+    "PanguUltraMoEForCausalLM":
+    ("vllm.model_executor.models.openpangu_mtp", "OpenPanguMTP"),
     "DeepseekV32ForCausalLM":
     ("vllm.model_executor.models.deepseek_mtp", "DeepSeekMTP"),
     "Qwen3NextForCausalLM":
```