vLLM Ascend Model Support Priority #1608

@shen-shanshan

Full list: https://docs.google.com/spreadsheets/d/13cvjdb7QJ7HPSntAUUhhB3jjMaUYU292cTH_HSM5u_s/
Accuracy test: #3401

New incoming models:


Pipeline to support a new model

  1. set model level --> update this table
  2. accuracy test
  3. write doc according to level --> update model support list
  4. fix issues
  5. add feature tests and improve performance

Optimized:

  • tutorials doc
  • ci: accuracy test
  • ci: perf test
  • ci: feature test
  • issue fix: high priority
Type Architecture Models Model Name Eager mode (BF16) Graph mode (BF16) W8A8 W4A8 W8A8C8 Accuracy Test
Text-only DeepseekV3ForCausalLM DeepSeek-V3 deepseek-ai/DeepSeek-V3 pending multi-node CI
Text-only DeepseekV3ForCausalLM DeepSeek-R1 deepseek-ai/DeepSeek-R1 pending multi-node CI
Text-only Qwen2ForCausalLM QwQ, Qwen2 Qwen/Qwen2-7B
Text-only Qwen3ForCausalLM Qwen3 Qwen/Qwen3-8B
Text-only Qwen3MoeForCausalLM Qwen3MoE Qwen/Qwen3-30B-A3B #2469
Multimodal Qwen2AudioForConditionalGeneration Qwen2-Audio Qwen/Qwen2-Audio-7B-Instruct
Multimodal Qwen2VLForConditionalGeneration QVQ, Qwen2-VL Qwen/Qwen2-VL-7B-Instruct
Multimodal Qwen2_5_VLForConditionalGeneration Qwen2.5-VL Qwen/Qwen2.5-VL-3B-Instruct
Multimodal Qwen3VLForConditionalGeneration Qwen3-VL Qwen/Qwen3-VL-8B-Instruct
Multimodal Qwen3VLMoeForConditionalGeneration Qwen3-VL-MOE Qwen3-VL-30B-A3B-Instruct

Functional:

  • tutorials doc --> day x
  • ci: accuracy test
  • issue fix: middle priority
Type Architecture Models Model Name Eager mode ACLGraph mode Accuracy Test
Text-only BaiChuanForCausalLM Baichuan2, Baichuan baichuan-inc/Baichuan2-13B-Chat 9/30
Text-only DeepseekV2ForCausalLM DeepSeek-V2 deepseek-ai/DeepSeek-V2-Lite-Chat
Text-only Ernie4_5_ForCausalLM Ernie4.5 PaddlePaddle/ERNIE-4.5-0.3B-PT 9/30
Text-only Ernie4_5_MoeForCausalLM Ernie4.5MoE PaddlePaddle/ERNIE-4.5-21B-A3B-PT 9/30
Text-only Gemma2ForCausalLM Gemma2 google/gemma-2-9b 9/30
Text-only Gemma3ForCausalLM Gemma3 LLM-Research/gemma-3-1b-it 9/30
Text-only InternLMForCausalLM InternLM Shanghai_AI_Laboratory/internlm-7b #1962 9/30
Text-only LlamaForCausalLM Llama3.1, Llama3, Llama2, LLaMA, Yi LLM-Research/Meta-Llama-3.1-8B-Instruct 9/30
Text-only MiniCPMForCausalLM MiniCPM OpenBMB/miniCPM-bf16 9/30
Text-only MiniCPM3ForCausalLM MiniCPM3 OpenBMB/MiniCPM3-4B 9/30
Text-only Phi3ForCausalLM Phi-4, Phi-3 LLM-Research/Phi-4-mini-instruct 9/30
Text-only XLMRobertaForSequenceClassification XLM-RoBERTa-based BAAI/bge-reranker-v2-m3 #1960 #1960 🟡
Multimodal KeyeForConditionalGeneration Keye-VL-8B-Preview Kwai-Keye/Keye-VL-8B-Preview #1961 #1961 🟡
Multimodal Llama4ForConditionalGeneration Llama4 meta-llama/Llama-4-Scout-17B-16E-Instruct #1972 #1972 🟡
Multimodal LlavaForConditionalGeneration LLaVA-1.5/1.6 llava-hf/llava-1.5-7b-hf #1962 9/30
Multimodal MllamaForConditionalGeneration Llama3.2 LLM-Research/Llama-3.2-11B-Vision #1963 #1963 🟡
Multimodal MolmoForCausalLM Molmo LLM-Research/Molmo-7B-D-0924 #1942 9/30
Multimodal Qwen2_5OmniThinker Qwen2.5-Omni Qwen/Qwen2.5-Omni-7B #1760 9/30

Others

  • issue mentioned
  • ci: accuracy test
  • issue fix: low priority
Type Architecture Models Model Name Eager mode ACLGraph mode Accuracy Test
Text-only GlmForCausalLM GLM-4 ZhipuAI/glm-4-9b-chat-hf #2255 #2255 🟡
Text-only Glm4ForCausalLM GLM-4-0414 ZhipuAI/GLM-4-32B-0414 #2258 #2258 🟡
Text-only MistralForCausalLM Mistral, Mistral-Instruct mistralai/Mistral-7B-Instruct-v0.1 10/15
Text-only MiniMaxM1ForCausalLM MiniMax-Text MiniMax/MiniMax-M1-40k #2414 #2414 🟡
Text-only MiniMaxText01ForCausalLM MiniMax-Text MiniMax/MiniMax-Text-01 #2414 #2414 🟡
Text-only Qwen2ForProcessRewardModel Qwen2-based, Qwen2.5 Qwen/Qwen2.5-Math-PRM-7B 10/15
Text-only Qwen3ForSequenceClassification Qwen3-based Qwen/Qwen3-Reranker-0.6B 10/15
Multimodal AriaForConditionalGeneration Aria rhymes-ai/Aria 10/15
Multimodal Florence2ForConditionalGeneration Florence-2 microsoft/Florence-2-base #2259 #2259 🟡
Multimodal Gemma3ForConditionalGeneration Gemma3 LLM-Research/gemma-3-4b-it 10/15
Multimodal GLM4VForCausalLM GLM-4V ZhipuAI/glm-4v-9b #2260 #2260 🟡
Multimodal InternVLChatModel InternVL3.0, InternVideo2.5, InternVL2.5, Mono-InternVL, InternVL2.0 OpenGVLab/InternVL3-9B #2064 🟡
Multimodal LlavaNextForConditionalGeneration LLaVA-NeXT llava-hf/llava-v1.6-mistral-7b-hf 10/15
Multimodal LlavaNextVideoForConditionalGeneration LLaVA-NeXT-Video llava-hf/LLaVA-NeXT-Video-7B-hf 10/15
Multimodal MiniCPMV MiniCPM-V OpenBMB/MiniCPM-Llama3-V-2_5 10/15
Multimodal Mistral3ForConditionalGeneration Mistral3 mistralai/Mistral-Small-3.1-24B-Instruct-2503 10/15
Multimodal Phi3VForCausalLM Phi-3-Vision, Phi-3.5-Vision microsoft/Phi-3.5-vision-instruct 🟡
Multimodal WhisperForConditionalGeneration Whisper openai-mirror/whisper-small #2262 #2262 🟡
Multimodal Glm4MoeForCausalLM GLM-4.5 ZhipuAI/GLM-4.5-Air 10/15
Multimodal Glm4vMoeForConditionalGeneration GLM-4.5V ZhipuAI/GLM-4.5V #2516 #2516 🟡
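
The three support levels in this issue (Optimized, Functional, Others) each carry their own checklist of deliverables and their own issue-fix priority. As a rough illustration only, the sketch below encodes those checklists as data so that the outstanding work for a model can be listed. The names `SupportLevel`, `SUPPORT_LEVELS`, and `missing_deliverables` are hypothetical and are not part of vLLM or vLLM Ascend; the checklist strings are taken from the bullet lists under each level.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class IssuePriority(Enum):
    """Issue-fix priority attached to each support level in this issue."""
    HIGH = "high"
    MIDDLE = "middle"
    LOW = "low"


@dataclass(frozen=True)
class SupportLevel:
    """Hypothetical record for one support level and its checklist."""
    name: str
    deliverables: tuple[str, ...]
    issue_fix_priority: IssuePriority


# Checklists mirror the bullet lists under each level heading in this issue.
SUPPORT_LEVELS = {
    "Optimized": SupportLevel(
        "Optimized",
        ("tutorials doc", "ci: accuracy test", "ci: perf test", "ci: feature test"),
        IssuePriority.HIGH,
    ),
    "Functional": SupportLevel(
        "Functional",
        ("tutorials doc", "ci: accuracy test"),
        IssuePriority.MIDDLE,
    ),
    "Others": SupportLevel(
        "Others",
        ("issue mentioned", "ci: accuracy test"),
        IssuePriority.LOW,
    ),
}


def missing_deliverables(level_name: str, done: set[str]) -> list[str]:
    """Return the checklist items of a level that are not yet in `done`."""
    level = SUPPORT_LEVELS[level_name]
    return [item for item in level.deliverables if item not in done]


if __name__ == "__main__":
    # Example: a Functional-level model that only has its tutorial so far.
    print(missing_deliverables("Functional", {"tutorials doc"}))
```

Keeping the checklists in one mapping means that moving a model between levels only changes the key that is looked up; the per-level requirements stay in a single place.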
