Full list: https://docs.google.com/spreadsheets/d/13cvjdb7QJ7HPSntAUUhhB3jjMaUYU292cTH_HSM5u_s/
accuracy test: #3401
New upcoming models:
- ZhipuAI/GLM-4.5: [Usage]: ZhipuAI/GLM-4.5 run multi node with aclgraph #2082
- moonshotai/Kimi-K2-Instruct: [Doc] Support kimi-k2-w8a8 #2162
- Tencent-Hunyuan/Hunyuan-A13B-Instruct: vLLM Ascend Model Support Priority #1608 (comment)
- Ernie4.5: vLLM Ascend Model Support Priority #1608 (comment)
Pipeline to support a new model
- set model level --> update this table
- accuracy test
- write doc according to level --> update model support list
- fix issues
- add feature tests and improve performance
Optimized:
- tutorials doc
- ci: accuracy test
- ci: perf test
- ci: feature test
- issue fix: high priority
| Type | Architecture | Models | Model Name | Eager mode (BF16) | Graph mode (BF16) | W8A8 | W4A8 | W8A8C8 | Accuracy Test |
|---|---|---|---|---|---|---|---|---|---|
| Text-only | DeepseekV3ForCausalLM | DeepSeek-V3 | deepseek-ai/DeepSeek-V3 | ✅ | ✅ | ✅ | Pending multi-node CI | | |
| Text-only | DeepseekV3ForCausalLM | DeepSeek-R1 | deepseek-ai/DeepSeek-R1 | ✅ | ✅ | Pending multi-node CI | | | |
| Text-only | Qwen2ForCausalLM | QwQ, Qwen2 | Qwen/Qwen2-7B | ✅ | ✅ | ✅ | ✅ | | |
| Text-only | Qwen3ForCausalLM | Qwen3 | Qwen/Qwen3-8B | ✅ | ✅ | ✅ | ✅ | ✅ | |
| Text-only | Qwen3MoeForCausalLM | Qwen3MoE | Qwen/Qwen3-30B-A3B | ✅ | ✅ | ✅ #2469 | ✅ | | |
| Multimodal | Qwen2AudioForConditionalGeneration | Qwen2-Audio | Qwen/Qwen2-Audio-7B-Instruct | ✅ | ✅ | ✅ | | | |
| Multimodal | Qwen2VLForConditionalGeneration | QVQ, Qwen2-VL | Qwen/Qwen2-VL-7B-Instruct | ✅ | ✅ | ✅ | | | |
| Multimodal | Qwen2_5_VLForConditionalGeneration | Qwen2.5-VL | Qwen/Qwen2.5-VL-3B-Instruct | ✅ | ✅ | ✅ | | | |
| Multimodal | Qwen3VLForConditionalGeneration | Qwen3-VL | Qwen/Qwen3-VL-8B-Instruct | ✅ | ✅ | ✅ | | | |
| Multimodal | Qwen3VLMoeForConditionalGeneration | Qwen3-VL-MOE | Qwen3-VL-30B-A3B-Instruct | ✅ | ✅ | ✅ | | | |
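
A row's "Eager mode" and "Graph mode" cells can be spot-checked with a short smoke test. The sketch below is illustrative only: it assumes vLLM with the vllm-ascend plugin is installed and uses vLLM's `enforce_eager` engine argument to switch between the two modes. The helper names `engine_kwargs` and `smoke_test`, the prompt, and the `max_model_len` value are hypothetical choices, not part of this project's CI.

```python
from typing import Any


def engine_kwargs(model: str, eager: bool) -> dict[str, Any]:
    """Build keyword arguments for vllm.LLM for one table row and one mode."""
    return {
        "model": model,
        # enforce_eager=True disables graph capture; False leaves graph mode enabled.
        "enforce_eager": eager,
        # Illustrative value: a small context window keeps the check cheap.
        "max_model_len": 2048,
    }


def smoke_test(model: str, eager: bool, prompt: str = "Hello, my name is") -> str:
    """Load the model and generate a few tokens (needs vLLM and an NPU environment)."""
    # Imported lazily so engine_kwargs stays usable where vLLM is not installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**engine_kwargs(model, eager))
    params = SamplingParams(temperature=0.0, max_tokens=16)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text


# Example call, one mode per process to avoid holding two engines in memory:
#   print(smoke_test("Qwen/Qwen3-8B", eager=True))
```

Running each mode in a separate process is a cautious choice here, since a second engine in the same process may not fit in device memory.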
Functional:
- tutorials doc --> day x
- ci: accuracy test
- issue fix: middle priority
| Type | Architecture | Models | Model Name | Eager mode | ACLGraph mode | Accuracy Test |
|---|---|---|---|---|---|---|
| Text-only | BaiChuanForCausalLM | Baichuan2, Baichuan | baichuan-inc/Baichuan2-13B-Chat | ✅ | ✅ | 9/30 |
| Text-only | DeepseekV2ForCausalLM | DeepSeek-V2 | deepseek-ai/DeepSeek-V2-Lite-Chat | ✅ | ❌ | ✅ |
| Text-only | Ernie4_5_ForCausalLM | Ernie4.5 | PaddlePaddle/ERNIE-4.5-0.3B-PT | ✅ | ✅ | 9/30 |
| Text-only | Ernie4_5_MoeForCausalLM | Ernie4.5MoE | PaddlePaddle/ERNIE-4.5-21B-A3B-PT | ✅ | ✅ | 9/30 |
| Text-only | Gemma2ForCausalLM | Gemma2 | google/gemma-2-9b | ✅ | ✅ | 9/30 |
| Text-only | Gemma3ForCausalLM | Gemma3 | LLM-Research/gemma-3-1b-it | ✅ | ✅ | 9/30 |
| Text-only | InternLMForCausalLM | InternLM | Shanghai_AI_Laboratory/internlm-7b | ✅ | ✅ #1962 | 9/30 |
| Text-only | LlamaForCausalLM | Llama3.1, Llama3, Llama2, LLaMA, Yi | LLM-Research/Meta-Llama-3.1-8B-Instruct | ✅ | ✅ | 9/30 |
| Text-only | MiniCPMForCausalLM | MiniCPM | OpenBMB/miniCPM-bf16 | ✅ | ✅ | 9/30 |
| Text-only | MiniCPM3ForCausalLM | MiniCPM3 | OpenBMB/MiniCPM3-4B | ✅ | ✅ | 9/30 |
| Text-only | Phi3ForCausalLM | Phi-4, Phi-3 | LLM-Research/Phi-4-mini-instruct | ✅ | ✅ | 9/30 |
| Text-only | XLMRobertaForSequenceClassification | XLM-RoBERTa-based | BAAI/bge-reranker-v2-m3 | #1960 | #1960 | 🟡 |
| Multimodal | KeyeForConditionalGeneration | Keye-VL-8B-Preview | Kwai-Keye/Keye-VL-8B-Preview | #1961 | #1961 | 🟡 |
| Multimodal | Llama4ForConditionalGeneration | Llama4 | meta-llama/Llama-4-Scout-17B-16E-Instruct | #1972 | #1972 | 🟡 |
| Multimodal | LlavaForConditionalGeneration | LLaVA-1.5/1.6 | llava-hf/llava-1.5-7b-hf | ✅ | ✅ #1962 | 9/30 |
| Multimodal | MllamaForConditionalGeneration | Llama3.2 | LLM-Research/Llama-3.2-11B-Vision | #1963 | #1963 | 🟡 |
| Multimodal | MolmoForCausalLM | Molmo | LLM-Research/Molmo-7B-D-0924 | ✅ | ✅ #1942 | 9/30 |
| Multimodal | Qwen2_5OmniThinker | Qwen2.5-Omni | Qwen/Qwen2.5-Omni-7B | ✅ | ✅ #1760 | 9/30 |
Others:
- issue mentioned
- ci: accuracy test
- issue fix: low priority
| Type | Architecture | Models | Model Name | Eager mode | ACLGraph mode | Accuracy Test |
|---|---|---|---|---|---|---|
| Text-only | GlmForCausalLM | GLM-4 | ZhipuAI/glm-4-9b-chat-hf | #2255 | #2255 | 🟡 |
| Text-only | Glm4ForCausalLM | GLM-4-0414 | ZhipuAI/GLM-4-32B-0414 | #2258 | #2258 | 🟡 |
| Text-only | MistralForCausalLM | Mistral, Mistral-Instruct | mistralai/Mistral-7B-Instruct-v0.1 | ✅ | ✅ | 10/15 |
| Text-only | MiniMaxM1ForCausalLM | MiniMax-Text | MiniMax/MiniMax-M1-40k | #2414 | #2414 | 🟡 |
| Text-only | MiniMaxText01ForCausalLM | MiniMax-Text | MiniMax/MiniMax-Text-01 | #2414 | #2414 | 🟡 |
| Text-only | Qwen2ForProcessRewardModel | Qwen2-based, Qwen2.5 | Qwen/Qwen2.5-Math-PRM-7B | ✅ | ✅ | 10/15 |
| Text-only | Qwen3ForSequenceClassification | Qwen3-based | Qwen/Qwen3-Reranker-0.6B | ✅ | ✅ | 10/15 |
| Multimodal | AriaForConditionalGeneration | Aria | rhymes-ai/Aria | ✅ | ✅ | 10/15 |
| Multimodal | Florence2ForConditionalGeneration | Florence-2 | microsoft/Florence-2-base | #2259 | #2259 | 🟡 |
| Multimodal | Gemma3ForConditionalGeneration | Gemma3 | LLM-Research/gemma-3-4b-it | ✅ | ✅ | 10/15 |
| Multimodal | GLM4VForCausalLM | GLM-4V | ZhipuAI/glm-4v-9b | #2260 | #2260 | 🟡 |
| Multimodal | InternVLChatModel | InternVL3.0, InternVideo2.5, InternVL2.5, Mono-InternVL, InternVL2.0 | OpenGVLab/InternVL3-9B | ✅ | #2064 | 🟡 |
| Multimodal | LlavaNextForConditionalGeneration | LLaVA-NeXT | llava-hf/llava-v1.6-mistral-7b-hf | ✅ | ✅ | 10/15 |
| Multimodal | LlavaNextVideoForConditionalGeneration | LLaVA-NeXT-Video | llava-hf/LLaVA-NeXT-Video-7B-hf | ✅ | ✅ | 10/15 |
| Multimodal | MiniCPMV | MiniCPM-V | OpenBMB/MiniCPM-Llama3-V-2_5 | ✅ | ✅ | 10/15 |
| Multimodal | Mistral3ForConditionalGeneration | Mistral3 | mistralai/Mistral-Small-3.1-24B-Instruct-2503 | ✅ | ✅ | 10/15 |
| Multimodal | Phi3VForCausalLM | Phi-3-Vision, Phi-3.5-Vision | microsoft/Phi-3.5-vision-instruct | ✅ | ✅ | 🟡 |
| Multimodal | WhisperForConditionalGeneration | Whisper | openai-mirror/whisper-small | #2262 | #2262 | 🟡 |
| Text-only | Glm4MoeForCausalLM | GLM-4.5 | ZhipuAI/GLM-4.5-Air | ✅ | ✅ | 10/15 |
| Multimodal | Glm4vMoeForConditionalGeneration | GLM-4.5V | ZhipuAI/GLM-4.5V | #2516 | #2516 | 🟡 |
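
The three support levels in this issue (Optimized, Functional, Others) each carry their own checklist. As a rough sketch, those checklists can be encoded as data so the remaining work for a model can be listed. `LEVEL_REQUIREMENTS` and `missing_items` are hypothetical names used only for illustration, and the item strings are lightly normalized copies of the bullets in this issue.

```python
# Checklist per support level, copied from the bullet lists in this issue.
LEVEL_REQUIREMENTS: dict[str, list[str]] = {
    "Optimized": [
        "tutorials doc",
        "ci: accuracy test",
        "ci: perf test",
        "ci: feature test",
        "issue fix: high priority",
    ],
    "Functional": [
        "tutorials doc",
        "ci: accuracy test",
        "issue fix: middle priority",
    ],
    "Others": [
        "issue mentioned",
        "ci: accuracy test",
        "issue fix: low priority",
    ],
}


def missing_items(level: str, done: set[str]) -> list[str]:
    """Return the checklist items for `level` that are not yet in `done`."""
    return [item for item in LEVEL_REQUIREMENTS[level] if item not in done]


# A Functional-level model that only has its tutorial written so far:
print(missing_items("Functional", {"tutorials doc"}))
```

Keeping the checklist as plain data makes it easy to update when a model moves from one level to another.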