Commit e564470
[Attention][Kernel]moe support for llama4 and mllama4 (#740)
### What this PR does / why we need it?
moe support for llama4 and mllama4 in vllm-ascend
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
start sever:
python -m vllm.entrypoints.openai.api_server --model
/data/nfs/benchmark/tokenizer/Llama-4-Scout-17B-16E-Instruct \
--max-num-seqs=256 \
--max-model-len=8192 \
--tensor-parallel-size=8 \
--block-size=128 \
--dtype bfloat16 \
--host=0.0.0.0 \
--port=8000 \
--gpu-memory-utilization=0.9 \
--trust-remote-code
client:
python online_server.py --model-path
/data/nfs/benchmark/tokenizer/Llama-4-Scout-17B-16E-Instruct
--image-path /data/nfs/w60040464/cherry_blossom.jpg --docker-ip
7.242.108.253 --served-port 8000 --text "what is the content of this
image?"
result:
{'id': 'chatcmpl-2b709a5d2e1a4017991ec4ba8248686a', 'object':
'chat.completion', 'created': 1747056823, 'model':
'/data/nfs/benchmark/tokenizer/Llama-4-Scout-17B-16E-Instruct',
'choices': [{'index': 0, 'message': {'role': 'assistant',
'reasoning_content': None, 'content': 'The image depicts a tower, likely
Tokyo Skytree, framed by branches of a cherry blossom tree. The tower is
white and has a distinctive shape, with a large sphere at the top and a
long, thin spire extending from it. The branches of the cherry blossom
tree are in the foreground, with pink flowers blooming on them. The
background is a clear blue sky.\n\n**Key Features:**\n\n* **Tower:**
White, spherical shape at the top, long thin spire\n', 'tool_calls':
[]}, 'logprobs': None, 'finish_reason': 'length', 'stop_reason': None}],
'usage': {'prompt_tokens': 2340, 'total_tokens': 2440,
'completion_tokens': 100, 'prompt_tokens_details': None},
'prompt_logprobs': None}
Signed-off-by: chenxu <[email protected]>
Co-authored-by: chenxu <[email protected]>
Co-authored-by: evian <[email protected]>1 parent 217211d commit e564470
File tree
4 files changed
+34
-12
lines changed- vllm_ascend
- attention
- ops
4 files changed
+34
-12
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
708 | 708 | | |
709 | 709 | | |
710 | 710 | | |
| 711 | + | |
711 | 712 | | |
712 | 713 | | |
713 | 714 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
174 | 174 | | |
175 | 175 | | |
176 | 176 | | |
| 177 | + | |
177 | 178 | | |
178 | 179 | | |
179 | 180 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
55 | 55 | | |
56 | 56 | | |
57 | 57 | | |
58 | | - | |
59 | | - | |
60 | | - | |
61 | | - | |
62 | | - | |
63 | | - | |
64 | | - | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
65 | 67 | | |
66 | 68 | | |
67 | 69 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
153 | 153 | | |
154 | 154 | | |
155 | 155 | | |
| 156 | + | |
156 | 157 | | |
157 | 158 | | |
158 | 159 | | |
| |||
191 | 192 | | |
192 | 193 | | |
193 | 194 | | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
194 | 204 | | |
195 | 205 | | |
196 | 206 | | |
| |||
292 | 302 | | |
293 | 303 | | |
294 | 304 | | |
| 305 | + | |
| 306 | + | |
295 | 307 | | |
296 | 308 | | |
297 | 309 | | |
298 | 310 | | |
299 | 311 | | |
300 | 312 | | |
301 | 313 | | |
302 | | - | |
| 314 | + | |
303 | 315 | | |
304 | 316 | | |
305 | 317 | | |
| |||
366 | 378 | | |
367 | 379 | | |
368 | 380 | | |
369 | | - | |
370 | | - | |
371 | | - | |
372 | 381 | | |
373 | 382 | | |
374 | 383 | | |
| |||
405 | 414 | | |
406 | 415 | | |
407 | 416 | | |
408 | | - | |
| 417 | + | |
409 | 418 | | |
410 | 419 | | |
| 420 | + | |
| 421 | + | |
| 422 | + | |
| 423 | + | |
| 424 | + | |
| 425 | + | |
| 426 | + | |
| 427 | + | |
| 428 | + | |
411 | 429 | | |
412 | 430 | | |
413 | 431 | | |
| |||
0 commit comments