Commit 00aa0bf
support prefill cache mode use fia op (#3696)
### What this PR does / why we need it?
Support using the FIA op in prefill cache mode, for full graph.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.11.0rc3
- vLLM main: vllm-project/vllm@17c540a
origin
============ Serving Benchmark Result ============
Successful requests: 30
Maximum request concurrency: 256
Request rate configured (RPS): 0.70
Benchmark duration (s): 131.63
Total input tokens: 61363
Total generated tokens: 61440
Request throughput (req/s): 0.23
Output token throughput (tok/s): 466.77
Peak output token throughput (tok/s): 750.00
Peak concurrent requests: 30.00
Total Token throughput (tok/s): 932.95
---------------Time to First Token----------------
Mean TTFT (ms): 125.17
Median TTFT (ms): 121.51
P50 TTFT (ms): 121.51
P90 TTFT (ms): 140.91
P99 TTFT (ms): 182.36
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 43.85
Median TPOT (ms): 43.84
P50 TPOT (ms): 43.84
P90 TPOT (ms): 44.28
P99 TPOT (ms): 44.32
---------------Inter-token Latency----------------
Mean ITL (ms): 43.85
Median ITL (ms): 42.63
P50 ITL (ms): 42.63
P90 ITL (ms): 48.74
P99 ITL (ms): 59.62
==================================================
after
============ Serving Benchmark Result ============
Successful requests: 30
Maximum request concurrency: 256
Request rate configured (RPS): 0.70
Benchmark duration (s): 130.10
Total input tokens: 61363
Total generated tokens: 61440
Request throughput (req/s): 0.23
Output token throughput (tok/s): 472.26
Peak output token throughput (tok/s): 750.00
Peak concurrent requests: 30.00
Total Token throughput (tok/s): 943.94
---------------Time to First Token----------------
Mean TTFT (ms): 123.69
Median TTFT (ms): 122.51
P50 TTFT (ms): 122.51
P90 TTFT (ms): 143.69
P99 TTFT (ms): 165.00
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 43.07
Median TPOT (ms): 43.13
P50 TPOT (ms): 43.13
P90 TPOT (ms): 43.50
P99 TPOT (ms): 43.57
---------------Inter-token Latency----------------
Mean ITL (ms): 43.07
Median ITL (ms): 41.81
P50 ITL (ms): 41.81
P90 ITL (ms): 48.11
P99 ITL (ms): 62.13
==================================================
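
The two runs above are easier to compare as relative changes. The sketch below copies a handful of headline metrics from the `origin` and `after` outputs and computes the percentage change for each; the dictionary keys and the `pct_change` helper are illustrative names, not part of the benchmark tool.

```python
# Headline metrics copied from the "origin" and "after" benchmark outputs above.
# Key names are illustrative labels chosen for this sketch.
ORIGIN = {
    "output_tok_s": 466.77,
    "total_tok_s": 932.95,
    "mean_ttft_ms": 125.17,
    "p99_ttft_ms": 182.36,
    "mean_tpot_ms": 43.85,
    "p99_itl_ms": 59.62,
}
AFTER = {
    "output_tok_s": 472.26,
    "total_tok_s": 943.94,
    "mean_ttft_ms": 123.69,
    "p99_ttft_ms": 165.00,
    "mean_tpot_ms": 43.07,
    "p99_itl_ms": 62.13,
}


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Positive is better for throughput metrics; negative is better for latencies.
changes = {name: pct_change(ORIGIN[name], AFTER[name]) for name in ORIGIN}

if __name__ == "__main__":
    for name, delta in changes.items():
        print(f"{name}: {ORIGIN[name]} -> {AFTER[name]} ({delta:+.2f}%)")
```

Most deltas are small, and each run covers only 30 requests, so they may be within run-to-run noise. Note also that P99 ITL is higher in the `after` run, while the other latency metrics listed here are lower.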
Signed-off-by: shiyuan680 <[email protected]>

1 parent 3e5ae49 commit 00aa0bf
File tree
- vllm_ascend
  - attention
  - worker

3 files changed, +45 -14 lines changed

| File (in diff order) | Original lines shown | New lines shown | Change |
|---|---|---|---|
| 1 | 68-73 | 68-75 | 2 lines added (new 71-72) |
| 2 | 491-509 | 491-534 | 1 line added (new 494); 12 lines removed (original 495-506) and 36 lines added (new 496-531) |
| 3 | 962-969 | 962-973 | 2 lines removed (original 965-966) and 6 lines added (new 965-970) |

(File names and diff line contents are not shown.)