Commit 490ddf5
[perf][dsv3.2][async_scheduling] improve dsv3.2 performance by eliminating HD synchronization (#4805)
### What this PR does / why we need it?
This PR eliminates the implicit host-device (HD) synchronization in the sfa backend,
and in `_build_dummy_attn_metadata` and `dummy_run` in mtp_proposer, significantly
improving dsv3.2 performance in low-latency scenarios.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Performance improvements were observed in E2E serving benchmarks (P:
DP4TP8EP32, D: DP8TP4EP32) with `num_speculative_tokens=3`.
DSV3.2-W8A8-EXP:
TPOT: 41.67ms -> 23.36ms
ITL: 85.93ms -> 55.96ms
DSV3.2-W8A8 (released in December):
TPOT: 18.11ms
ITL: 56.13ms
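To put the figures above in relative terms, the following is a minimal sketch that derives speedup and percentage reduction from the reported numbers. It assumes the value before `->` is the pre-PR measurement and the value after is the post-PR measurement; the helper names `speedup` and `reduction_pct` are hypothetical and not part of the PR.

```python
# Reported latencies in milliseconds, copied from the commit message.
# Assumption: (before, after) corresponds to the two sides of "->".
EXP_RESULTS = {
    "TPOT": (41.67, 23.36),
    "ITL": (85.93, 55.96),
}

# Reference numbers reported for the released DSV3.2-W8A8 model.
RELEASED = {"TPOT": 18.11, "ITL": 56.13}


def speedup(before_ms: float, after_ms: float) -> float:
    """Return how many times faster the 'after' measurement is."""
    return before_ms / after_ms


def reduction_pct(before_ms: float, after_ms: float) -> float:
    """Return the latency reduction as a percentage of 'before'."""
    return 100.0 * (before_ms - after_ms) / before_ms


if __name__ == "__main__":
    for metric, (before, after) in EXP_RESULTS.items():
        ratio_to_released = after / RELEASED[metric]
        print(
            f"{metric}: {speedup(before, after):.2f}x faster, "
            f"{reduction_pct(before, after):.1f}% lower, "
            f"{ratio_to_released:.2f}x the released model's latency"
        )
```

Under these assumptions, TPOT improves by roughly 1.78x (about 44% lower) and ITL by roughly 1.54x (about 35% lower). After the change, ITL for the EXP model is on par with the released model, while TPOT remains about 1.29x higher.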
- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e
Signed-off-by: linfeng-yuan <[email protected]>
1 parent dd622aa commit 490ddf5
File tree
3 files changed, +16 -7 lines changed
- vllm_ascend
  - attention
  - spec_decode
  - worker

Diff hunks (line content not captured; files numbered in file-tree order):

| File | Original lines | New lines | Removed | Added |
|---|---|---|---|---|
| 1 | 170-178 | 170-178 | 3 | 3 |
| 2 | 233-239 | 233-245 | 1 | 7 |
| 2 | 742-750 | 748-758 | 2 | 4 |
| 3 | 2923-2931 | 2923-2932 | 1 | 2 |