Commit 649f329
[HybridKV][Bugfix] Fix Hybrid kvcache sharing bug in same attention type (vllm-project#3760)
### What this PR does / why we need it?
Part of vllm-project#3106.
Fix a hybrid KV cache sharing bug that affected layers of the same attention type.
Change the `shared_by` logic so that layers with the same attention spec share
the same buffer instead of each allocating additional HBM.
After this PR, KV cache memory usage on Qwen3-Next is 50% lower than
before (`self_attn:linear_attn=1:3` in an `attn_group`), and
`gpu_memory_utilization` can be raised to `0.8` on Qwen3-Next when
running on A2 (64 GB per card) with TP4.
<img width="2833" height="1540" alt="image"
src="https://github.com/user-attachments/assets/2a91fa99-fb0f-447c-9e8b-acd587890fbe"
/>
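
The sharing idea described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code from this PR: `TensorGroup`, `attention_type`, `allocate_per_layer`, and `allocate_shared_by_type` are hypothetical names, buffers are plain `bytearray` objects rather than device tensors, and every layer in a group is assumed to need a buffer of the same size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorGroup:
    """Hypothetical stand-in for one KV cache tensor entry and its `shared_by` list."""
    size_bytes: int
    shared_by: tuple[str, ...]


def attention_type(layer_name: str) -> str:
    # Assumption: the attention type can be read off the layer name.
    return "linear_attn" if "linear_attn" in layer_name else "self_attn"


def allocate_per_layer(groups: list[TensorGroup]) -> dict[str, bytearray]:
    """Every layer in `shared_by` gets its own buffer."""
    buffers: dict[str, bytearray] = {}
    for group in groups:
        for name in group.shared_by:
            buffers[name] = bytearray(group.size_bytes)
    return buffers


def allocate_shared_by_type(groups: list[TensorGroup]) -> dict[str, bytearray]:
    """Layers of the same attention type within a group reuse one buffer."""
    buffers: dict[str, bytearray] = {}
    for group in groups:
        per_type: dict[str, bytearray] = {}
        for name in group.shared_by:
            kind = attention_type(name)
            if kind not in per_type:
                per_type[kind] = bytearray(group.size_bytes)
            buffers[name] = per_type[kind]
    return buffers


def total_bytes(buffers: dict[str, bytearray]) -> int:
    """Sum the sizes of the distinct buffer objects, counting shared ones once."""
    unique = {id(buf): len(buf) for buf in buffers.values()}
    return sum(unique.values())


groups = [
    TensorGroup(
        size_bytes=1024,
        shared_by=(
            "layers.0.linear_attn",
            "layers.1.linear_attn",
            "layers.2.linear_attn",
            "layers.3.self_attn",
        ),
    )
]
print(total_bytes(allocate_per_layer(groups)))       # four distinct buffers
print(total_bytes(allocate_shared_by_type(groups)))  # two distinct buffers
```

In this toy model a group with one `self_attn` layer and three `linear_attn` layers goes from four buffers to two, which is in line with the 50% saving quoted above. Real buffer sizes may differ per attention type.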
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Tests pass with the latest e2e test case on Qwen3-Next.
- vLLM version: v0.11.0rc3
- vLLM main: vllm-project/vllm@c9461e0
---------
Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: hwhaokun <[email protected]>

1 parent cff0e35 commit 649f329
File tree
- tests/e2e/multicard
- vllm_ascend/worker

2 files changed, +26 -20 lines changed. The diff bodies are not preserved; only the line-number columns remain. Hunks are listed in file-tree order:

| Directory | Original lines | Diff lines | Removed (original line numbers) | Added (diff line numbers) |
|---|---|---|---|---|
| tests/e2e/multicard | 27-38 | 27-38 | 30, 35 | 30, 35 |
| vllm_ascend/worker | 3225-3249 | 3225-3250 | 3228, 3231-3246 | 3228-3229, 3231-3241, 3243-3247 |
| vllm_ascend/worker | 3265-3271 | 3266-3277 | 3268 | 3269-3274 |