Commit 749b183
[ModelRunner][Refactor] Refactor kv cache tensor initialization logic (vllm-project#3106)
### What this PR does / why we need it?
Refactor kv cache tensor initialization logic.
1. Unify the KV cache tensor initialization logic for DeepSeek and normal
models.
2. Split `initialize_kv_cache_tensors` into `_allocate_kv_cache_tensors`
and `_reshape_kv_cache_tensors`, following the GPU model runner in vLLM.
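
The allocate-then-reshape split described above can be illustrated with a minimal, standard-library-only sketch. It is not the code from this PR: `LayerSpec`, the function names, the `(2, num_blocks, block_size, num_kv_heads, head_size)` layout, and the float32 element type are all hypothetical and chosen only to show why separating the two steps keeps sizing and layout concerns apart.

```python
# Illustrative sketch only: names, shapes, and dtypes below are hypothetical
# and are not taken from the vllm-ascend source.
from dataclasses import dataclass
from math import prod


@dataclass
class LayerSpec:
    name: str
    num_blocks: int
    block_size: int
    num_kv_heads: int
    head_size: int
    itemsize: int = 4  # bytes per element; 4 matches the 'f' (float32) format

    @property
    def shape(self):
        # Assumed layout: the leading 2 holds the key and value planes.
        return (2, self.num_blocks, self.block_size,
                self.num_kv_heads, self.head_size)

    @property
    def nbytes(self):
        return prod(self.shape) * self.itemsize


def allocate_kv_cache_buffers(specs):
    """Step 1: allocate one flat, zeroed byte buffer per layer."""
    return {s.name: bytearray(s.nbytes) for s in specs}


def reshape_kv_cache_buffers(specs, raw):
    """Step 2: view each flat buffer with its layer's shape, without copying."""
    views = {}
    for s in specs:
        buf = raw[s.name]
        assert len(buf) == s.nbytes, f"size mismatch for {s.name}"
        views[s.name] = memoryview(buf).cast("f", shape=list(s.shape))
    return views


def initialize_kv_cache(specs):
    """Compose the two steps, mirroring the allocate/reshape split."""
    raw = allocate_kv_cache_buffers(specs)
    return reshape_kv_cache_buffers(specs, raw)
```

In this sketch, model-specific differences would live only in how each layer's shape is computed, so the allocation step can stay identical across model families.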
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
CI passed with the existing tests.
1. prefill disaggregation scenario
2. deepseek + aclgraph/eager mode
3. qwen3 next
- vLLM version: v0.11.0
- vLLM main: vllm-project/vllm@83f478b
---------
Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: luolun <[email protected]>

1 parent 239774b commit 749b183
1 file changed, 146 additions and 222 deletions