Commit a5d4f88
refactor kvcache allocation and reshape
* Unify the KV cache tensor initialization logic for DeepSeek and other models
* Split initialize_kv_cache_tensors into _allocate_kv_cache_tensors and _reshape_kv_cache_tensors, following the GPU model runner in vLLM
* Fix the shared_by logic so that layers with the same attention spec share one buffer instead of allocating additional HBM.
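The allocate-then-reshape split and the shared_by fix can be illustrated with a minimal pure-Python sketch. Everything here is hypothetical: `AttentionSpec`, `KVCacheTensor`, `allocate_kv_cache_tensors`, and `reshape_kv_cache_tensors` are simplified stand-ins, not the vLLM or vllm-ascend signatures; `bytearray` plays the role of an HBM buffer, 2-byte `"H"` elements stand in for an fp16 cache, and the `(2, num_blocks, block_size, num_kv_heads, head_size)` layout is an assumed example. The point is that step 1 allocates each buffer once and maps every layer in `shared_by` to it, while step 2 only creates shaped views over that memory.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in: 2-byte unsigned elements ("H") model an fp16 KV cache.
ELEM_FORMAT = "H"
ELEM_BYTES = 2


@dataclass(frozen=True)
class AttentionSpec:
    """Hypothetical, simplified per-layer attention spec."""
    block_size: int
    num_kv_heads: int
    head_size: int

    @property
    def page_size_bytes(self) -> int:
        # Bytes needed for one block of both K and V.
        return 2 * self.block_size * self.num_kv_heads * self.head_size * ELEM_BYTES


@dataclass
class KVCacheTensor:
    """Hypothetical: one raw buffer of `size` bytes reused by every layer in `shared_by`."""
    size: int
    shared_by: list = field(default_factory=list)


def allocate_kv_cache_tensors(kv_cache_tensors):
    """Step 1: allocate each buffer exactly once and map every sharing layer to it."""
    raw = {}
    for tensor in kv_cache_tensors:
        buf = bytearray(tensor.size)  # a single allocation per KVCacheTensor
        for layer_name in tensor.shared_by:
            raw[layer_name] = buf  # sharing layers get the same object, no extra memory
    return raw


def reshape_kv_cache_tensors(raw, specs):
    """Step 2: view each layer's raw bytes as (2, num_blocks, block_size, heads, head_size)."""
    views = {}
    for layer_name, buf in raw.items():
        spec = specs[layer_name]
        if len(buf) % spec.page_size_bytes != 0:
            raise ValueError(f"{layer_name}: buffer size is not a multiple of the page size")
        num_blocks = len(buf) // spec.page_size_bytes
        shape = [2, num_blocks, spec.block_size, spec.num_kv_heads, spec.head_size]
        # cast() creates a shaped view over the same memory; nothing is copied.
        views[layer_name] = memoryview(buf).cast(ELEM_FORMAT, shape)
    return views


if __name__ == "__main__":
    spec = AttentionSpec(block_size=8, num_kv_heads=2, head_size=4)
    tensors = [KVCacheTensor(size=3 * spec.page_size_bytes,
                             shared_by=["layers.0.attn", "layers.1.attn"])]
    raw = allocate_kv_cache_tensors(tensors)
    views = reshape_kv_cache_tensors(raw, {name: spec for name in raw})
    print(views["layers.0.attn"].shape)
```

Keeping allocation separate from reshaping means the buffer count depends only on the list of KV cache tensors, not on how many layers reference them, so adding a layer to `shared_by` costs no extra memory.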
Signed-off-by: MengqingCao <[email protected]>
Parent: 1f25d60
1 file changed, 145 insertions(+), 235 deletions(-)