Commit a5d4f88

refactor kvcache allocate and reshape
* Unify the KV cache tensor initialization logic of DeepSeek and normal models.
* Split initialize_kv_cache_tensors into _allocate_kv_cache_tensors and _reshape_kv_cache_tensors, following the GPU model runner in vLLM.
* Fix the shared_by logic so that the same attention spec shares a single buffer instead of allocating more HBM.

Signed-off-by: MengqingCao <[email protected]>
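
The allocate/reshape split and the shared_by buffer sharing described above can be illustrated with a minimal sketch. This is not the code from this commit: the class KVCacheTensorSpec, the two helper functions, and the layer names are hypothetical, and a plain bytearray stands in for a device tensor so the example runs without any accelerator library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KVCacheTensorSpec:
    """Hypothetical spec: one raw buffer of `size` bytes used by several layers."""
    size: int
    shared_by: tuple


def allocate_kv_cache_tensors(specs):
    """Step 1: allocate one raw buffer per spec.

    Every layer named in `shared_by` maps to the same buffer object,
    so no extra memory is allocated for layers that share a spec.
    """
    raw = {}
    for spec in specs:
        buf = bytearray(spec.size)  # stand-in for a device allocation
        for layer_name in spec.shared_by:
            raw[layer_name] = buf
    return raw


def reshape_kv_cache_tensors(raw, shape_by_layer):
    """Step 2: view each raw buffer with the shape its layer expects.

    The views alias the raw buffers; no data is copied.
    """
    views = {}
    for layer_name, buf in raw.items():
        shape = shape_by_layer[layer_name]
        views[layer_name] = memoryview(buf).cast("B", shape)
    return views


# Two layers share one 24-byte buffer; a third layer gets its own.
specs = [
    KVCacheTensorSpec(size=24, shared_by=("layer.0", "layer.1")),
    KVCacheTensorSpec(size=24, shared_by=("layer.2",)),
]
shapes = {name: (2, 3, 4) for name in ("layer.0", "layer.1", "layer.2")}

raw_buffers = allocate_kv_cache_tensors(specs)
kv_caches = reshape_kv_cache_tensors(raw_buffers, shapes)
```

Keeping allocation separate from reshaping means the sharing decision is made once, on raw bytes, and the per-layer layout is applied afterwards as a view.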
1 parent 1f25d60 commit a5d4f88

1 file changed: +145 -235 lines changed
