Commit a5d4f88

refactor kvcache allocate and reshape

* Unify the KV cache tensor initialization logic of DeepSeek and normal models.
* Split initialize_kv_cache_tensors into _allocate_kv_cache_tensors and _reshape_kv_cache_tensors, following the GPU model runner in vLLM.
* Fix the shared_by logic so that the same attention spec can share the same buffer instead of allocating more HBM.

Signed-off-by: MengqingCao <[email protected]>
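The allocate/reshape split and the shared-buffer behaviour described in the message can be illustrated with a minimal sketch. This is not the code from this commit: `KVCacheTensorSpec`, `allocate_kv_cache_buffers`, and `reshape_kv_cache_buffers` are hypothetical names, plain `bytearray` objects stand in for device tensors, one byte per element is assumed, and the `(2, num_blocks, block_size, num_kv_heads, head_size)` layout is an assumption chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class KVCacheTensorSpec:
    """Hypothetical stand-in for a KV cache tensor description."""
    size: int              # buffer size in bytes
    shared_by: list[str]   # names of the layers that use this buffer


def allocate_kv_cache_buffers(specs):
    """Step 1: allocate one raw buffer per spec.

    Every layer listed in shared_by maps to the same buffer object,
    so no extra memory is allocated for layers that share a spec.
    """
    raw = {}
    for spec in specs:
        buf = bytearray(spec.size)
        for layer_name in spec.shared_by:
            raw[layer_name] = buf
    return raw


def reshape_kv_cache_buffers(raw, block_size, num_kv_heads, head_size):
    """Step 2: view each raw buffer with an assumed KV cache layout.

    Assumes one byte per element and a leading dimension of 2 (K and V).
    """
    bytes_per_block = 2 * block_size * num_kv_heads * head_size
    views = {}
    for layer_name, buf in raw.items():
        if len(buf) % bytes_per_block != 0:
            raise ValueError(f"{layer_name}: size is not a whole number of blocks")
        num_blocks = len(buf) // bytes_per_block
        shape = [2, num_blocks, block_size, num_kv_heads, head_size]
        views[layer_name] = memoryview(buf).cast("B", shape=shape)
    return views


# Two layers share one 512-byte buffer; a third layer gets its own.
specs = [
    KVCacheTensorSpec(size=512, shared_by=["layer.0", "layer.1"]),
    KVCacheTensorSpec(size=256, shared_by=["layer.2"]),
]
raw = allocate_kv_cache_buffers(specs)
views = reshape_kv_cache_buffers(raw, block_size=4, num_kv_heads=2, head_size=8)
```

Keeping allocation separate from reshaping means the sharing decision is made once, on raw buffers, and the layout-specific view is applied afterwards without touching how memory was obtained.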

1 parent 1f25d60, commit a5d4f88

1 file changed: +145 -235 lines changed

vllm_ascend/worker/model_runner_v1.py

