Commit 9799934

[Fix] Remove unnecessary NPU synchronization in MTP proposer (#4325)
### What this PR does / why we need it?

Remove an unnecessary NPU synchronization in the MTP proposer to improve performance. Removing this synchronization point improves pipeline efficiency by allowing better overlap between CPU and NPU operations. A more appropriate synchronization is already implemented in #4233.

### Does this PR introduce _any_ user-facing change?

None.

### How was this patch tested?

None.

- vLLM version: v0.11.0
- vLLM main: vllm-project/vllm@2918c1b

Signed-off-by: Yizhou Liu <[email protected]>
1 parent 8c87a3b commit 9799934

1 file changed, +0 -1 lines changed

vllm_ascend/spec_decode/mtp_proposer.py

Lines changed: 0 additions & 1 deletion
@@ -886,7 +886,6 @@ def _propose(
 attn_metadata_i.decode.max_seq_lens = min(
     attn_metadata_i.decode.max_seq_lens,
     self.runner.model_config.max_model_len)
-torch.npu.synchronize()
 
 # mtp>1: [batch_size, k]
 draft_token_ids = torch.stack(draft_token_ids_list, dim=1)
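
The commit message's claim about CPU/NPU overlap can be illustrated with a toy timeline model. This is only a sketch, not code from the repository: the `simulate` function, its parameters, and all durations are hypothetical, and it assumes that `torch.npu.synchronize()` acts as a full host-side wait on outstanding device work and that kernel launches are otherwise asynchronous.

```python
def simulate(num_steps, host_prep, device_kernel, sync_each_step):
    """Toy timeline model of a multi-step draft loop (all numbers hypothetical).

    Each step, the host spends `host_prep` time units preparing inputs, then
    enqueues a kernel that occupies the device for `device_kernel` units.
    Enqueueing is modeled as asynchronous: the host only waits for the device
    when it explicitly synchronizes.
    """
    host_clock = 0.0    # time at which the host is next free
    device_clock = 0.0  # time at which the device queue is next free
    for _ in range(num_steps):
        host_clock += host_prep
        # The kernel starts once it has been enqueued and the device is idle.
        start = max(host_clock, device_clock)
        device_clock = start + device_kernel
        if sync_each_step:
            # A full synchronize blocks the host until the device has drained.
            host_clock = device_clock
    # One final wait so that the results are visible to the host.
    return max(host_clock, device_clock)


with_sync = simulate(3, host_prep=2.0, device_kernel=5.0, sync_each_step=True)
without_sync = simulate(3, host_prep=2.0, device_kernel=5.0, sync_each_step=False)
print(with_sync, without_sync)  # 21.0 17.0
```

In this model the per-step synchronization serializes host preparation and device execution, while dropping it lets the host prepare the next step during the previous kernel. Real savings depend on the actual workload and are not measured here.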

0 commit comments
