[Fix] Remove unnecessary NPU synchronization in MTP proposer (#4325)

yiz-liu · web-flow · commit 97999347c8e8 · 2025-11-24T14:07:10.000+08:00
### What this PR does / why we need it?
Remove an unnecessary NPU synchronization in the MTP proposer to improve performance. Removing this synchronization point improves pipeline efficiency by allowing better overlap between CPU and NPU operations. A more appropriate synchronization point is already implemented in #4233.

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
None.

- vLLM version: v0.11.0
- vLLM main: vllm-project/vllm@2918c1b

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
diff --git a/vllm_ascend/spec_decode/mtp_proposer.py b/vllm_ascend/spec_decode/mtp_proposer.py
@@ -886,7 +886,6 @@ def _propose(
                 attn_metadata_i.decode.max_seq_lens = min(
                     attn_metadata_i.decode.max_seq_lens,
                     self.runner.model_config.max_model_len)
-            torch.npu.synchronize()
 
         # mtp>1: [batch_size, k]
         draft_token_ids = torch.stack(draft_token_ids_list, dim=1)
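
Note: the effect of the removed call can be illustrated with a toy model. The sketch below is not the vllm-ascend code and does not use `torch.npu`; `FakeDevice`, `_kernel`, and `run_steps` are hypothetical names introduced only for illustration. It assumes the device executes queued work in order on a single stream, so work submitted later sees the results of earlier work without the host blocking in between.

```python
import queue
import threading
import time


class FakeDevice:
    """Toy stand-in for an accelerator stream (hypothetical, not torch.npu).

    Work items are queued and executed in order by one worker thread.
    """

    def __init__(self):
        self._queue = queue.Queue()
        self.sync_calls = 0
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            fn = self._queue.get()
            fn()
            self._queue.task_done()

    def launch(self, fn):
        # Returns immediately, like an asynchronous kernel launch.
        self._queue.put(fn)

    def synchronize(self):
        # Blocks the host until every queued work item has finished.
        self.sync_calls += 1
        self._queue.join()


def _kernel(results, step):
    time.sleep(0.005)  # pretend the device is busy
    results.append(step)


def run_steps(device, num_steps, sync_every_step):
    """Queue one work item per step, optionally blocking after each one."""
    results = []
    for step in range(num_steps):
        device.launch(lambda s=step: _kernel(results, s))
        if sync_every_step:
            device.synchronize()  # host stalls here; no CPU/device overlap
    device.synchronize()  # single wait before the host reads the results
    return results
```

In this toy model both modes produce the same results in the same order, but `sync_every_step=True` stalls the host once per step, whereas `sync_every_step=False` lets the host keep queuing work and wait only once at the end. Whether the same reasoning holds for a given NPU code path depends on how its streams and host reads are arranged.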