Commit 0fb1dc4
[BugFix][main] Adapted Qwen3-Next-MTP to chunked prefill (#4770)
### What this PR does / why we need it?
The pad `-1` modification comes from vllm-project/vllm#25743.
That modification still has bugs with batched chunked prefill.
- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e
Signed-off-by: drslark <[email protected]>
Co-authored-by: Mengqing Cao <[email protected]>
1 parent 490ddf5
File tree
8 files changed, +646 -28 lines changed
- tests/e2e/multicard
- vllm_ascend
- ops/triton/mamba
- patch
- worker
- spec_decode
- worker