Commit 0686b32
[Fix] Fixes issues in MTP with async scheduling and ACL graph (#4963)
### What this PR does / why we need it?
Corrects attention metadata size for MTP when both asynchronous
scheduling and full ACL graph mode are enabled. This prevents potential
size mismatches during execution.
Additionally, improves the robustness of calculating token sample
indices by explicitly aligning tensor shapes.
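As a rough illustration of the shape alignment described above, the sketch below computes one sample index per request from two sequences whose lengths may differ. It is not the code from this commit: the function name, the use of plain lists instead of tensors, and the assumption that the extra trailing entries are padding are all hypothetical.

```python
from itertools import accumulate


def last_token_indices(query_lens, num_rejected):
    """Hypothetical helper: index of the last accepted token per request.

    query_lens   -- tokens scheduled per request (may include padded entries)
    num_rejected -- rejected draft tokens per real request
    """
    # Align the two inputs explicitly instead of relying on implicit
    # broadcasting: only the first n entries describe real requests
    # (assumption: any extra trailing entries are padding).
    n = min(len(query_lens), len(num_rejected))
    ends = list(accumulate(query_lens[:n]))  # exclusive end offset per request
    return [end - 1 - rej for end, rej in zip(ends, num_rejected[:n])]


# Four scheduled slots, but only three real requests.
indices = last_token_indices([3, 3, 3, 3], [0, 1, 2])
```

Truncating both inputs to a common length before the arithmetic makes a length mismatch harmless rather than a source of shape errors.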
Finally, prevents padding when the number of input tokens exceeds the
maximum ACL graph batch size to avoid out-of-bounds errors.
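The padding guard can be sketched as follows. This is a minimal illustration under stated assumptions, not the patched code: `padded_num_tokens` and `graph_batch_sizes` are hypothetical names, and the rule of padding up to the next captured size is assumed for the example.

```python
def padded_num_tokens(num_input_tokens, graph_batch_sizes):
    """Hypothetical helper: token count to run with after optional padding.

    graph_batch_sizes -- batch sizes for which a graph was captured
    """
    max_graph_size = max(graph_batch_sizes)
    if num_input_tokens > max_graph_size:
        # No captured graph is large enough, so skip padding entirely;
        # padding here would index past the buffers sized for the graphs.
        return num_input_tokens
    # Otherwise pad up to the smallest captured size that fits.
    return min(s for s in graph_batch_sizes if s >= num_input_tokens)


sizes = [1, 2, 4, 8]
small = padded_num_tokens(5, sizes)   # padded up to a captured size
large = padded_num_tokens(9, sizes)   # exceeds the maximum, left unpadded
```

Returning the unpadded count for oversized batches keeps the guard in one place, so callers do not need their own bounds checks.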
### Does this PR introduce _any_ user-facing change?
None.
### How was this patch tested?
A corresponding test case still needs to be added ASAP.
- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e
---------
Signed-off-by: Yizhou Liu <[email protected]>
Signed-off-by: Yizhou <[email protected]>
Co-authored-by: Jade Zheng <[email protected]>

1 parent 42ceaf0 commit 0686b32
File tree
2 files changed, +16 -3 lines changed
- vllm_ascend
  - spec_decode
  - worker

| File | Original file lines | Diff lines | Change |
|---|---|---|---|
| 1 | | 751 | 1 line added |
| 1 | | 806-816 | 11 lines added |
| 1 | 1136-1137 | 1148-1150 | 2 lines removed, 3 added |
| 2 | 1022 | 1022 | 1 line removed, 1 added |