Commit 66dbb3e
[Bugfix] Fix hang in multimodal models when running with DP>1 (#4393)
### What this PR does / why we need it?
When cudagraph_mode is set to FULL_DECODE_ONLY and dp > 1, a dummy run
is triggered. During that run, update_attn_params is called and must be
given num_tokens, which was previously read from positions.shape[0].
However, multimodal models use mRope (multi-dimensional rotary positional
embeddings), which makes positions a 2-dimensional tensor. As a result,
positions.shape[0] no longer equals the number of tokens, and the value
passed to update_attn_params is incorrect. This PR fixes the problem by
passing num_tokens directly instead of positions.shape[0].
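
The shape mismatch described above can be reproduced in isolation. The following is a minimal sketch, not the code touched by this PR: it uses NumPy in place of the real tensors, the helper `num_tokens_from_positions` is hypothetical, and the `(3, num_tokens)` layout for mRope positions is an assumption made for illustration.

```python
import numpy as np


def num_tokens_from_positions(positions: np.ndarray) -> int:
    """Hypothetical helper showing the fragile pattern: it assumes
    `positions` is 1-D with exactly one entry per token."""
    return positions.shape[0]


num_tokens = 8  # hypothetical dummy-run token count

# Standard RoPE: one position index per token, shape (num_tokens,).
rope_positions = np.arange(num_tokens)

# mRope: several position components per token. The (3, num_tokens)
# layout is an assumption for this sketch.
mrope_positions = np.tile(np.arange(num_tokens), (3, 1))

rope_count = num_tokens_from_positions(rope_positions)
mrope_count = num_tokens_from_positions(mrope_positions)

# With 1-D positions the leading dimension is the token count; with 2-D
# positions it is the number of position components instead.
print(rope_count, mrope_count)
```

Passing the known `num_tokens` value explicitly, as this PR does, avoids depending on the layout of `positions` at all.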
### Does this PR introduce _any_ user-facing change?
NO
### How was this patch tested?
vLLM version: v0.11.0rc3
vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
---------
Signed-off-by: wujinyuan1 <[email protected]>
Co-authored-by: wujinyuan1 <[email protected]>
Signed-off-by: 刘哲续 <[email protected]>
1 parent 2b6d7b8 commit 66dbb3e
1 file changed: +2 -3 lines changed

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 2322 | 2322 | |
| 2323 | 2323 | |
| 2324 | 2324 | |
| 2325 | | - |
| 2326 | | - |
| | 2325 | + |
| 2327 | 2326 | |
| 2328 | 2327 | |
| 2329 | | - |
| | 2328 | + |
| 2330 | 2329 | |
| 2331 | 2330 | |
| 2332 | 2331 | |