Commit 06f6cc1
[Bugfix] Fix the hang issue of multimodal model when running with DP>1 (#4392)
### What this PR does / why we need it?
When cudagraph_mode is set to FULL_DECODE_ONLY and dp > 1, a dummy run
is triggered. During the dummy run, update_attn_params is called with a
num_tokens argument, and that value was taken from positions.shape[0].
Multimodal models, however, use mRoPE (multi-dimensional rotary
positional embeddings), which makes positions a 2-dimensional tensor, so
positions.shape[0] is no longer the number of tokens and an incorrect
value is passed. This PR fixes the problem by passing num_tokens
directly instead of positions.shape[0].
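
The shape mismatch described above can be sketched in a few lines. This is an illustration only, not the code touched by the PR: plain Python lists stand in for tensors, `len(positions)` plays the role of `positions.shape[0]`, the helper names are hypothetical, and the mRoPE layout of three rows by `num_tokens` columns is an assumption made for the example.

```python
# Illustrative sketch only: plain Python lists stand in for tensors, and
# len(positions) plays the role of positions.shape[0].

def make_positions(num_tokens: int, use_mrope: bool):
    """Build dummy position ids for a dummy run (hypothetical helper)."""
    if use_mrope:
        # Assumed mRoPE layout: one row of positions per rotary section,
        # i.e. a 2-D structure of shape (3, num_tokens).
        return [list(range(num_tokens)) for _ in range(3)]
    # Standard RoPE: a flat sequence of shape (num_tokens,).
    return list(range(num_tokens))


def token_count_from_positions(positions) -> int:
    """Old behaviour: infer the token count from the first dimension."""
    return len(positions)


num_tokens = 8
rope_positions = make_positions(num_tokens, use_mrope=False)
mrope_positions = make_positions(num_tokens, use_mrope=True)

# With 1-D positions the first dimension is the token count.
rope_count = token_count_from_positions(rope_positions)
# With 2-D positions the first dimension is the number of rows instead.
mrope_count = token_count_from_positions(mrope_positions)
```

Under these assumptions, `rope_count` equals `num_tokens` while `mrope_count` does not, which is why passing the known `num_tokens` value is the more robust choice.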
### Does this PR introduce _any_ user-facing change?
NO
### How was this patch tested?
vLLM version: v0.11.0rc3
vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
- vLLM version: v0.11.0
- vLLM main: vllm-project/vllm@2918c1b
---------
Signed-off-by: wujinyuan1 <[email protected]>
Co-authored-by: wujinyuan1 <[email protected]>

1 parent 84eae97 commit 06f6cc1
1 file changed, +2 -3 lines changed