Commit d08401d
[Main][Bugfix]Avoid using the fusion operator in the MOE model (#3834)
### What this PR does / why we need it?
The MatmulReduceScatter fusion operator currently suffers performance
degradation on small shapes, so this PR checks the shape size to decide
whether to use the operator.
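The shape check described above might look roughly like the following. This is a minimal sketch, not the code from this commit: the `MatmulShape` type, the `MIN_ROWS_FOR_FUSION` cutoff, and both function names are hypothetical, and the PR description does not say which dimension is compared or what threshold is used.

```python
from dataclasses import dataclass

# Hypothetical cutoff, for illustration only; the real threshold (and the
# dimension it applies to) is not stated in the PR description.
MIN_ROWS_FOR_FUSION = 1024


@dataclass(frozen=True)
class MatmulShape:
    m: int  # rows of the activation matrix (e.g. number of tokens)
    k: int  # shared inner dimension
    n: int  # output columns


def use_fused_matmul_reduce_scatter(shape: MatmulShape,
                                    min_rows: int = MIN_ROWS_FOR_FUSION) -> bool:
    """Return True when the shape is large enough for the fused operator."""
    return shape.m >= min_rows


def pick_path(shape: MatmulShape) -> str:
    """Select the fused operator or the separate matmul + reduce-scatter path."""
    return "fused" if use_fused_matmul_reduce_scatter(shape) else "unfused"
```

With these assumed values, a small decode-sized batch such as `MatmulShape(8, 4096, 4096)` falls back to the unfused path, while a large prefill-sized batch such as `MatmulShape(4096, 4096, 4096)` keeps the fused operator.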
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.11.0rc3
- vLLM main: vllm-project/vllm@releases/v0.11.1
---------
Signed-off-by: ZYang6263 <[email protected]>

1 parent 90ae114 commit d08401d
2 files changed, 13 additions and 6 deletions.

Diff summary (file names and line contents not preserved):
- File 1: lines added at 116, 120, and 125.
- File 2: lines added at 385 and 388; original lines 412-413 replaced by new lines 414-416; original lines 426-429 replaced by new lines 429-433.