Commit a47aa4d
[feat] apply flashcomm1 on bailing (#4868)
### What this PR does / why we need it?
This PR adjusts the layer prefix matching rules for tensor parallelism
(column/row parallel ops) to fit the Bailing model's naming conventions:
it adds "query_key_value" to the column-parallel matches and
"attention.dense" to the row-parallel matches. With these additions,
flashcomm1 works properly on the Bailing model.
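The matching rule described above can be sketched roughly as follows. This is a minimal illustration, not the code changed by this PR: the function `classify_layer`, the two pattern tuples, and the example layer prefixes are hypothetical, and the "qkv_proj" and "o_proj" entries are assumed stand-ins for patterns that already existed. Only "query_key_value" and "attention.dense" come from the description.

```python
from typing import Optional

# Hypothetical pattern tables. Only "query_key_value" and "attention.dense"
# are taken from the PR description; the other entries are assumed examples.
COLUMN_PARALLEL_PATTERNS = ("qkv_proj", "query_key_value")
ROW_PARALLEL_PATTERNS = ("o_proj", "attention.dense")


def classify_layer(prefix: str) -> Optional[str]:
    """Return "column", "row", or None for a dotted layer prefix."""
    # Substring matching: a layer qualifies if any pattern occurs in its prefix.
    if any(pattern in prefix for pattern in COLUMN_PARALLEL_PATTERNS):
        return "column"
    if any(pattern in prefix for pattern in ROW_PARALLEL_PATTERNS):
        return "row"
    # No match: the layer would not be routed to a custom parallel op.
    return None


if __name__ == "__main__":
    # Example prefixes are made up for illustration.
    for name in (
        "model.layers.0.attention.query_key_value",
        "model.layers.0.attention.dense",
        "model.layers.0.mlp.gate",
    ):
        print(name, "->", classify_layer(name))
```

Without the two added patterns, a lookup of this kind would return no match for the Bailing attention layers, so they would fall through to the default path.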
### Does this PR introduce _any_ user-facing change?
No
- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e
Signed-off-by: hwhaokun <[email protected]>
1 parent 2f965d8 commit a47aa4d
1 file changed, +19 -8 lines changed

| Hunk | Original lines removed | New lines added |
|---|---|---|
| 1 | 613-618 (6 lines) | 613-622 (10 lines) |
| 2 | 640-641 (2 lines) | 644-652 (9 lines) |