Reason
- The attention module contains a large amount of code with many branches.
- The CP-related functions differ significantly from normal attention, yet the two are tightly coupled.
- Many mask variants exist for historical reasons.
Attention Steps
- Remove the attention branch -- done: [Refactor] Remove redundant attention operator branches. #4531
- Isolate PCP and DCP: [Refactor] 1/N Refactor attention_v1 & extract attention_cp #4628
(1) Extract the forward class
(2) Decouple the metadata handling
(3) Rework the builder - unify masks, split masks, and delete all other masks (MLA 50%) @chenjunyi
- Metadata processing
(1) model_runner_v1 @chenjunyi
(2) Coordinate with model_runner_v2; remove unused and mergeable elements - delete attn_state. @chenjunyi
- CP attention: abstract a parent class to unify MLA and standard attention; this possibly depends on the FLA operator.
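To make the last step concrete, here is a minimal sketch of what an abstract parent class could look like: a single shared forward path with an overridable CP hook, so MLA and standard attention subclasses only supply their kernel-specific piece. All class and method names below are illustrative assumptions, not the actual vLLM-Ascend API; the "kernels" are placeholders.

```python
from abc import ABC, abstractmethod

# Hypothetical sketch of the "abstract parent class" step. Names are
# assumptions for illustration, not the real attention backend interface.
class AttentionBase(ABC):
    """One shared forward path; CP behaviour is an overridable hook,
    so CP-specific logic no longer branches inside every backend."""

    def forward(self, q, k, v, metadata):
        # CP pre-processing first (no-op by default), then the
        # backend-specific attention implementation.
        q, k, v = self.pre_process_cp(q, k, v, metadata)
        return self.attention_impl(q, k, v, metadata)

    def pre_process_cp(self, q, k, v, metadata):
        # Overridden only by CP-enabled subclasses (PCP/DCP).
        return q, k, v

    @abstractmethod
    def attention_impl(self, q, k, v, metadata):
        raise NotImplementedError

class StandardAttention(AttentionBase):
    def attention_impl(self, q, k, v, metadata):
        # Placeholder for the fused attention kernel call.
        return [qi + vi for qi, vi in zip(q, v)]

class CPAttention(StandardAttention):
    def pre_process_cp(self, q, k, v, metadata):
        # Placeholder for CP sharding: keep only this rank's slice.
        rank, world = metadata["cp_rank"], metadata["cp_world_size"]
        n = len(q) // world
        lo, hi = rank * n, (rank + 1) * n
        return q[lo:hi], k[lo:hi], v[lo:hi]
```

With this shape, unifying MLA would mean adding an `MLAAttention(AttentionBase)` subclass that overrides only `attention_impl`, while all CP handling stays in the shared hooks.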