
[RFC]: Refactor Attention module #4629

@weijinqian0

Description


Reason

  1. The Attention module contains a large amount of code with many branches.
  2. The context-parallel (CP) code paths differ significantly from normal Attention, yet the two are tightly coupled.
  3. Many attention masks have accumulated for historical reasons.

Attention Steps

  1. Remove the attention branch -- done: [Refactor] Remove redundant attention operator branches. #4531
  2. Isolate PCP and DCP -- [Refactor] 1/N Refactor attention_v1 & extract attention_cp #4628 (see the first sketch after this list)
    (1) Extract the forward class
    (2) Decouple the metadata
    (3) Rework the builder
  3. Unify the masks and split masks, and delete all other masks (MLA: 50% done) @chenjunyi (see the mask sketch below)
  4. Metadata processing (steps 4 and 5 are sketched together below)
    (1) model_runner_v1 @chenjunyi
    (2) Align with model_runner_v2; remove unused and mergeable elements.
  5. Delete attn_state. @chenjunyi
  6. CP attention: abstract a parent class to unify MLA and standard attention (see the last sketch below); this possibly depends on the FLA operator.
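
A minimal sketch of the step-2 direction, assuming hypothetical names (`AttentionMetadata`, `CPAttentionMetadata`, `AscendAttentionImpl`, `AscendCPAttentionImpl` are illustrative, not the project's actual classes): the CP forward path and its extra metadata move into their own subclass and dataclass instead of branching inside the common implementation.

```python
from dataclasses import dataclass

import torch


@dataclass
class AttentionMetadata:
    # Fields shared by every attention variant.
    seq_lens: torch.Tensor
    slot_mapping: torch.Tensor


@dataclass
class CPAttentionMetadata(AttentionMetadata):
    # CP-only fields live here instead of leaking into the common metadata.
    cp_rank: int = 0
    cp_world_size: int = 1


class AscendAttentionImpl:
    """Plain attention: no CP branches anywhere in forward()."""

    def forward(self, q, k, v, metadata: AttentionMetadata) -> torch.Tensor:
        scale = q.shape[-1] ** -0.5
        scores = torch.matmul(q, k.transpose(-2, -1)) * scale
        return torch.matmul(scores.softmax(dim=-1), v)


class AscendCPAttentionImpl(AscendAttentionImpl):
    """CP attention: gathers the sharded KV, then reuses the shared math."""

    def forward(self, q, k, v, metadata: CPAttentionMetadata) -> torch.Tensor:
        k_full, v_full = self._gather_kv(k, v, metadata)
        return super().forward(q, k_full, v_full, metadata)

    def _gather_kv(self, k, v, metadata):
        # Placeholder for the cross-rank all-gather; the point is that the
        # CP logic is isolated here rather than as an if-branch upstream.
        return k, v
```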
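
One reading of step 3, sketched below: keep a single causal mask as the source of truth and derive the split (chunked-prefill) view from the same predicate, so the remaining historical masks can be deleted. The function names and the True-means-masked convention are assumptions.

```python
import torch


def causal_mask(seq_len: int, device: str = "cpu") -> torch.Tensor:
    # The one full mask: True marks positions a query may NOT attend to.
    return torch.triu(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=device),
        diagonal=1,
    )


def split_mask(q_start: int, q_len: int, kv_len: int,
               device: str = "cpu") -> torch.Tensor:
    # The split view for chunked prefill: queries at absolute positions
    # [q_start, q_start + q_len) against a KV cache of length kv_len.
    # Same predicate as causal_mask (key index > query index), so the
    # two views can never drift apart.
    q_pos = torch.arange(q_start, q_start + q_len, device=device)
    k_pos = torch.arange(kv_len, device=device)
    return k_pos[None, :] > q_pos[:, None]


# The split view of the full mask is just a slice of it:
assert torch.equal(split_mask(2, 3, 5), causal_mask(5)[2:5, :])
```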
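
Steps 4 and 5 point in the same direction: rather than the model runner stamping an explicit `attn_state` onto every batch, the phase can be derived from fields the metadata already carries, so the state field (and the code that must keep it in sync) disappears. A sketch under that assumption; the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AttentionMetadata:
    num_prefill_tokens: int
    num_decode_tokens: int

    # Derived, not stored: there is no attn_state field that the
    # model runner could forget to update or set inconsistently.
    @property
    def is_prefill_only(self) -> bool:
        return self.num_decode_tokens == 0

    @property
    def is_decode_only(self) -> bool:
        return self.num_prefill_tokens == 0

    @property
    def is_mixed(self) -> bool:
        return not (self.is_prefill_only or self.is_decode_only)


meta = AttentionMetadata(num_prefill_tokens=128, num_decode_tokens=0)
assert meta.is_prefill_only and not meta.is_mixed
```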
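
For step 6, a sketch of the kind of parent class that could unify MLA and standard attention: the shared control flow lives in the base class, and each variant overrides only the point where Q/K/V are produced. All names here are illustrative, and whether the MLA path can actually share the kernel may hinge on the FLA operator mentioned above.

```python
from abc import ABC, abstractmethod

import torch
import torch.nn as nn


class AttentionLayerBase(nn.Module, ABC):
    """Shared skeleton: subclasses differ only in how Q/K/V are produced."""

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        q, k, v = self._qkv(hidden)
        scale = q.shape[-1] ** -0.5
        scores = torch.matmul(q, k.transpose(-2, -1)) * scale
        return torch.matmul(scores.softmax(dim=-1), v)

    @abstractmethod
    def _qkv(self, hidden: torch.Tensor):
        """Return (q, k, v); the only point where the variants diverge."""


class StandardAttention(AttentionLayerBase):
    def __init__(self, dim: int):
        super().__init__()
        self.qkv_proj = nn.Linear(dim, 3 * dim)

    def _qkv(self, hidden):
        return self.qkv_proj(hidden).chunk(3, dim=-1)


class MLAAttention(AttentionLayerBase):
    """MLA-style: K/V are reconstructed from a low-rank latent."""

    def __init__(self, dim: int, latent_dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.kv_down = nn.Linear(dim, latent_dim)    # compress
        self.kv_up = nn.Linear(latent_dim, 2 * dim)  # decompress

    def _qkv(self, hidden):
        q = self.q_proj(hidden)
        k, v = self.kv_up(self.kv_down(hidden)).chunk(2, dim=-1)
        return q, k, v
```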
