
Memory usage of SDPA vs. Flash Attention #52

@Couteaux123

Description


I noticed that in pyramidkv_utils.py, attention is first computed once in order to select the indices, and then, in the replacement attention computation, attention is computed a second time. As a result, memory usage is actually higher than with the full KV cache. Can this be improved?
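For context, here is a minimal sketch (in PyTorch, with hypothetical names and shapes, not the actual pyramidkv_utils.py code) of the two-pass pattern the question describes: the first pass materializes an explicit attention matrix to score KV pairs, which is the O(seq²) memory overhead being pointed out; the second pass then runs SDPA/Flash on the compressed cache, where the weights are never materialized.

```python
# Sketch of the pattern described above (hypothetical names; assumes
# budget <= seq). Tensor shapes: [batch, heads, seq, head_dim].
import math
import torch
import torch.nn.functional as F

def select_and_recompute(q, k, v, window=32, budget=512):
    # Pass 1: materialize explicit attention weights to score KV pairs.
    # This [B, H, window, S] tensor is the extra memory in question.
    scores = torch.matmul(q[..., -window:, :], k.transpose(-2, -1))
    weights = F.softmax(scores / math.sqrt(q.size(-1)), dim=-1)
    importance = weights.sum(dim=-2)  # aggregate over the observation window

    # Select the top-`budget` KV positions per head and gather them.
    idx = importance.topk(budget, dim=-1).indices          # [B, H, budget]
    idx = idx.unsqueeze(-1).expand(-1, -1, -1, k.size(-1))
    k_sel, v_sel = k.gather(2, idx), v.gather(2, idx)

    # Pass 2: attention is computed again on the compressed cache.
    # SDPA/Flash avoids materializing weights here, but pass 1 already did.
    return F.scaled_dot_product_attention(q, k_sel, v_sel)
```

The peak memory is dominated by pass 1's explicit weight matrix, which is exactly what SDPA/Flash in pass 2 was designed to avoid, so for long prompts the selection step can cost more than it saves.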
