
Commit 684f254

Prefer FlashAttention MLA as default over FlashMLA (vllm-project#27363)
Signed-off-by: Matthew Bonanni <[email protected]>
1 parent: e553424 · commit: 684f254

File tree: 1 file changed (+2 −2 lines)


vllm/platforms/cuda.py

Lines changed: 2 additions & 2 deletions
@@ -55,15 +55,15 @@ def _get_backend_priorities(
         return [
             AttentionBackendEnum.CUTLASS_MLA,
             AttentionBackendEnum.FLASHINFER_MLA,
-            AttentionBackendEnum.FLASHMLA,
             AttentionBackendEnum.FLASH_ATTN_MLA,
+            AttentionBackendEnum.FLASHMLA,
             AttentionBackendEnum.TRITON_MLA,
             AttentionBackendEnum.FLASHMLA_SPARSE,
         ]
     else:
         return [
-            AttentionBackendEnum.FLASHMLA,
             AttentionBackendEnum.FLASH_ATTN_MLA,
+            AttentionBackendEnum.FLASHMLA,
             AttentionBackendEnum.FLASHINFER_MLA,
             AttentionBackendEnum.TRITON_MLA,
             AttentionBackendEnum.FLASHMLA_SPARSE,
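The commit only swaps two entries in an ordered priority list. As a rough illustration of why the order matters, the minimal Python sketch below (not vLLM's actual selection code; select_backend, is_supported, and the trimmed enum are hypothetical) shows how such a list is typically consumed: the first backend the platform supports becomes the default, so placing FLASH_ATTN_MLA ahead of FLASHMLA makes FlashAttention MLA win whenever both are available.

# Sketch only: illustrates priority-ordered backend selection, not vLLM's code.
from enum import Enum, auto


class AttentionBackendEnum(Enum):
    # Trimmed, hypothetical copy of the enum members seen in the diff.
    CUTLASS_MLA = auto()
    FLASHINFER_MLA = auto()
    FLASH_ATTN_MLA = auto()
    FLASHMLA = auto()
    TRITON_MLA = auto()
    FLASHMLA_SPARSE = auto()


def select_backend(priorities, is_supported):
    """Return the first backend in priority order that is supported."""
    for backend in priorities:
        if is_supported(backend):
            return backend
    raise RuntimeError("No supported MLA attention backend found")


# With the new ordering, FLASH_ATTN_MLA is picked when both backends are
# supported on the current platform.
priorities = [
    AttentionBackendEnum.FLASH_ATTN_MLA,
    AttentionBackendEnum.FLASHMLA,
    AttentionBackendEnum.FLASHINFER_MLA,
    AttentionBackendEnum.TRITON_MLA,
    AttentionBackendEnum.FLASHMLA_SPARSE,
]
supported = {AttentionBackendEnum.FLASH_ATTN_MLA, AttentionBackendEnum.FLASHMLA}
assert select_backend(priorities, supported.__contains__) is AttentionBackendEnum.FLASH_ATTN_MLA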
