
Commit 8bd9477

Author: weijinqian_v1 (committed)
[Refactor] Add fia_v3 attention & remove other attention operators.
Signed-off-by: weijinqian_v1 <[email protected]>
1 parent 5445fd7 commit 8bd9477


1 file changed (+5, -2 lines)


vllm_ascend/worker/model_runner_v1.py

Lines changed: 5 additions & 2 deletions
```diff
@@ -896,8 +896,11 @@ def _make_attention_mask(self, seq_lens, position,
         if self.model_config.runner_type == "pooling" and self.model_config.pooler_config.pooling_type == "CLS":
             return self.attn_mask_builder.get_pooling_mask(self.device)
         # fia prefill situation.
-        if attn_state in [AscendAttentionState.PrefillNoCache, AscendAttentionState.PrefillCacheHit,
-                          AscendAttentionState.ChunkedPrefill]:
+        if attn_state in [
+                AscendAttentionState.PrefillNoCache,
+                AscendAttentionState.PrefillCacheHit,
+                AscendAttentionState.ChunkedPrefill
+        ]:
             return self.attn_mask_builder.get_splitfuse_attn_mask()
         # Decode-only situation.
         return None
```
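The tail of `_make_attention_mask` shown in this hunk picks a mask in three steps: CLS-pooling models get a pooling mask, the three prefill states get the split-fuse mask, and everything else (decode-only) gets `None`. The standalone sketch below mirrors only that visible logic; it is not the vllm-ascend implementation. `AscendAttentionState` is re-declared here as a stand-in enum, and the `DecodeOnly` member, `StubMaskBuilder`, and `make_attention_mask` are hypothetical names introduced for illustration. Code that runs earlier in the real method (above line 896) is not modeled.

```python
from enum import Enum, auto


class AscendAttentionState(Enum):
    # Stand-in for the real enum; DecodeOnly is a hypothetical member
    # used here to represent "not one of the prefill states".
    PrefillNoCache = auto()
    PrefillCacheHit = auto()
    DecodeOnly = auto()
    ChunkedPrefill = auto()


class StubMaskBuilder:
    """Hypothetical stand-in for attn_mask_builder; returns labels, not tensors."""

    def get_pooling_mask(self, device):
        return f"pooling_mask@{device}"

    def get_splitfuse_attn_mask(self):
        return "splitfuse_mask"


# The states for which the diff returns the split-fuse mask.
PREFILL_STATES = (
    AscendAttentionState.PrefillNoCache,
    AscendAttentionState.PrefillCacheHit,
    AscendAttentionState.ChunkedPrefill,
)


def make_attention_mask(attn_state, builder, device="npu:0",
                        runner_type="generate", pooling_type=None):
    # CLS-pooling models take the pooling mask regardless of attn_state.
    if runner_type == "pooling" and pooling_type == "CLS":
        return builder.get_pooling_mask(device)
    # fia prefill situation: split-fuse mask.
    if attn_state in PREFILL_STATES:
        return builder.get_splitfuse_attn_mask()
    # Decode-only situation: no explicit mask.
    return None


if __name__ == "__main__":
    b = StubMaskBuilder()
    print(make_attention_mask(AscendAttentionState.ChunkedPrefill, b))  # splitfuse_mask
    print(make_attention_mask(AscendAttentionState.DecodeOnly, b))      # None
```

Note that the commit itself only reflows the list literal onto one member per line. The membership test `attn_state in [...]` and the dispatch order are unchanged, so behavior is identical before and after.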
