Commit 56330f9

Author: weijinqian_v1

[Refactor] add fia_v3 attention & remove other attention operator.

Signed-off-by: weijinqian_v1 <[email protected]>

1 parent 959a630 commit 56330f9

File tree: 1 file changed (+3, -3 lines)


vllm_ascend/attention/attention_v1.py (3 additions, 3 deletions)
@@ -547,9 +547,9 @@ def forward(
 
         if hasattr(layer, 'quant_method') and use_kv_cache_int8:
             attn_output = layer.quant_method.apply(layer, query, key, value,
-                                                   kv_cache, attn_metadata,
-                                                   self.attn_type, self.scale,
-                                                   output)
+                                                   kv_cache, attn_metadata,
+                                                   self.attn_type, self.scale,
+                                                   output)
             output[:num_tokens] = attn_output[:num_tokens]
             return output
 
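The touched lines implement a dispatch pattern: when the attention layer carries a `quant_method` and the KV cache is int8-quantized, the forward pass delegates to `quant_method.apply(...)` and copies the first `num_tokens` rows into the preallocated output buffer. A minimal self-contained sketch of that pattern is below; `DummyQuantMethod` and the `Attn` class are hypothetical stand-ins for illustration, not vLLM Ascend APIs.

```python
class DummyQuantMethod:
    """Hypothetical stand-in for a layer's KV-cache-int8 quant method."""

    def apply(self, layer, query, key, value, kv_cache, attn_metadata,
              attn_type, scale, output):
        # A real quant method would dequantize the int8 KV cache and run
        # fused attention on the NPU; here we simply echo the query back
        # so the dispatch path is observable.
        return query


class Attn:
    """Sketch of the conditional delegation seen in the diff above."""

    def __init__(self, attn_type="decoder", scale=1.0):
        self.attn_type = attn_type
        self.scale = scale

    def forward(self, layer, query, key, value, kv_cache, attn_metadata,
                output, use_kv_cache_int8, num_tokens):
        # Same guard as the diff: only take the quantized path when the
        # layer exposes a quant_method AND the KV cache is int8.
        if hasattr(layer, 'quant_method') and use_kv_cache_int8:
            attn_output = layer.quant_method.apply(layer, query, key, value,
                                                   kv_cache, attn_metadata,
                                                   self.attn_type, self.scale,
                                                   output)
            # Copy only the valid token rows into the output buffer.
            output[:num_tokens] = attn_output[:num_tokens]
            return output
        # ...otherwise the regular (non-quantized) attention path would run.
        return output
```

Note that `output` is mutated in place and also returned, so callers that hold the preallocated buffer and callers that use the return value see the same data.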