Commit 7d5b507
Author: weijinqian_v1
Parent: 994d6d8

[Refactor] add fia_v3 attention & remove other attention operator.

Signed-off-by: weijinqian_v1 <[email protected]>
1 file changed: +3 -2 lines

vllm_ascend/attention/attention_v1.py (3 additions, 2 deletions)
@@ -540,16 +540,17 @@ def forward(
                 value=value,
                 output=output,
                 layer_name=layer.layer_name)
-            return output.view(num_tokens, self.hidden_size)
+            return output
 
         if attn_metadata is None:
             return output.fill_(0)
 
         if hasattr(layer, 'quant_method') and use_kv_cache_int8:
-            output = layer.quant_method.apply(layer, query, key, value,
+            attn_output = layer.quant_method.apply(layer, query, key, value,
                                              kv_cache, attn_metadata,
                                              self.attn_type, self.scale,
                                              output)
+            output[:num_tokens] = attn_output[:num_tokens]
         return output
 
         # View q k v to BSH.
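
For context, a minimal standalone sketch of the copy-back pattern the second hunk introduces (this is not the vllm-ascend implementation; the shapes and names below are illustrative assumptions). The quantized KV-cache path now keeps its result in a separate tensor and copies only the first num_tokens rows into the caller-owned output buffer, so any padding rows in output are left untouched:

    import torch

    # Assumed, illustrative shapes: the runtime may allocate the output
    # buffer with padding beyond the real token count.
    num_tokens, padded_tokens, hidden_size = 5, 8, 16

    output = torch.zeros(padded_tokens, hidden_size)       # caller-owned buffer
    attn_output = torch.randn(padded_tokens, hidden_size)  # stand-in for the quant path's result

    # The pattern from the diff: copy only the valid rows back into the
    # buffer, leaving padding rows untouched, then return the buffer itself.
    output[:num_tokens] = attn_output[:num_tokens]

The first hunk makes the same point from the other direction: the early-return path now hands back the output buffer as-is instead of reshaping it with .view(num_tokens, self.hidden_size), leaving any view/slicing to the caller.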
