
Commit a078916

fix
1 parent 6b27fe2 commit a078916

1 file changed (+1 −1)


onnxruntime/contrib_ops/webgpu/bert/group_query_attention.cc

Lines changed: 1 addition & 1 deletion
@@ -292,7 +292,7 @@ Status GroupQueryAttention::ComputeInternal(onnxruntime::webgpu::ComputeContext&
 
   if (parameters.is_packed_qkv_ && do_rotary_) {
     // Use the ultimate fused operation when FlashAttention and static KV cache is enabled.
-    if (will_use_flash_attention && !parameters.past_present_share_buffer_) {
+    if (will_use_flash_attention && parameters.past_present_share_buffer_) {
       // Directly call ApplyFlashAttention with fused split/rotary/copyKV enabled
       // query points to packed QKV, K and V are nullptr since they're not needed
       return ApplyFlashAttention(query, nullptr, nullptr, attention_bias, output, past_key, present_key, past_value,
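
For context, the fix flips the KV-cache condition guarding the fused FlashAttention path: per the code comments above, the fused call handles split/rotary/copyKV in one step and writes rotated K/V into the shared past/present buffer, so it is only valid when past_present_share_buffer_ is true (a static KV cache). The pre-fix condition negated that member and selected the fused path in exactly the unsupported case. Below is a minimal, self-contained C++ sketch of the corrected dispatch rule; the Params struct and the choose_fused_path helper are hypothetical, and only the member and flag names mirror the diff.

#include <iostream>

// Hypothetical stand-in for the attention parameters in the diff; only the
// member names mirror group_query_attention.cc.
struct Params {
  bool is_packed_qkv_;
  bool past_present_share_buffer_;  // true => static (shared) KV cache
};

// Returns true when the fully fused FlashAttention path (split + rotary +
// KV copy in one call) may be taken. The fused path writes rotated K/V
// straight into the shared past/present buffer, so it requires a static
// KV cache in addition to packed QKV, rotary embedding, and FlashAttention.
bool choose_fused_path(const Params& p, bool do_rotary,
                       bool will_use_flash_attention) {
  return p.is_packed_qkv_ && do_rotary &&
         will_use_flash_attention && p.past_present_share_buffer_;
}

int main() {
  Params p{/*is_packed_qkv_=*/true, /*past_present_share_buffer_=*/true};
  std::cout << std::boolalpha
            << choose_fused_path(p, true, true) << '\n';  // true: fused path valid
  p.past_present_share_buffer_ = false;
  std::cout << choose_fused_path(p, true, true) << '\n';  // false: fall back to explicit split/rotary/copyKV
}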
