
Conversation

@xiaofeihan1 (Contributor)

Description

When both `is_packed_qkv_` and `do_rotary_` are set, call a new `SplitPackedQKVWithRotaryEmbedding` program that fuses `SplitPackedQKV` with `FusedQKRotaryEmbedding`.

The dispatch size is `B*S*N*work_per_head`, where `work_per_head = head_size - half_rotary_embedding_dim`, which equals `half_rotary_embedding_dim + need_copy_dim`.

  • For the first `half_rotary_embedding_dim` elements, split the packed QKV, apply rotary embedding to the corresponding q/k pairs, and store v directly.
  • For the remaining `need_copy_dim` elements, split the packed QKV and store q/k/v directly (see the sketch after this list).
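As a worked example of the dispatch arithmetic, with hypothetical values `head_size = 64` and `rotary_embedding_dim = 32`: `half_rotary_embedding_dim = 16` and `need_copy_dim = 64 - 32 = 32`, so `work_per_head = 64 - 16 = 16 + 32 = 48`, and the kernel runs `B*S*N*48` invocations.

Below is a minimal WGSL sketch of how such a fused kernel could index the work. It is illustrative only: the uniform and binding names, the assumed packed layout `[B, S, N, 3, head_size]`, the non-interleaved rotary pairing `(h, h + half_rotary_dim)`, and cos/sin caches of shape `[S, half_rotary_dim]` indexed by sequence position are all assumptions for this sketch, not the PR's exact shader.

```wgsl
// Illustrative sketch only; names, bindings, and layout are assumptions.
// Assumed packed layout: qkv[B, S, N, 3, H]; outputs q/k/v[B, S, N, H].
// work_per_head = half_rotary_dim + need_copy_dim = H - half_rotary_dim.

struct Uniforms {
  batch_size : u32,       // B
  sequence_length : u32,  // S
  num_heads : u32,        // N
  head_size : u32,        // H
  half_rotary_dim : u32,  // rotary_embedding_dim / 2
  work_per_head : u32,    // H - half_rotary_dim
}

@group(0) @binding(0) var<storage, read> packed_qkv : array<f32>;
@group(0) @binding(1) var<storage, read> cos_cache : array<f32>;  // [S, half_rotary_dim] assumed
@group(0) @binding(2) var<storage, read> sin_cache : array<f32>;  // [S, half_rotary_dim] assumed
@group(0) @binding(3) var<storage, read_write> q_out : array<f32>;
@group(0) @binding(4) var<storage, read_write> k_out : array<f32>;
@group(0) @binding(5) var<storage, read_write> v_out : array<f32>;
@group(0) @binding(6) var<uniform> uniforms : Uniforms;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) gid : vec3<u32>) {
  let idx = gid.x;  // one invocation per element of B*S*N*work_per_head
  let total = uniforms.batch_size * uniforms.sequence_length
            * uniforms.num_heads * uniforms.work_per_head;
  if (idx >= total) { return; }

  let H = uniforms.head_size;
  let h = idx % uniforms.work_per_head;    // offset within this head's work range
  let bsn = idx / uniforms.work_per_head;  // flattened (b, s, n)
  let s = (bsn / uniforms.num_heads) % uniforms.sequence_length;

  let packed_base = bsn * 3u * H;  // start of this head's q slice in packed QKV
  let out_base = bsn * H;          // start of this head's slice in each output

  if (h < uniforms.half_rotary_dim) {
    // Rotary range: rotate the non-interleaved pair (h, h + half_rotary_dim)
    // of q and k, and copy the two matching v elements. The position id is
    // assumed to equal s (no past KV) for this sketch.
    let j = h + uniforms.half_rotary_dim;
    let c = cos_cache[s * uniforms.half_rotary_dim + h];
    let sn = sin_cache[s * uniforms.half_rotary_dim + h];

    let qx = packed_qkv[packed_base + h];
    let qy = packed_qkv[packed_base + j];
    q_out[out_base + h] = qx * c - qy * sn;
    q_out[out_base + j] = qx * sn + qy * c;

    let kx = packed_qkv[packed_base + H + h];
    let ky = packed_qkv[packed_base + H + j];
    k_out[out_base + h] = kx * c - ky * sn;
    k_out[out_base + j] = kx * sn + ky * c;

    v_out[out_base + h] = packed_qkv[packed_base + 2u * H + h];
    v_out[out_base + j] = packed_qkv[packed_base + 2u * H + j];
  } else {
    // Copy range: indices in [rotary_embedding_dim, H) are split out and
    // stored unchanged for q, k, and v.
    let d = h + uniforms.half_rotary_dim;
    q_out[out_base + d] = packed_qkv[packed_base + d];
    k_out[out_base + d] = packed_qkv[packed_base + H + d];
    v_out[out_base + d] = packed_qkv[packed_base + 2u * H + d];
  }
}
```

Note that each rotary-range invocation writes two output elements per tensor, so the two ranges together cover every element of q, k, and v exactly once; this is why `work_per_head` invocations per head suffice instead of `head_size`.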

Motivation and Context

On NV5080, the token generation speed improves by ~3%.

| Device | Before (tps) | After (tps) |
| --- | --- | --- |
| NV5080 | 129 | 133 |
| Intel | 15.4 | 15.5 |
| Mac | 69.0 | 71.0 |

@xiaofeihan1 xiaofeihan1 added the ep:WebGPU ort-web webgpu provider label Oct 30, 2025
@xiaofeihan1 xiaofeihan1 requested a review from qjia7 October 30, 2025 05:10
qjia7
qjia7 previously approved these changes Nov 7, 2025
@qjia7 qjia7 requested review from fs-eire and guschmue November 7, 2025 09:53
@xiaofeihan1 xiaofeihan1 merged commit cf8476b into microsoft:main Nov 11, 2025
91 of 92 checks passed