[WebGPU] Implement Split-K on GEMM #26751
Open
+345
−43
Description
This patch implements the `Split-K` optimization on `GEMM`:

- `GEMM` in `MatMulFillBiasOrZeroBeforeSplitKProgram`.
- We need to add `beta` as a new uniform value, along with all the parameters that are used to handle all the cases of `GEMM` in `MatMulWriteFnSource()`.
- `Split-K` in `GemmProgram::GenerateShaderCode()`.

Motivation and Context
With this PR we can achieve about 30% improvement in `florence-2-base-vision-encoder-fp16` and 10% improvement in `detr-resnet-50-fp16`.
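
For readers unfamiliar with Split-K, the following is a minimal CPU sketch of the idea behind the description above, not the WebGPU kernel from this PR. It assumes the usual `GEMM` definition `Y = alpha * A * B + beta * C`, a row-major layout, and a `C` that is already the full `M x N` shape (no broadcasting). The function name `SplitKGemmReference` and the `split_size` parameter are hypothetical. The first pass plays the role of the fill-bias-or-zero step, and the second pass adds one partial product per K-chunk; on a GPU those chunks would run in parallel and typically be combined with atomic adds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for Split-K GEMM (illustration only).
// Computes Y = alpha * A(MxK) * B(KxN) + beta * C(MxN), all row-major.
// Pass `c == nullptr` when there is no bias; the output then starts at zero.
std::vector<float> SplitKGemmReference(const std::vector<float>& a,
                                       const std::vector<float>& b,
                                       const std::vector<float>* c,
                                       std::size_t m, std::size_t n, std::size_t k,
                                       float alpha, float beta,
                                       std::size_t split_size) {
  assert(a.size() == m * k && b.size() == k * n && split_size > 0);
  assert(c == nullptr || c->size() == m * n);

  // Pass 1: initialize the output with beta * C, or with zero if C is absent.
  std::vector<float> y(m * n, 0.0f);
  if (c != nullptr) {
    for (std::size_t i = 0; i < m * n; ++i) {
      y[i] = beta * (*c)[i];
    }
  }

  // Pass 2: each chunk [k0, k1) of the K dimension contributes a partial
  // product that is accumulated into the already-initialized output.
  for (std::size_t k0 = 0; k0 < k; k0 += split_size) {
    const std::size_t k1 = std::min(k0 + split_size, k);
    for (std::size_t row = 0; row < m; ++row) {
      for (std::size_t col = 0; col < n; ++col) {
        float partial = 0.0f;
        for (std::size_t kk = k0; kk < k1; ++kk) {
          partial += a[row * k + kk] * b[kk * n + col];
        }
        y[row * n + col] += alpha * partial;  // an atomic add on a GPU
      }
    }
  }
  return y;
}
```

Because every chunk only adds to the output, the bias term `beta * C` has to be written exactly once before any partial product arrives, which is why it is handled in a separate pass in this sketch.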