1 parent 5bca5cc commit 696343e
ggml/src/ggml-opencl/kernels/rms_norm.cl
@@ -134,6 +134,11 @@ kernel void kernel_rms_norm_mul(
     src1 = src1 + offset1;
     dst = dst + offsetd;
 
+    // The size of sum is sizeof(float)*subgroup_size.
+    // Each subgroup writes its partial sum to this array.
+    // So the number of subgroups per workgroup for this kernel cannot exceed the subgroup size.
+    // This is generally true -
+    // for subgroup size 64, workgroup size should be less than 4096 (the max is usually 1024).
     if (get_sub_group_id() == 0) {
         sum[get_sub_group_local_id()] = 0.0f;
     }
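
The constraint described in the added comment can be checked with simple arithmetic on the host side. The following is a minimal sketch in plain C, not code from this commit: the helper names `subgroups_per_workgroup` and `partial_sums_fit` are hypothetical, and it assumes the scratch array holds exactly `subgroup_size` floats, as the comment states.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: number of subgroups in one workgroup,
 * rounding up so a partially filled last subgroup is counted. */
size_t subgroups_per_workgroup(size_t workgroup_size, size_t subgroup_size) {
    return (workgroup_size + subgroup_size - 1) / subgroup_size;
}

/* Hypothetical helper: true if a scratch array of subgroup_size floats
 * has one slot for every subgroup's partial sum. */
bool partial_sums_fit(size_t workgroup_size, size_t subgroup_size) {
    if (subgroup_size == 0) {
        return false; /* avoid dividing by zero on invalid input */
    }
    return subgroups_per_workgroup(workgroup_size, subgroup_size) <= subgroup_size;
}
```

Under these assumptions, `partial_sums_fit(1024, 64)` holds because 1024 / 64 = 16 subgroups fit in 64 slots, and the limit for subgroup size 64 is reached at a workgroup size of 4096. A smaller subgroup size tightens the bound: with subgroup size 16, a workgroup of 1024 would need 64 slots and the check fails.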