Commit e449ca5

Hang Qu authored and facebook-github-bot committed
Update embedding_forward_quantized_cpu_template.cpp to use initialized output memory instead of uninitialized (#5054)
Summary:
X-link: facebookresearch/FBGEMM#2064

When the output tensor is allocated with uninitialized memory, the result may contain garbage values, because some regions of the output are never written. This change zero-fills the output after allocation (when the output dtype is neither INT4 nor INT8) as a quick workaround; filling only the untouched memory with zero would be more efficient.

Reviewed By: sryap

Differential Revision: D85447298
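The failure mode described in the summary can be illustrated without FBGEMM or ATen. The following is a minimal sketch, not the library's kernel: `sparse_write_forward`, `run_without_fill`, `run_with_fill`, and `kGarbage` are hypothetical names, and the assumption that rows are skipped for empty bags is made only for illustration (the summary does not say which memory is left untouched). A sentinel value stands in for uninitialized memory, since reading truly uninitialized memory is undefined behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for "whatever was in memory" after an at::empty-style allocation.
constexpr float kGarbage = -12345.0f;

// Hypothetical kernel (not FBGEMM's): writes lengths[b] into every element of
// row b, but writes nothing at all for rows whose bag is empty.
void sparse_write_forward(const std::vector<int>& lengths,
                          std::size_t D,
                          std::vector<float>& output) {
  assert(output.size() == lengths.size() * D);
  for (std::size_t b = 0; b < lengths.size(); ++b) {
    if (lengths[b] == 0) {
      continue;  // row b is left untouched
    }
    for (std::size_t d = 0; d < D; ++d) {
      output[b * D + d] = static_cast<float>(lengths[b]);
    }
  }
}

// Mimics allocating with at::empty and running the kernel directly.
std::vector<float> run_without_fill(const std::vector<int>& lengths,
                                    std::size_t D) {
  std::vector<float> output(lengths.size() * D, kGarbage);
  sparse_write_forward(lengths, D, output);
  return output;  // skipped rows still hold kGarbage
}

// Mimics at::empty followed by output.fill_(0), as in this commit.
std::vector<float> run_with_fill(const std::vector<int>& lengths,
                                 std::size_t D) {
  std::vector<float> output(lengths.size() * D, kGarbage);
  std::fill(output.begin(), output.end(), 0.0f);
  sparse_write_forward(lengths, D, output);
  return output;  // skipped rows are zero
}
```

With `lengths = {2, 0, 1}` and `D = 2`, the middle row keeps the sentinel in `run_without_fill` and is zero in `run_with_fill`; the written rows are identical in both.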
1 parent f849dcd commit e449ca5

1 file changed, +6 -1 lines changed

fbgemm_gpu/codegen/inference/embedding_forward_quantized_cpu_template.cpp

Lines changed: 6 additions & 1 deletion
@@ -210,6 +210,9 @@ Tensor int_nbit_split_embedding{{ "_nobag" if nobag else "" }}_codegen_forward_{
         total_adjusted_D += T * kINT8QparamsBytes;
     }
     output = at::empty({B, total_adjusted_D}, dev_weights.options().dtype(getScalarType(o_dtype)).pinned_memory(pinned_memory));
+    if (!output_is_int4 && !output_is_int8) {
+        output.fill_(0);
+    }
     {% else %}
     constexpr int kINT8QparamsBytes = 4; // no bag int8 output aligns with fbgemm weights storage size and layout
     constexpr int kINT4QparamsElems = 8; // scale + bias takes 4 bytes which are 8 int4 elements
@@ -220,7 +223,9 @@ Tensor int_nbit_split_embedding{{ "_nobag" if nobag else "" }}_codegen_forward_{
         adjusted_D += kINT4QparamsElems;
     }
     output = at::empty({total_L, adjusted_D}, dev_weights.options().dtype(getScalarType(o_dtype)).pinned_memory(pinned_memory));
-
+    if (!output_is_int4 && !output_is_int8) {
+        output.fill_(0);
+    }
     {% endif %}
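The summary notes that zeroing only the untouched memory would be more efficient than the blanket `output.fill_(0)` added in this commit. A rough sketch of that alternative follows, in plain C++ rather than ATen. It rests on an illustrative assumption that the commit does not confirm: that the untouched regions are whole rows belonging to empty bags. `forward_zeroing_skipped_rows` is a hypothetical name, not an FBGEMM function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel (not FBGEMM's): every row is written exactly once.
// Rows with a non-empty bag get their computed value; rows that the kernel
// would otherwise skip are zeroed in place, so no separate full-buffer
// fill pass is needed beforehand.
void forward_zeroing_skipped_rows(const std::vector<int>& lengths,
                                  std::size_t D,
                                  std::vector<float>& output) {
  assert(output.size() == lengths.size() * D);
  for (std::size_t b = 0; b < lengths.size(); ++b) {
    float* row = output.data() + b * D;
    if (lengths[b] == 0) {
      std::fill(row, row + D, 0.0f);  // zero only the otherwise untouched row
      continue;
    }
    // Placeholder for the real pooled result of bag b.
    std::fill(row, row + D, static_cast<float>(lengths[b]));
  }
}
```

Compared with filling the whole buffer and then overwriting most of it, this writes each element once. The trade-off is that the zeroing logic has to live inside the kernel and track every path that can leave memory unwritten, which may be why the commit opts for the simpler workaround.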

0 commit comments
