Commit 7dd7de3

Hang Qu authored and facebook-github-bot committed
Update embedding_forward_quantized_cpu_template.cpp to use initialized output memory instead of uninitialized (#5054)
Summary:
X-link: facebookresearch/FBGEMM#2064

We observed that when the output tensor's memory is left uninitialized, the output may contain garbage, because certain regions of that memory are never written. This fix zero-fills the whole output right after allocation as a quick workaround; zeroing only the untouched memory would be more efficient.

Reviewed By: sryap

Differential Revision: D85447298
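The failure mode described in the summary can be illustrated with a small stand-alone sketch. It does not use ATen or FBGEMM: `kGarbage`, `write_partial_rows`, and `leftover_garbage` are hypothetical names, the sentinel byte stands in for arbitrary uninitialized contents, and the assumption that the untouched bytes sit at the end of each row is made only for illustration (the commit does not say which bytes go unwritten).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sentinel standing in for whatever an uninitialized allocation holds.
// (Assumption for illustration; real uninitialized memory is arbitrary.)
constexpr std::uint8_t kGarbage = 0xCD;

// Hypothetical kernel: writes the value 1 into the first `written` bytes of
// each row and never touches the remaining bytes of that row.
void write_partial_rows(std::vector<std::uint8_t>& out, std::size_t rows,
                        std::size_t row_bytes, std::size_t written) {
  assert(written <= row_bytes);
  assert(out.size() == rows * row_bytes);
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < written; ++c) {
      out[r * row_bytes + c] = 1;
    }
  }
}

// Returns how many sentinel bytes remain in the output after the kernel runs.
// `zero_first` plays the role of the output.fill_(0) call added by the commit.
std::size_t leftover_garbage(bool zero_first, std::size_t rows,
                             std::size_t row_bytes, std::size_t written) {
  std::vector<std::uint8_t> out(rows * row_bytes, kGarbage);
  if (zero_first) {
    std::fill(out.begin(), out.end(), std::uint8_t{0});
  }
  write_partial_rows(out, rows, row_bytes, written);
  return static_cast<std::size_t>(
      std::count(out.begin(), out.end(), kGarbage));
}
```

With 4 rows of 8 bytes and 6 bytes written per row, `leftover_garbage(false, 4, 8, 6)` reports 8 stale bytes, while `leftover_garbage(true, 4, 8, 6)` reports 0.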
1 parent 34754ea commit 7dd7de3

1 file changed, +2 -0 lines changed

fbgemm_gpu/codegen/inference/embedding_forward_quantized_cpu_template.cpp

Lines changed: 2 additions & 0 deletions
@@ -210,6 +210,7 @@ Tensor int_nbit_split_embedding{{ "_nobag" if nobag else "" }}_codegen_forward_{
         total_adjusted_D += T * kINT8QparamsBytes;
     }
     output = at::empty({B, total_adjusted_D}, dev_weights.options().dtype(getScalarType(o_dtype)).pinned_memory(pinned_memory));
+    output.fill_(0);
     {% else %}
     constexpr int kINT8QparamsBytes = 4; // no bag int8 output aligns with fbgemm weights storage size and layout
     constexpr int kINT4QparamsElems = 8; // scale + bias takes 4 bytes which are 8 int4 elements
@@ -220,6 +221,7 @@ Tensor int_nbit_split_embedding{{ "_nobag" if nobag else "" }}_codegen_forward_{
         adjusted_D += kINT4QparamsElems;
     }
     output = at::empty({total_L, adjusted_D}, dev_weights.options().dtype(getScalarType(o_dtype)).pinned_memory(pinned_memory));
+    output.fill_(0);

     {% endif %}
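The commit message notes that zeroing only the untouched memory would be cheaper than filling the whole output. A minimal sketch of that idea follows, again in plain C++ rather than ATen. It assumes, hypothetically, that the unwritten bytes form a fixed-size tail of each row; `zero_row_tails` and its parameters are illustrative names, not part of FBGEMM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Zeroes only bytes [written, row_bytes) of every row and leaves the bytes
// the kernel is expected to overwrite alone. Returns the number of bytes
// zeroed. The row-tail layout is an assumption made for this sketch.
std::size_t zero_row_tails(std::vector<std::uint8_t>& out, std::size_t rows,
                           std::size_t row_bytes, std::size_t written) {
  assert(written <= row_bytes);
  assert(out.size() == rows * row_bytes);
  const std::size_t tail = row_bytes - written;
  if (tail == 0) {
    return 0;  // every byte is written by the kernel; nothing to clear
  }
  for (std::size_t r = 0; r < rows; ++r) {
    std::memset(out.data() + r * row_bytes + written, 0, tail);
  }
  return rows * tail;
}
```

Compared with a full fill, this touches `rows * (row_bytes - written)` bytes instead of `rows * row_bytes`, which matters only when the untouched region is small relative to the row.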
