Summary:
Pull Request resolved: #5080
X-link: https://github.com/facebookresearch/FBGEMM/pull/2087
This PR optimizes the `group_index_select_or_add_2d_kernel` kernel on the `USE_INDEX_SELECT==true` path, with the primary focus on the `float` type and relatively small embedding dimensions. Two changes are implemented:
1) Hoist the common (loop-invariant) variables out of the loop to avoid unnecessary synchronization on memory loads (the compiler does not do this automatically).
2) Switch to a logical wave size of 32 threads to reduce granularity losses.
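
As a rough illustration of the two changes, here is a host-side C++ sketch, not the actual kernel code. `GroupArgs`, `gather_group`, `ceil_div`, and `lane_utilization` are hypothetical names introduced only for this example, and the model assumes one thread per column element per pass, which may not match the kernel's real thread mapping.

```cpp
#include <cstdint>

// Hypothetical per-group argument record (an assumption, not the FBGEMM layout).
struct GroupArgs {
  const float* input;           // [num_input_rows x num_cols], row-major
  const std::int64_t* indices;  // row indices to select
  float* output;                // [num_rows x num_cols], row-major
  int num_cols;                 // embedding dimension of this group
};

// Change 1, modeled on the CPU: read the per-group values once before the
// loop instead of re-reading them through `args[group]` on every iteration.
void gather_group(const GroupArgs* args, int group, int num_rows) {
  const float* input = args[group].input;
  const std::int64_t* indices = args[group].indices;
  float* output = args[group].output;
  const std::int64_t num_cols = args[group].num_cols;

  for (int r = 0; r < num_rows; ++r) {
    const std::int64_t src = indices[r];
    for (std::int64_t c = 0; c < num_cols; ++c) {
      output[r * num_cols + c] = input[src * num_cols + c];
    }
  }
}

constexpr int ceil_div(int a, int b) { return (a + b - 1) / b; }

// Change 2, as arithmetic: fraction of lanes doing useful work when a row of
// `num_cols` elements is covered by whole groups of `logical_wave` threads.
constexpr double lane_utilization(int num_cols, int logical_wave) {
  return static_cast<double>(num_cols) /
         (ceil_div(num_cols, logical_wave) * logical_wave);
}
```

Under this simplified model, a row of 32 floats keeps only half of a 64-thread wave busy (`lane_utilization(32, 64)` is 0.5), while a 32-thread logical wave is fully used (`lane_utilization(32, 32)` is 1.0). The same holds for 96 columns: 0.75 with 64 threads versus 1.0 with 32.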
Pull Request resolved: #5078
Reviewed By: spcyppt, haoyuz
Differential Revision: D86135611
Pulled By: q10
fbshipit-source-id: f4fb9966f5f5180c4dde2aed92ca726c260b7743