Commit ecf2ac9

Aya-ZIbra authored and meta-codesync[bot] committed
General adoption for Mtile = 64 (pytorch#5075)
Summary:
Pull Request resolved: pytorch#5075

X-link: https://github.com/facebookresearch/FBGEMM/pull/2080

This diff generalizes the work in D85155388, building on Gefei's diff D85631781. Compared to D85631781, it avoids register warp shuffling by using 32b TMEM atoms.

This diff supports:
1. Different dtypes (fp8, bf16)
2. Different mtiles (128, 64)

Reviewed By: v0i0

Differential Revision: D85893883

fbshipit-source-id: 25e93e627c573a120ab46336d3f234064c5ae066
1 parent 391f78d commit ecf2ac9

2 files changed: +193 -104 lines changed
