Kernel 10 Warp Tiling: the current constraints are not enough #22

@bssrdf


I think kernel 10 (warp tiling) may need additional constraints on its block parameters. Take this configuration:

  const uint K10_NUM_THREADS = 128;
  const uint K10_BN = 256;
  const uint K10_BM = 128;
  const uint K10_BK = 8;
  const uint K10_WN = 256;
  const uint K10_WM = 32;
  const uint K10_WNITER = 1;
  const uint K10_TN = 4;
  const uint K10_TM = 8;

This combination compiles without error but fails at run time with matrix size 256.

After swapping the thread-tile sizes to

  const uint K10_TN = 8;
  const uint K10_TM = 4;

the kernel works.

I think the constraint (K10_WM / K10_WMITER) % K10_TM == 0 && (K10_WN / K10_WNITER) % K10_TN == 0 is needed, unless the kernel code is changed to handle this case.
