Fix GemmVariantsNonPowerOfTwoTileSize LIT tests #2149

justinrosner · 2025-12-04T00:49:50Z

Motivation

This PR fixes failures that appeared in the Nightly CI runs after #2121 was merged.

This fixes: https://github.com/ROCm/rocMLIR-internal/issues/2164

Technical Details

Navi3X does not have WMMA (accel) support for fp8 types (AmdArchDb filters it out), so running this test (GemmVariantsNonPowerOfTwoTileSize) with fp8 and the V4 accel perf_config strings caused compilation crashes.

I moved the FP8 tests into a separate file that filters out Navi3X. This works around the crash without losing the FP8 coverage that we want.
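The filtering rule described above can be sketched in Python, the language lit configuration files are written in. This is only an illustration of the decision logic, not the contents of the actual .cfg file: `TargetFeatures`, `fp8_tests_supported`, and the feature flags are hypothetical names, and the arch strings are example values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetFeatures:
    """Hypothetical description of the GPU the tests would run on."""
    arch: str       # example values: "gfx1100", "gfx942"
    has_mfma: bool  # assumed flag: target offers MFMA instructions
    has_wmma: bool  # assumed flag: target offers WMMA instructions


def fp8_tests_supported(target: TargetFeatures) -> bool:
    """Return True if the fp8 accel tests should run on this target."""
    # Navi3X (gfx11) has no WMMA accel support for fp8 types, so the
    # fp8 test file is skipped there regardless of other features.
    if target.arch.startswith("gfx11"):
        return False
    # Every other target still needs an accel path (MFMA or WMMA).
    return target.has_mfma or target.has_wmma


if __name__ == "__main__":
    for target in (
        TargetFeatures("gfx1100", has_mfma=False, has_wmma=True),
        TargetFeatures("gfx942", has_mfma=True, has_wmma=False),
    ):
        print(target.arch, fp8_tests_supported(target))
```

In a real lit configuration the result of such a check would typically decide whether the test directory is marked unsupported. How the repository actually exposes the architecture and feature information is not shown in this PR description.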

Test Plan

Nightly CI

Test Result

Nightly CI

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Copilot

Pull request overview

This PR fixes a Nightly CI compilation crash that occurred when running fp8 GEMM tests on Navi3X (gfx11) architectures. Since Navi3X doesn't support WMMA acceleration for fp8 types, the tests were failing when using the V4 accel perf_config format. The fix splits fp8 tests into a separate file with architecture filtering to exclude gfx11 while maintaining test coverage for supported architectures.

Key changes:

Removed fp8_fp8 from the original GemmVariantsNonPowerOfTwoTileSize.toml data types
Created new dedicated fp8 test files (GemmVariantsNonPowerOfTwoTileSizeFp8.toml and .cfg) with gfx11 filtering
Added architecture support checks to both test configurations to ensure MFMA or WMMA support

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File	Description
`mlir/test/e2e/GemmVariantsNonPowerOfTwoTileSizeFp8.toml`	New test configuration file for fp8-specific GEMM tests with non-power-of-two tile sizes
`mlir/test/e2e/GemmVariantsNonPowerOfTwoTileSizeFp8.cfg`	Configuration file that filters out Navi3X/gfx11 and ensures MFMA/WMMA support for fp8 tests
`mlir/test/e2e/GemmVariantsNonPowerOfTwoTileSize.toml`	Removed fp8_fp8 from data types to prevent crashes on unsupported architectures
`mlir/test/e2e/GemmVariantsNonPowerOfTwoTileSize.cfg`	Added architecture support checks for MFMA/WMMA to ensure tests run only on capable hardware

mlir/test/e2e/GemmVariantsNonPowerOfTwoTileSizeFp8.cfg

umangyadav · 2025-12-05T13:43:11Z

mlir/test/e2e/AttentionNonPowerOfTwoTileSize.toml


 [[axis]]
 name = "t"
-values = ["f16", "bf16", "i8"]


Why was bf16 removed here?

We are seeing failures in the nightly CI runs because of the bf16 tests: https://ml-ci-internal.amd.com/blue/organizations/jenkins/MLIR%2Fmlir-nightly-all/detail/mlir-nightly-all/2219/pipeline/628/.

I removed bf16 to check whether that solved the problem, but it does not seem to: I also see failures on the i8 and f16 variants of the test.

The issue was with the causal tests (one of the test suites in AttentionNonPowerOfTwoTileSize). I've re-enabled bf16 and disabled the faulty test suite, since this was not a regression, just a problem with a newly added test. I opened https://github.com/ROCm/rocMLIR-internal/issues/2173 to address the faulty test suite.

justinrosner requested review from dhernandez0 and umangyadav December 4, 2025 00:49

justinrosner requested a review from causten as a code owner December 4, 2025 00:49

justinrosner requested review from Copilot and pabloantoniom December 4, 2025 00:49

Copilot started reviewing on behalf of justinrosner December 4, 2025 00:52

Copilot finished reviewing on behalf of justinrosner December 4, 2025 00:54

Copilot AI reviewed Dec 4, 2025

dhernandez0 reviewed Dec 4, 2025

mlir/test/e2e/GemmVariantsNonPowerOfTwoTileSizeFp8.cfg (outdated, resolved)

dhernandez0 approved these changes Dec 4, 2025

umangyadav approved these changes Dec 4, 2025

justinrosner force-pushed the 2164-navi3x-error branch from 2498231 to c0d8e59 Compare December 4, 2025 19:17

umangyadav reviewed Dec 5, 2025

justinrosner added 7 commits December 5, 2025 13:53

Fix LIT errors

5740ace

Remove whitespace

3fdcf7e

Add newlines

97d4b6d

Improve on filtering

9edeb18

Fix thresholds

ae25a8a

Remove flaky bf16 tests from I

ffd793f

RMS threshold

4bf9e1c

justinrosner force-pushed the 2164-navi3x-error branch from e49018d to 4bf9e1c Compare December 5, 2025 13:58

Disable causal tests

b3decb0
