Skip to content

Conversation

@justinrosner
Copy link
Contributor

@justinrosner justinrosner commented Dec 4, 2025

Motivation

This PR fixes an issue that was present in the Nightly CI runs after #2121 went in.

This fixes: https://github.com/ROCm/rocMLIR-internal/issues/2164

Technical Details

Navi3X does not have WMMA (accel) support for fp8 types (it gets filtered out in AmdArchDb), so when this test (GemmVariantsNonPowerOfTwoTileSize) was running fp8 with the V4 accel perf_config strings we were getting compilation crashes.

I moved the FP8 tests into a separate file that filters out Navi3X to work around this and not lose the FP8 coverage that we want.

Test Plan

  • Nightly CI

Test Result

  • Nightly CI

Submission Checklist

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a Nightly CI compilation crash that occurred when running fp8 GEMM tests on Navi3X (gfx11) architectures. Since Navi3X doesn't support WMMA acceleration for fp8 types, the tests were failing when using the V4 accel perf_config format. The fix splits fp8 tests into a separate file with architecture filtering to exclude gfx11 while maintaining test coverage for supported architectures.

Key changes:

  • Removed fp8_fp8 from the original GemmVariantsNonPowerOfTwoTileSize.toml data types
  • Created new dedicated fp8 test files (GemmVariantsNonPowerOfTwoTileSizeFp8.toml and .cfg) with gfx11 filtering
  • Added architecture support checks to both test configurations to ensure MFMA or WMMA support

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
mlir/test/e2e/GemmVariantsNonPowerOfTwoTileSizeFp8.toml New test configuration file for fp8-specific GEMM tests with non-power-of-two tile sizes
mlir/test/e2e/GemmVariantsNonPowerOfTwoTileSizeFp8.cfg Configuration file that filters out Navi3X/gfx11 and ensures MFMA/WMMA support for fp8 tests
mlir/test/e2e/GemmVariantsNonPowerOfTwoTileSize.toml Removed fp8_fp8 from data types to prevent crashes on unsupported architectures
mlir/test/e2e/GemmVariantsNonPowerOfTwoTileSize.cfg Added architecture support checks for MFMA/WMMA to ensure tests run only on capable hardware

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


[[axis]]
name = "t"
values = ["f16", "bf16", "i8"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why bf16 is removed here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are seeing failures in the nightly CI runs because of the bf16 tests: https://ml-ci-internal.amd.com/blue/organizations/jenkins/MLIR%2Fmlir-nightly-all/detail/mlir-nightly-all/2219/pipeline/628/.

I was checking to see if removing that solved the problem but that does not seem to be the case. I also see failures on the i8 and f16 variants of the test as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue was with the causal tests (one of the test suites in AttentionNonPowerOfTwoTileSize). I've re-enabled bf16, and disabled the faulty test suite since it was not a regression and just a problem with a new test that was added. I opened up https://github.com/ROCm/rocMLIR-internal/issues/2173 to address the faulty test suite.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants