-
Notifications
You must be signed in to change notification settings - Fork 50
Fix GemmVariantsNonPowerOfTwoTileSize LIT tests #2149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a Nightly CI compilation crash that occurred when running fp8 GEMM tests on Navi3X (gfx11) architectures. Since Navi3X doesn't support WMMA acceleration for fp8 types, the tests were failing when using the V4 accel perf_config format. The fix splits fp8 tests into a separate file with architecture filtering to exclude gfx11 while maintaining test coverage for supported architectures.
Key changes:
- Removed
fp8_fp8from the originalGemmVariantsNonPowerOfTwoTileSize.tomldata types - Created new dedicated fp8 test files (
GemmVariantsNonPowerOfTwoTileSizeFp8.tomland.cfg) with gfx11 filtering - Added architecture support checks to both test configurations to ensure MFMA or WMMA support
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
mlir/test/e2e/GemmVariantsNonPowerOfTwoTileSizeFp8.toml |
New test configuration file for fp8-specific GEMM tests with non-power-of-two tile sizes |
mlir/test/e2e/GemmVariantsNonPowerOfTwoTileSizeFp8.cfg |
Configuration file that filters out Navi3X/gfx11 and ensures MFMA/WMMA support for fp8 tests |
mlir/test/e2e/GemmVariantsNonPowerOfTwoTileSize.toml |
Removed fp8_fp8 from data types to prevent crashes on unsupported architectures |
mlir/test/e2e/GemmVariantsNonPowerOfTwoTileSize.cfg |
Added architecture support checks for MFMA/WMMA to ensure tests run only on capable hardware |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
2498231 to
c0d8e59
Compare
|
|
||
| [[axis]] | ||
| name = "t" | ||
| values = ["f16", "bf16", "i8"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why bf16 is removed here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are seeing failures in the nightly CI runs because of the bf16 tests: https://ml-ci-internal.amd.com/blue/organizations/jenkins/MLIR%2Fmlir-nightly-all/detail/mlir-nightly-all/2219/pipeline/628/.
I was checking to see if removing that solved the problem but that does not seem to be the case. I also see failures on the i8 and f16 variants of the test as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue was with the causal tests (one of the test suites in AttentionNonPowerOfTwoTileSize). I've re-enabled bf16, and disabled the faulty test suite since it was not a regression and just a problem with a new test that was added. I opened up https://github.com/ROCm/rocMLIR-internal/issues/2173 to address the faulty test suite.
e49018d to
4bf9e1c
Compare
Motivation
This PR fixes an issue that was present in the Nightly CI runs after #2121 went in.
This fixes: https://github.com/ROCm/rocMLIR-internal/issues/2164
Technical Details
Navi3X does not have WMMA (accel) support for fp8 types (it gets filtered out in AmdArchDb), so when this test (GemmVariantsNonPowerOfTwoTileSize) was running fp8 with the V4 accel perf_config strings we were getting compilation crashes.
I moved the FP8 tests into a separate file that filters out Navi3X to work around this and not lose the FP8 coverage that we want.
Test Plan
Test Result
Submission Checklist