<!-- .github/pull_request_template.md -->
## Description
This PR adds trtllm-gen per-tensor sparseMla kernels.
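For reviewers unfamiliar with the feature, the math the kernels implement can be summarized as top-k sparse attention over a shared MLA latent cache: each head scores all cached entries but attends only to its top-k. The sketch below is a plain numpy reference, not the kernel or the FlashInfer API; all names and shapes are illustrative assumptions.

```python
import numpy as np

def sparse_topk_attention(q, kv, topk):
    """Reference top-k sparse attention for one decode step.

    q:    [num_heads, head_dim] query for the current token
    kv:   [seq_len, head_dim] shared latent cache (MLA-style)
    topk: number of cache entries each head attends to
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = q @ kv.T * scale                       # [num_heads, seq_len]
    # Keep each head's top-k scores; mask everything else to -inf so the
    # softmax assigns it zero weight.
    idx = np.argpartition(-scores, topk - 1, axis=-1)[:, :topk]
    masked = np.full_like(scores, -np.inf)
    np.put_along_axis(masked, idx,
                      np.take_along_axis(scores, idx, axis=-1), axis=-1)
    p = np.exp(masked - masked.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)              # sparse softmax weights
    return p @ kv                                   # [num_heads, head_dim]
```

With `topk == seq_len` this reduces to dense attention, which is a convenient correctness check for the sparse path.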
## Related Issues
<!-- Link any related issues here -->
## Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.
### Pre-commit Checks
- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.
> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).
## Tests
- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).
## Reviewer Notes
<!-- Optional: anything you'd like reviewers to focus on, concerns, etc.
-->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added Sparse MLA mode to enable top-k sparse attention paths and
configure sparse top-k behavior.
* **Performance**
* Improved kernel selection and runtime behavior to better support
sparse MLA and varied head dimensions.
* **Tests**
* Expanded tests for multiple head dimensions and added comprehensive
sparse MLA decoding tests and utilities.
* **Validation**
* Strengthened input/shape/runtime checks for sparse MLA configuration.
* **Chores**
* Updated public artifact references/checksums; tests now skip when
insufficient GPUs are available.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Perkz Zheng <[email protected]>
Co-authored-by: Zihao Ye <[email protected]>
Co-authored-by: yzh119 <[email protected]>