v0.2.2

yzh119 released this 23 Feb 22:28

986e5b1

What's Changed

fix cu121 torch2.6 by @zhyncs in #867
unittest: add MLA test cases where kv_len is evenly divided by page_size. by @foreverlms in #861
bugfix: fix the behavior of MLA kernel when kv-length is 0 by @yzh119 in #868
Merge of previous PRs for typos in a single one. As per your request. by @didier-durand in #862
add lightllm adoption by @zhyncs in #871
fix generate_dispatch_inc args from parser by @baowendin in #870
[API] Fix top_k_top_p_sampling_from_logits param typo by @kasohrab in #875
misc: Remove unused k_smem_offset_w update in MLA kernel by @muoshuosha in #878
JIT compilation support for TVM by @MasterJH5574 in #880
[Hotfix] Add flashinfer.jit.attention into packages by @zhouye in #881
perf: FlashAttention-3 style MLA PageAttention by @yzh119 in #887
[JIT] Fix MLA header in TVM binding by @MasterJH5574 in #889
Fixing several typos in doc file kv_layout.rst by @didier-durand in #884
unittest: add unittests for MLA + cudagraph by @yzh119 in #890

New Contributors

@baowendin made their first contribution in #870
@kasohrab made their first contribution in #875
@zhouye made their first contribution in #881

Full Changelog: v0.2.1.post2...v0.2.2
