v0.2.2

@yzh119 released this 23 Feb 22:28

What's Changed

  • fix cu121 torch2.6 by @zhyncs in #867
  • unittest: add MLA test cases where kv_len is evenly divided by page_size. by @foreverlms in #861
  • bugfix: fix the behavior of MLA kernel when kv-length is 0 by @yzh119 in #868
  • Merge of previous PRs for typos in a single one. As per your request. by @didier-durand in #862
  • add lightllm adoption by @zhyncs in #871
  • fix generate_dispatch_inc args from parser by @baowendin in #870
  • [API] Fix top_k_top_p_sampling_from_logits param typo by @kasohrab in #875
  • misc: Remove unused k_smem_offset_w update in MLA kernel by @muoshuosha in #878
  • JIT compilation support for TVM by @MasterJH5574 in #880
  • [Hotfix] Add flashinfer.jit.attention into packages by @zhouye in #881
  • perf: FlashAttention-3 style MLA PageAttention by @yzh119 in #887
  • [JIT] Fix MLA header in TVM binding by @MasterJH5574 in #889
  • Fixing several typos in doc file kv_layout.rst by @didier-durand in #884
  • unittest: add unittests for MLA + cudagraph by @yzh119 in #890

New Contributors

  • @baowendin made their first contribution in #870
  • @kasohrab made their first contribution in #875
  • @zhouye made their first contribution in #881

Full Changelog: v0.2.1.post2...v0.2.2