Releases: flashinfer-ai/flashinfer

Nightly Release v0.5.1-20251106

06 Nov 03:55
747b4e2

Pre-release

Automated nightly build for version 0.5.1 (dev20251106)

Nightly Release v0.5.1-20251105

05 Nov 03:54
9bc5bd5

Pre-release

Automated nightly build for version 0.5.1 (dev20251105)

Release v0.5.1

04 Nov 05:43
2d68a6b

What's Changed

  • test: Enable xfailed trtllm decode long seqlen tests and update microbenchmark by @bkryu in #2018
  • Updated decorator to support unspecified default by @nvmbreughe in #2026
  • release: Bump version for v0.5.1 release by @bkryu in #2031

Full Changelog: v0.5.0...v0.5.1

Nightly Release v0.5.0-20251104

04 Nov 03:53
2d68a6b

Pre-release

Automated nightly build for version 0.5.0 (dev20251104)

Release v0.5.0

02 Nov 05:52
5854494

What's Changed

  • fix vllm graph register and add test by @NVShreyas in #1894
  • Support checks PoC by @nvmbreughe in #1809
  • chore: Restore FLASHINFER_LOCAL_VERSION environment variable by @yzh119 in #1934
  • chore: fix wheel license packaging issues by @yzh119 in #1932
  • Add layernorm op for inputs of mixed dtype by @akhilg-nv in #1926
  • fixbug: fix devcontainer context by @cyx-6 in #1938
  • MLA RoPE + quantization fused kernel: shape generalization for MHA / GQA by @kahyunnam in #1924
  • ci: limit max_jobs for arm64 jit wheel cache build on CI by @yzh119 in #1943
  • Add realistic bench for persistent kernel by @Edenzzzz in #1942
  • Add junit xml flags back to reorganized test script by @dierksen in #1940
  • fix get max_q_len in page prefill plan by @ZhuJiaqi9905 in #1930
  • ci: Create Github Action to Automate CODEOWNER update by @yzh119 in #1870
  • chore: Update CODEOWNERS by @github-actions[bot] in #1871
  • chore: use flashinfer-bot account to create auto pull requests in release-ci-docker workflow by @yzh119 in #1944
  • Update Docker CI tags to 20251018-dbdf533 by @flashinfer-bot in #1945
  • Fix #1641: Use /usr/local/cuda as default CUDA_HOME if possible, like torch.utils.cpp_extension.CUDA_HOME by @netanel-haber in #1948
  • Fix bias dtype by @wenscarl in #1876
  • chore: rename FLASHINFER_JIT_VERBOSE to FLASHINFER_JIT_DEBUG for clarity by @yzh119 in #1946
  • fix: Fix trtllm-gen prefill IMA when batch_size==1 by @bkryu in #1912
  • Feature: Support Relu2 activation in fused MoE by @amirkl94 in #1954
  • fix: Add cutlass as an mm_fp4 backend in compute capability 12.0 in benchmark code by @bkryu in #1959
  • Update the routing for TRTLLMGEN to support kimi k2 and qwen by @ChristinaZ in #1831
  • unittest: fix deepgemm sha256 by @yzh119 in #1953
  • misc: Update artifacts docstring and MetaInfoHash by @jimmyzho in #1967
  • silu_and_mul nvfp4 quantization fusion rework by @wenscarl in #1927
  • unittest: fix test_artifacts.py by @yzh119 in #1950
  • chore: update the list of authorized codeowners by @yzh119 in #1970
  • Added heuristic for trtllm_allreduce_fusion by @nvjullin in #1972
  • Bump tvm ffi to stable version 0.1.0 by @cyx-6 in #1960
  • Update Docker CI tags to 20251024-0e48aaf by @flashinfer-bot in #1975
  • fix: Make attention microbenchmark correctly use page table by @bkryu in #1976
  • fix: Skipping attention sink Blackwell test outside of Blackwell by @bkryu in #1978
  • feat: enable deepgemm jit for fp8 block-scale on SM90 by @djmmoss in #1969
  • chore: Update CODEOWNERS by @flashinfer-bot in #1949
  • fix: correct PDL parameter handling in RopeQuantize kernel by @cicirori in #1982
  • Fix: Verify scales are not None for Cutlass FP8 FusedMoE by @amirkl94 in #1961
  • feat: add xqa fp8 mha and fp8 kv cache by @qsang-nv in #1769
  • unittest: fix failed unittest on hopper by @yzh119 in #1952
  • docs: Update documented versioning scheme to right-shifted semver by @sricketts in #1990
  • Bugfix: Change get() -> GetDLTensorPtr() in cutlass FusedMoE validations by @amirkl94 in #1995
  • unittest: Add SM arch checks to skip unsupported tests on Hopper by @bkryu in #1998
  • Added workspace check and reflected this in test by @nvmbreughe in #1991
  • minor fix for xqa by @qsang-nv in #1994
  • Feature: Add support for L40 FusedMoE in cutlass path by @amirkl94 in #1973
  • unittest: Add head dim 256 test cases and mark as xfail by @bkryu in #1999
  • feat: autotune tile_tokens_dim in trtllm-gen MOE by @jiahanc in #1980
  • Fix trtllm-gen attention illegal memory access by @Tom-Zheng in #2002
  • release: Bump version for v0.5.0rc1 release; by @bkryu in #2008
  • bugfix: fix regex in update wheel index script by @yzh119 in #2009
  • fix: Enable SM121 for mm_fp4 by @bkryu in #2012
  • fix: ensure SM120/121 SFA/SFB contiguity by @yongwww in #1963
  • More realistic bench for POD Attn by @Edenzzzz in #2013
  • Feature: Support non-gated activation in cutlass fused MoE nvfp4 by @omera-nv in #2011
  • feat: add xqa backend and completes NHD/HND coverage for trtllm-gen/xqa backend by @qsang-nv in #2001

Full Changelog: v0.4.1...v0.5.0

Nightly Release v0.5.0-20251103

03 Nov 04:00
da01b1b

Pre-release

Automated nightly build for version 0.5.0 (dev20251103)

Nightly Release v0.5.0-20251102

02 Nov 04:00
5854494

Pre-release

Automated nightly build for version 0.5.0 (dev20251102)

Release v0.5.0rc3

01 Nov 00:34

Pre-release

Full Changelog: v0.5.0rc2...v0.5.0rc3

Release v0.5.0rc2

31 Oct 02:54

Pre-release

What's Changed

  • bugfix: fix regex in update wheel index script by @yzh119 in #2009
  • fix: Enable SM121 for mm_fp4 by @bkryu in #2012

Full Changelog: v0.5.0rc1...v0.5.0rc2

Nightly Release v0.5.0-20251101

01 Nov 03:57
f9cd034

Pre-release

Automated nightly build for version 0.5.0 (dev20251101)