Releases · flashinfer-ai/flashinfer

test: Enable xfailed trtllm decode long seqlen tests and update microbenchmark by @bkryu in #2018
Updated decorator to support unspecified default by @nvmbreughe in #2026
release: Bump version for v0.5.1 release by @bkryu in #2031

Full Changelog: v0.5.0...v0.5.1

Contributors

bkryu and nvmbreughe

04 Nov 03:53

github-actions

nightly-v0.5.0-20251104

2d68a6b

Nightly Release v0.5.0-20251104 Pre-release

Automated nightly build for version 0.5.0 (dev20251104)

02 Nov 05:52

github-actions

v0.5.0

5854494

Release v0.5.0

What's Changed

fix vllm graph register and add test by @NVShreyas in #1894
Support checks PoC by @nvmbreughe in #1809
chore: Restore FLASHINFER_LOCAL_VERSION environment variable by @yzh119 in #1934
chore: fix wheel license packaging issues by @yzh119 in #1932
Add layernorm op for inputs of mixed dtype by @akhilg-nv in #1926
fixbug: fix devcontainer context by @cyx-6 in #1938
MLA RoPE + quantization fused kernel: shape generalization for MHA / GQA by @kahyunnam in #1924
ci: limit max_jobs for arm64 jit wheel cache build on CI by @yzh119 in #1943
Add realistic bench for persistent kernel by @Edenzzzz in #1942
Add junit xml flags back to reorganized test script by @dierksen in #1940
fix get max_q_len in page prefill plan by @ZhuJiaqi9905 in #1930
ci: Create Github Action to Automate CODEOWNER update by @yzh119 in #1870
chore: Update CODEOWNERS by @github-actions[bot] in #1871
chore: use flashinfer-bot account to create auto pull requests in release-ci-docker workflow by @yzh119 in #1944
Update Docker CI tags to 20251018-dbdf533 by @flashinfer-bot in #1945
Fix #1641: Use /usr/local/cuda as default CUDA_HOME if possible, like torch.utils.cpp_extension.CUDA_HOME by @netanel-haber in #1948
Fix bias dtype by @wenscarl in #1876
chore: rename FLASHINFER_JIT_VERBOSE to FLASHINFER_JIT_DEBUG for clarity by @yzh119 in #1946
fix: Fix trtllm-gen prefill IMA when batch_size==1 by @bkryu in #1912
Feature: Support Relu2 activation in fused MoE by @amirkl94 in #1954
fix: Add cutlass as an mm_fp4 backend in compute capability 12.0 in benchmark code by @bkryu in #1959
Update the routing for TRTLLMGEN to support kimi k2 and qwen by @ChristinaZ in #1831
unittest: fix deepgemm sha256 by @yzh119 in #1953
misc: Update artifacts docstring and MetaInfoHash by @jimmyzho in #1967
silu_and_mul nvfp4 quantization fusion rework by @wenscarl in #1927
unittest: fix test_artifacts.py by @yzh119 in #1950
chore: update the list of authorized codeowners by @yzh119 in #1970
Added heuristic for trtllm_allreduce_fusion by @nvjullin in #1972
Bump tvm ffi to stable version 0.1.0 by @cyx-6 in #1960
Update Docker CI tags to 20251024-0e48aaf by @flashinfer-bot in #1975
fix: Make attention microbenchmark correctly use page table by @bkryu in #1976
fix: Skipping attention sink Blackwell test outside of Blackwell by @bkryu in #1978
feat: enable deepgemm jit for fp8 block-scale on SM90 by @djmmoss in #1969
chore: Update CODEOWNERS by @flashinfer-bot in #1949
fix: correct PDL parameter handling in RopeQuantize kernel by @cicirori in #1982
Fix: Verify scales are not None for Cutlass FP8 FusedMoE by @amirkl94 in #1961
feat: add xqa fp8 mha and fp8 kv cache by @qsang-nv in #1769
unittest: fix failed unittest on hopper by @yzh119 in #1952
docs: Update documented versioning scheme to right-shifted semver by @sricketts in #1990
Bugfix: Change get() -> GetDLTensorPtr() in cutlass FusedMoE validations by @amirkl94 in #1995
unittest: Add SM arch checks to skip unsupported tests on Hopper by @bkryu in #1998
Added workspace check and reflected this in test by @nvmbreughe in #1991
minor fix for xqa by @qsang-nv in #1994
Feature: Add support for L40 FusedMoE in cutlass path by @amirkl94 in #1973
unittest: Add head dim 256 test cases and mark as xfail by @bkryu in #1999
feat: autotune tile_tokens_dim in trtllm-gen MOE by @jiahanc in #1980
Fix trtllm-gen attention illegal memory access by @Tom-Zheng in #2002
release: Bump version for v0.5.0rc1 release; by @bkryu in #2008
bugfix: fix regex in update wheel index script by @yzh119 in #2009
fix: Enable SM121 for mm_fp4 by @bkryu in #2012
fix: ensure SM120/121 SFA/SFB contiguity by @yongwww in #1963
More realistic bench for POD Attn by @Edenzzzz in #2013
Feature: Support non-gated activation in cutlass fused MoE nvfp4 by @omera-nv in #2011
feat: add xqa backend and completes NHD/HND coverage for trtllm-gen/xqa backend by @qsang-nv in #2001