Releases: flashinfer-ai/flashinfer
Nightly Release v0.5.1-20251106
Automated nightly build for version 0.5.1 (dev20251106)
Nightly Release v0.5.1-20251105
Automated nightly build for version 0.5.1 (dev20251105)
Release v0.5.1
What's Changed
- test: Enable xfailed trtllm decode long seqlen tests and update microbenchmark by @bkryu in #2018
- Updated decorator to support unspecified default by @nvmbreughe in #2026
- release: Bump version for v0.5.1 release by @bkryu in #2031
Full Changelog: v0.5.0...v0.5.1
Nightly Release v0.5.0-20251104
Automated nightly build for version 0.5.0 (dev20251104)
Release v0.5.0
What's Changed
- fix vllm graph register and add test by @NVShreyas in #1894
- Support checks PoC by @nvmbreughe in #1809
- chore: Restore FLASHINFER_LOCAL_VERSION environment variable by @yzh119 in #1934
- chore: fix wheel license packaging issues by @yzh119 in #1932
- Add layernorm op for inputs of mixed dtype by @akhilg-nv in #1926
- fixbug: fix devcontainer context by @cyx-6 in #1938
- MLA RoPE + quantization fused kernel: shape generalization for MHA / GQA by @kahyunnam in #1924
- ci: limit max_jobs for arm64 jit wheel cache build on CI by @yzh119 in #1943
- Add realistic bench for persistent kernel by @Edenzzzz in #1942
- Add junit xml flags back to reorganized test script by @dierksen in #1940
- fix get max_q_len in page prefill plan by @ZhuJiaqi9905 in #1930
- ci: Create Github Action to Automate CODEOWNER update by @yzh119 in #1870
- chore: Update CODEOWNERS by @github-actions[bot] in #1871
- chore: use flashinfer-bot account to create auto pull requests in release-ci-docker workflow by @yzh119 in #1944
- Update Docker CI tags to 20251018-dbdf533 by @flashinfer-bot in #1945
- Fix #1641: Use /usr/local/cuda as default CUDA_HOME if possible, like torch.utils.cpp_extension.CUDA_HOME by @netanel-haber in #1948
- Fix bias dtype by @wenscarl in #1876
- chore: rename FLASHINFER_JIT_VERBOSE to FLASHINFER_JIT_DEBUG for clarity by @yzh119 in #1946
- fix: Fix trtllm-gen prefill IMA when batch_size==1 by @bkryu in #1912
- Feature: Support Relu2 activation in fused MoE by @amirkl94 in #1954
- fix: Add cutlass as an mm_fp4 backend in compute capability 12.0 in benchmark code by @bkryu in #1959
- Update the routing for TRTLLMGEN to support kimi k2 and qwen by @ChristinaZ in #1831
- unittest: fix deepgemm sha256 by @yzh119 in #1953
- misc: Update artifacts docstring and MetaInfoHash by @jimmyzho in #1967
- silu_and_mul nvfp4 quantization fusion rework by @wenscarl in #1927
- unittest: fix test_artifacts.py by @yzh119 in #1950
- chore: update the list of authorized codeowners by @yzh119 in #1970
- Added heuristic for trtllm_allreduce_fusion by @nvjullin in #1972
- Bump tvm ffi to stable version 0.1.0 by @cyx-6 in #1960
- Update Docker CI tags to 20251024-0e48aaf by @flashinfer-bot in #1975
- fix: Make attention microbenchmark correctly use page table by @bkryu in #1976
- fix: Skipping attention sink Blackwell test outside of Blackwell by @bkryu in #1978
- feat: enable deepgemm jit for fp8 block-scale on SM90 by @djmmoss in #1969
- chore: Update CODEOWNERS by @flashinfer-bot in #1949
- fix: correct PDL parameter handling in RopeQuantize kernel by @cicirori in #1982
- Fix: Verify scales are not None for Cutlass FP8 FusedMoE by @amirkl94 in #1961
- feat: add xqa fp8 mha and fp8 kv cache by @qsang-nv in #1769
- unittest: fix failed unittest on hopper by @yzh119 in #1952
- docs: Update documented versioning scheme to right-shifted semver by @sricketts in #1990
- Bugfix: Change get() -> GetDLTensorPtr() in cutlass FusedMoE validations by @amirkl94 in #1995
- unittest: Add SM arch checks to skip unsupported tests on Hopper by @bkryu in #1998
- Added workspace check and reflected this in test by @nvmbreughe in #1991
- minor fix for xqa by @qsang-nv in #1994
- Feature: Add support for L40 FusedMoE in cutlass path by @amirkl94 in #1973
- unittest: Add head dim 256 test cases and mark as xfail by @bkryu in #1999
- feat: autotune tile_tokens_dim in trtllm-gen MOE by @jiahanc in #1980
- Fix trtllm-gen attention illegal memory access by @Tom-Zheng in #2002
- release: Bump version for v0.5.0rc1 release by @bkryu in #2008
- bugfix: fix regex in update wheel index script by @yzh119 in #2009
- fix: Enable SM121 for mm_fp4 by @bkryu in #2012
- fix: ensure SM120/121 SFA/SFB contiguity by @yongwww in #1963
- More realistic bench for POD Attn by @Edenzzzz in #2013
- Feature: Support non-gated activation in cutlass fused MoE nvfp4 by @omera-nv in #2011
- feat: add xqa backend and completes NHD/HND coverage for trtllm-gen/xqa backend by @qsang-nv in #2001
New Contributors
- @NVShreyas made their first contribution in #1894
- @akhilg-nv made their first contribution in #1926
- @ZhuJiaqi9905 made their first contribution in #1930
- @flashinfer-bot made their first contribution in #1945
- @netanel-haber made their first contribution in #1948
- @ChristinaZ made their first contribution in #1831
- @cicirori made their first contribution in #1982
- @Tom-Zheng made their first contribution in #2002
- @omera-nv made their first contribution in #2011
Full Changelog: v0.4.1...v0.5.0
Nightly Release v0.5.0-20251103
Automated nightly build for version 0.5.0 (dev20251103)
Nightly Release v0.5.0-20251102
Automated nightly build for version 0.5.0 (dev20251102)
Release v0.5.0rc3
What's Changed
- fix: ensure SM120/121 SFA/SFB contiguity by @yongwww in #1963
- More realistic bench for POD Attn by @Edenzzzz in #2013
Full Changelog: v0.5.0rc2...v0.5.0rc3
Release v0.5.0rc2
What's Changed
- bugfix: fix regex in update wheel index script by @yzh119 in #2009
- fix: Enable SM121 for mm_fp4 by @bkryu in #2012
Full Changelog: v0.5.0rc1...v0.5.0rc2
Nightly Release v0.5.0-20251101
Automated nightly build for version 0.5.0 (dev20251101)