Releases: flashinfer-ai/flashinfer
Nightly Release v0.4.1-20251021
Automated nightly build for version 0.4.1 (dev20251021)
Nightly Release v0.4.1-20251020
Automated nightly build for version 0.4.1 (dev20251020)
Nightly Release v0.4.1-20251019
Automated nightly build for version 0.4.1 (dev20251019)
Nightly Release v0.4.1-20251018
Automated nightly build for version 0.4.1 (dev20251018)
Nightly Release v0.4.1-20251017
Automated nightly build for version 0.4.1 (dev20251017)
Nightly Release v0.4.1-20251016
Automated nightly build for version 0.4.1 (dev20251016)
Nightly Release v0.4.1-20251015
Automated nightly build for version 0.4.1 (dev20251015)
Release v0.4.1
What's Changed
- fix: fix the failed sampling unittest on 5090 by @yzh119 in #1886
- Updated to latest docker tag by @nvmbreughe in #1889
- Fix: Prevent race condition in cubin loader when file is being consumed by @yzh119 in #1852
- Improve graph caching of cudnn graph by @Anerudhan in #1887
- misc: Various Updates to Attention Microbenchmark Suite by @bkryu in #1891
- docs: Fix installation instructions for CUDA-specific package URLs by @yzh119 in #1893
- docker image improvements by @nvmbreughe in #1890
- tests: Add batch size 1 cases to test_trtllm_gen_attention.py that fail, marked xfail by @bkryu in #1897
- Ensure docker installs the torch version we need by @nvmbreughe in #1901
- bugfix: exclude `tests/utils/test_load_cubin_compile_race_condition.py` from pytest by @yzh119 in #1907
- ci: use self-hosted runner for building docker containers by @yzh119 in #1908
- feat: Add FP4 TRTLLM-Gen throughput MOE batched gemms by @jiahanc in #1882
- Update Docker CI tags to 20251010-8d072e6 by @github-actions[bot] in #1915
- ci/cd: consolidate release workflow by @yzh119 in #1910
- bugfix: fix cli error when cuda toolkit is not installed by @yzh119 in #1905
- feat: trtllm-gen global scaled FP8 GEMMs by @hypdeb in #1829
- feat: enable fp8 blockscale moe for fused cutlass for sm90 by @djmmoss in #1819
- use `ffi::TensorView` instead of `ffi::Tensor` by @cyx-6 in #1844
- Minor updates to cubin_loader.py download_file to avoid race condition on temporary file by @nvjullin in #1918
- chore: make cache directory flashinfer-version specific by @yzh119 in #1920
- misc: checksum check when downloading artifacts by @jimmyzho in #1761
- release: bump version v0.4.1 by @yzh119 in #1921
New Contributors
Full Changelog: v0.4.0...v0.4.1
Nightly Release v0.4.0-20251014
Automated nightly build for version 0.4.0 (dev20251014)
Nightly Release v0.4.0-20251013
Automated nightly build for version 0.4.0 (dev20251013)