[BUG] Flashinfer with jit-cache and cubin still attempts to jit #2093

@bbartels

I hit an issue where flashinfer attempts to create a directory to cache JIT output when running qwen3-coder-480b-a35b. With both flashinfer-jit-cache and flashinfer-cubin installed, I would have expected no additional JIT compilation to be required.
This was found using the vllm Dockerfile, which includes both dependencies.
For details please refer to: vllm-project/vllm#28440

2025-11-11T02:42:39.924015483Z
(EngineCore_DP0 pid=523)   File "/usr/local/lib/python3.12/dist-packages/flashinfer/fused_moe/core.py", line 818, in cutlass_fused_moe
(EngineCore_DP0 pid=523)     return get_cutlass_fused_moe_module(device_arch).cutlass_fused_moe(
(EngineCore_DP0 pid=523)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=523)   File "/usr/local/lib/python3.12/dist-packages/flashinfer/fused_moe/core.py", line 286, in get_cutlass_fused_moe_module
(EngineCore_DP0 pid=523)     module = gen_cutlass_fused_moe_sm90_module(use_fast_build).build_and_load()
(EngineCore_DP0 pid=523)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=523)   File "/usr/local/lib/python3.12/dist-packages/flashinfer/jit/fused_moe.py", line 77, in gen_cutlass_fused_moe_sm90_module
(EngineCore_DP0 pid=523)     return gen_cutlass_fused_moe_module(nvcc_flags, "90", use_fast_build)
(EngineCore_DP0 pid=523)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=523)   File "/usr/local/lib/python3.12/dist-packages/flashinfer/jit/fused_moe.py", line 111, in gen_cutlass_fused_moe_module
(EngineCore_DP0 pid=523)     raise RuntimeError(f"Failed to generate Cutlass kernels: {e}") from e
(EngineCore_DP0 pid=523) RuntimeError: Failed to generate Cutlass kernels: [Errno 13] Permission denied: '/usr/local/lib/python3.12/dist-packages/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_instantiations'

This is the location where flashinfer attempts to create a directory:

output_dir.mkdir(parents=True, exist_ok=True)
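
For anyone trying to confirm the same failure mode in their own container, here is a small diagnostic sketch. It is not flashinfer code: the helper names `nearest_existing_ancestor` and `can_create_dir` are hypothetical, and the target path is copied from the error message above. It relies on the fact that `Path.mkdir(parents=True, exist_ok=True)` only suppresses `FileExistsError`; if the nearest existing ancestor is not writable by the current user, the call still fails with `PermissionError` (errno 13). The `os.access` check is best-effort (for example, it typically reports success when running as root).

```python
import os
import tempfile
from pathlib import Path


def nearest_existing_ancestor(path: Path) -> Path:
    """Return `path` itself if it exists, otherwise its closest existing ancestor."""
    current = path
    while not current.exists():
        if current.parent == current:  # reached the filesystem root
            break
        current = current.parent
    return current


def can_create_dir(path: Path) -> bool:
    """Best-effort check that mkdir(parents=True, exist_ok=True) could succeed.

    Any missing directories would be created inside the nearest existing
    ancestor, so that ancestor must be a directory the current user can
    write to and traverse.
    """
    anchor = nearest_existing_ancestor(path)
    return anchor.is_dir() and os.access(anchor, os.W_OK | os.X_OK)


if __name__ == "__main__":
    # Path copied from the RuntimeError in the traceback above.
    target = Path(
        "/usr/local/lib/python3.12/dist-packages/flashinfer/data/csrc/"
        "nv_internal/tensorrt_llm/cutlass_instantiations"
    )
    print("nearest existing ancestor:", nearest_existing_ancestor(target))
    print("directory can be created: ", can_create_dir(target))
```

Running this as the same user as the serving process shows which existing directory the `mkdir` call would have to write into, and whether that user has permission to do so.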
