Description
While running qwen3-coder-480b-a35b, flashinfer tried to create a directory to cache JIT output and failed with a permission error. With both flashinfer-jit-cache and flashinfer-cubin installed, I would have expected that no additional JIT compilation would be needed.
This was found using the vllm Dockerfile, which includes both dependencies.
For details, please refer to vllm-project/vllm#28440.
2025-11-11T02:42:39.924015483Z (EngineCore_DP0 pid=523)   File "/usr/local/lib/python3.12/dist-packages/flashinfer/fused_moe/core.py", line 818, in cutlass_fused_moe
2025-11-11T02:42:39.924015483Z (EngineCore_DP0 pid=523)     return get_cutlass_fused_moe_module(device_arch).cutlass_fused_moe(
2025-11-11T02:42:39.924015483Z (EngineCore_DP0 pid=523)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-11-11T02:42:39.924015483Z (EngineCore_DP0 pid=523)   File "/usr/local/lib/python3.12/dist-packages/flashinfer/fused_moe/core.py", line 286, in get_cutlass_fused_moe_module
2025-11-11T02:42:39.924015483Z (EngineCore_DP0 pid=523)     module = gen_cutlass_fused_moe_sm90_module(use_fast_build).build_and_load()
2025-11-11T02:42:39.924015483Z (EngineCore_DP0 pid=523)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-11-11T02:42:39.924015483Z (EngineCore_DP0 pid=523)   File "/usr/local/lib/python3.12/dist-packages/flashinfer/jit/fused_moe.py", line 77, in gen_cutlass_fused_moe_sm90_module
2025-11-11T02:42:39.924015483Z (EngineCore_DP0 pid=523)     return gen_cutlass_fused_moe_module(nvcc_flags, "90", use_fast_build)
2025-11-11T02:42:39.924015483Z (EngineCore_DP0 pid=523)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-11-11T02:42:39.924015483Z (EngineCore_DP0 pid=523)   File "/usr/local/lib/python3.12/dist-packages/flashinfer/jit/fused_moe.py", line 111, in gen_cutlass_fused_moe_module
2025-11-11T02:42:39.924015483Z (EngineCore_DP0 pid=523)     raise RuntimeError(f"Failed to generate Cutlass kernels: {e}") from e
2025-11-11T02:42:39.924015483Z (EngineCore_DP0 pid=523) RuntimeError: Failed to generate Cutlass kernels: [Errno 13] Permission denied: '/usr/local/lib/python3.12/dist-packages/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_instantiations'
This is the location where flashinfer attempts to create a directory:
flashinfer/flashinfer/jit/fused_moe.py, line 103 in 6765cad:
    output_dir.mkdir(parents=True, exist_ok=True)
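To illustrate the failure mode, here is a minimal sketch of how a `mkdir(parents=True, exist_ok=True)` call could be guarded so that an unwritable install location does not abort the run. This is not flashinfer code: `ensure_output_dir` and `fallback_root` are hypothetical names made up for this example, and whether a fallback directory is the right fix for flashinfer is an open question.

```python
import os
from pathlib import Path


def ensure_output_dir(preferred: Path, fallback_root: Path) -> Path:
    """Return an existing, writable directory, preferring `preferred`.

    Hypothetical helper for illustration only; not part of flashinfer.
    """
    try:
        # Same call as in the traceback; it raises PermissionError
        # (Errno 13) when the parent directory is not writable.
        preferred.mkdir(parents=True, exist_ok=True)
        if os.access(preferred, os.W_OK):
            return preferred
    except OSError:
        # PermissionError is a subclass of OSError.
        pass
    # Assumption: fallback_root is a location the current user can write to.
    fallback = fallback_root / preferred.name
    fallback.mkdir(parents=True, exist_ok=True)
    return fallback
```

With something like this, a call such as `ensure_output_dir(site_packages_dir / "cutlass_instantiations", Path.home() / ".cache")` would return the user-writable location instead of raising when the package directory is read-only (both paths here are placeholders).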