Commit 8645883

[docker] Restructure Dockerfile for more efficient and cache-friendly builds
- Pre-install PyTorch, FlashInfer, and other slow-changing dependencies in vllm-base before installing the vLLM wheel for better layer caching
- Add parallel extensions-build stage for DeepGEMM and EP kernels
- Move stable packages (accelerate, bitsandbytes, etc.) earlier in build

This allows incremental builds with Python-only changes to skip the expensive dependency installation layers.

Performance: Incremental builds with Python-only changes now complete in ~16 minutes (previously 35+ minutes).

Future work: We considered building these base stages as separate images that could be built independently and baked into CI AMIs for maximum cache reuse. However, this introduces maintenance burden of extra pipelines and update strategies. The inline approach is simpler and can be optimized later by baking the main build image into AMIs daily to maximize layer cache reuse.

Signed-off-by: Amr Mahdi <[email protected]>
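
The layer ordering described above could look roughly like the sketch below. It is not the Dockerfile from this commit. The stage names `vllm-base` and `extensions-build` come from the commit message, while the base image tag, the `final` stage, the `/opt/extensions` path, and the `dist/` wheel location are hypothetical placeholders chosen for illustration.

```dockerfile
# Illustrative sketch only: base image, paths, and the "final" stage are assumptions.
FROM python:3.12-slim AS vllm-base
# Slow-changing dependencies go first so these layers stay cached
# when only Python sources change.
RUN pip install torch
RUN pip install accelerate bitsandbytes

# Separate stage for kernel extensions; BuildKit can build stages that do not
# depend on each other concurrently.
FROM vllm-base AS extensions-build
# Placeholder for the DeepGEMM / EP kernel builds (hypothetical output path).
RUN mkdir -p /opt/extensions

FROM vllm-base AS final
COPY --from=extensions-build /opt/extensions /opt/extensions
# The wheel changes most often, so it is installed last.
COPY dist/*.whl /tmp/
RUN pip install /tmp/*.whl
```

With this ordering, a change that only affects the wheel invalidates the last two layers of the final stage, and the dependency layers above it are reused from cache.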
1 parent 7c16f3f commit 8645883

2 files changed: +157 -115 lines changed
