Commit 8645883
[docker] Restructure Dockerfile for more efficient and cache-friendly builds
- Pre-install PyTorch, FlashInfer, and other slow-changing dependencies
  in vllm-base before installing the vLLM wheel for better layer caching
- Add parallel extensions-build stage for DeepGEMM and EP kernels
- Move stable packages (accelerate, bitsandbytes, etc.) earlier in build
This allows incremental builds with Python-only changes to skip
the expensive dependency installation layers.
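
The layering described above can be sketched roughly as follows. This is an illustrative outline only, not the Dockerfile from this commit: the stage names `vllm-base` and `extensions-build` come from the message, while the `build` stage, the base image, the package list, and the `/dist` and `/ext-wheels` paths are assumptions made for the example.

```dockerfile
# Hypothetical sketch; base image, package names, and paths are assumptions.
ARG BASE_IMAGE=python:3.12-slim

# Assumed stage that produces the vLLM wheel from the source tree.
FROM ${BASE_IMAGE} AS build
WORKDIR /workspace
COPY . .
RUN pip wheel --no-deps --wheel-dir /dist .

# Does not depend on `build`, so BuildKit can run the two stages concurrently.
FROM ${BASE_IMAGE} AS extensions-build
RUN mkdir -p /ext-wheels
# Placeholder: the DeepGEMM and EP kernel builds would write wheels to /ext-wheels.

FROM ${BASE_IMAGE} AS vllm-base
# Slow-changing dependencies first: these layers stay cached across Python-only changes.
RUN pip install torch flashinfer-python
RUN pip install accelerate bitsandbytes
# Extension artifacts from the parallel stage.
COPY --from=extensions-build /ext-wheels /tmp/ext-wheels
# The vLLM wheel goes last, so a Python-only change invalidates only the layers from here on.
COPY --from=build /dist/*.whl /tmp/
RUN pip install /tmp/*.whl
```

The ordering is what matters: Docker reuses a layer only while the instruction and its inputs are unchanged, so anything that changes on every commit belongs as late in the stage as possible.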
Performance:
Incremental builds with Python-only changes now complete in ~16 minutes
(previously 35+ minutes).
Future work:
We considered building these base stages as separate images that could
be built independently and baked into CI AMIs for maximum cache reuse.
However, this adds the maintenance burden of extra pipelines and update
strategies. The inline approach is simpler and can be optimized later by
baking the main build image into AMIs daily to maximize layer cache reuse.
Signed-off-by: Amr Mahdi <[email protected]>
1 parent 7c16f3f commit 8645883
File tree
2 files changed (+157, -115 lines)
- docker
- docs/assets/contributing