Commit 8645883


[docker] Restructure Dockerfile for more efficient and cache-friendly builds

- Pre-install PyTorch, FlashInfer, and other slow-changing dependencies in vllm-base before installing the vLLM wheel, for better layer caching
- Add a parallel extensions-build stage for DeepGEMM and EP kernels
- Move stable packages (accelerate, bitsandbytes, etc.) earlier in the build

This allows incremental builds with Python-only changes to skip the expensive dependency-installation layers.

Performance: Incremental builds with Python-only changes now complete in ~16 minutes (previously 35+ minutes).

Future work: We considered building these base stages as separate images that could be built independently and baked into CI AMIs for maximum cache reuse. However, this introduces the maintenance burden of extra pipelines and update strategies. The inline approach is simpler and can be optimized later by baking the main build image into AMIs daily to maximize layer cache reuse.

Signed-off-by: Amr Mahdi <[email protected]>
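Why moving slow-changing dependencies ahead of the vLLM wheel helps can be sketched with a toy model of Docker's layer cache. This is a hypothetical Python illustration, not code from vLLM: the layer names (`base-os`, `torch-flashinfer`, `stable-extras`, `vllm-wheel`), their inputs, and the "before" ordering are assumptions chosen for illustration. It relies on one real property of Docker builds: once a layer misses the cache, every later layer in the same stage is rebuilt.

```python
"""Toy model of Docker's layer cache for a single linear build stage.

Hypothetical sketch: layer names, inputs, and orderings are illustrative
assumptions, not the actual vLLM Dockerfile.
"""

from typing import List, Set, Tuple

# A layer is (name, set of inputs it depends on, e.g. source files).
Layer = Tuple[str, Set[str]]


def layers_to_rebuild(layers: List[Layer], changed: Set[str]) -> List[str]:
    """Return the names of layers that miss the cache when `changed` inputs differ."""
    rebuild: List[str] = []
    invalidated = False
    for name, inputs in layers:
        # A layer misses the cache if its own inputs changed, or if any
        # earlier layer already missed (its parent layer is different).
        if invalidated or inputs & changed:
            invalidated = True
            rebuild.append(name)
    return rebuild


# Assumed "before" ordering: heavy dependencies installed after the wheel.
OLD_ORDER: List[Layer] = [
    ("base-os", {"cuda-image"}),
    ("vllm-wheel", {"python-src", "csrc"}),
    ("torch-flashinfer", {"requirements"}),
    ("stable-extras", {"requirements"}),
]

# Assumed "after" ordering: slow-changing dependencies first, wheel last.
NEW_ORDER: List[Layer] = [
    ("base-os", {"cuda-image"}),
    ("torch-flashinfer", {"requirements"}),
    ("stable-extras", {"requirements"}),
    ("vllm-wheel", {"python-src", "csrc"}),
]

if __name__ == "__main__":
    python_only = {"python-src"}
    print(layers_to_rebuild(OLD_ORDER, python_only))
    # ['vllm-wheel', 'torch-flashinfer', 'stable-extras']
    print(layers_to_rebuild(NEW_ORDER, python_only))
    # ['vllm-wheel']
```

With the wheel installed last, a Python-only change invalidates only the final layer, which is the effect the commit relies on to shorten incremental builds. The model deliberately ignores multi-stage builds; with BuildKit, independent stages (such as a separate extensions build) can also be built concurrently, which this linear sketch does not capture.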

1 parent 7c16f3f commit 8645883

2 files changed (+157, -115 lines)

docker
- Dockerfile
docs/assets/contributing
- dockerfile-stages-dependency.png
