[docker] Restructure Dockerfile for more efficient and cache-friendly builds #30626
Conversation
Code Review
This pull request significantly refactors the Dockerfile to improve build efficiency and leverage caching more effectively. The introduction of a parallel extensions-build stage and the restructuring of the vllm-base stage to pre-install slow-changing dependencies are excellent changes that should substantially reduce incremental build times. The overall approach is well-thought-out and correctly implemented. I have one suggestion to further optimize Docker image layering, but the pull request is a great improvement overall.
Hi @amrmahdi, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then commit the changes and push to your branch.
Documentation preview: https://vllm--30626.org.readthedocs.build/en/30626/
As a data point, my rebase build took 16m20s: https://buildkite.com/vllm/ci/builds/43561/steps/canvas?sid=019b207f-7cc2-4327-8035-07fa0e925428
mgoin left a comment
LGTM overall, great work. Just a few quick nits to clean up.
[docker] Restructure Dockerfile for more efficient and cache-friendly builds

- Pre-install PyTorch, FlashInfer, and other slow-changing dependencies in vllm-base before installing the vLLM wheel, for better layer caching
- Add a parallel extensions-build stage for DeepGEMM and EP kernels
- Move stable packages (accelerate, bitsandbytes, etc.) earlier in the build

This allows incremental builds with Python-only changes to skip the expensive dependency installation layers.

Performance: Incremental builds with Python-only changes now complete in ~16 minutes (previously 35+ minutes).

Future work: We considered building these base stages as separate images that could be built independently and baked into CI AMIs for maximum cache reuse. However, this introduces the maintenance burden of extra pipelines and update strategies. The inline approach is simpler and can be optimized later by baking the main build image into AMIs daily to maximize layer cache reuse.

Signed-off-by: Amr Mahdi <[email protected]>
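The layering idea in the commit message can be sketched as a minimal Dockerfile. This is an illustrative sketch only: the stage names (extensions-build, vllm-base) and package names come from the PR, but the base images, file paths, and exact install commands here are assumptions, not the actual vLLM Dockerfile.

```dockerfile
# Illustrative sketch of the cache-friendly stage ordering; paths and
# base images are hypothetical.

FROM nvidia/cuda:12.8.1-devel-ubuntu22.04 AS extensions-build
# Built in parallel with the main wheel build: compile extension kernels
# (e.g. DeepGEMM, EP kernels) into wheels that vllm-base can copy in.
RUN git clone https://github.com/deepseek-ai/DeepGEMM.git /opt/deepgemm \
    && pip wheel /opt/deepgemm -w /wheels

FROM nvidia/cuda:12.8.1-devel-ubuntu22.04 AS vllm-base
# 1. Slow-changing, heavy dependencies first: these layers stay cached
#    as long as the pinned requirement files do not change.
COPY requirements/common.txt /tmp/requirements.txt
RUN pip install torch flashinfer-python -r /tmp/requirements.txt

# 2. Stable extras next, still ahead of anything that changes often.
RUN pip install accelerate bitsandbytes

# 3. The frequently rebuilt vLLM wheel goes last, so a Python-only change
#    invalidates only these final layers, not the dependency layers above.
COPY --from=extensions-build /wheels /wheels
COPY dist/vllm-*.whl /tmp/
RUN pip install /wheels/*.whl /tmp/vllm-*.whl
```

The ordering matters because Docker invalidates a layer and everything after it when its inputs change; putting the wheel install last is what lets Python-only rebuilds reuse the expensive cached layers.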
mgoin left a comment
LGTM, thanks for the nice work!
[docker] Restructure Dockerfile for more efficient and cache-friendly builds (vllm-project#30626) Signed-off-by: Amr Mahdi <[email protected]>
[docker] Restructure Dockerfile for more efficient and cache-friendly builds (vllm-project#30626) Signed-off-by: Amr Mahdi <[email protected]> Signed-off-by: Nathan Price <[email protected]>