@amrmahdi amrmahdi commented Dec 13, 2025

Approach:

  • Pre-install PyTorch, FlashInfer, and other slow-changing dependencies in vllm-base before installing the vLLM wheel for better layer caching
  • Add parallel extensions-build stage for DeepGEMM and EP kernels
  • Move stable packages (accelerate, bitsandbytes, etc.) earlier in build

This allows incremental builds with Python-only changes to skip the expensive dependency installation layers.
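
The layer ordering described above can be sketched roughly as follows. This is a minimal illustration, not the actual vLLM Dockerfile: the stage names vllm-base and extensions-build come from this description, while the base image, the `build` stage, the requirements file, the `tools/build_extensions.sh` script, and all paths are hypothetical placeholders.

```dockerfile
# syntax=docker/dockerfile:1
# Illustrative sketch only. Base image, file names, and paths are assumptions.

# Hypothetical stage that builds the vLLM wheel; it re-runs on any source change.
FROM python:3.12-slim AS build
WORKDIR /workspace
COPY . .
RUN pip wheel --no-deps -w /dist .

# Independent of the wheel stage, so BuildKit can run it in parallel.
FROM python:3.12-slim AS extensions-build
# Hypothetical script standing in for the DeepGEMM and EP kernel builds.
COPY tools/build_extensions.sh /tmp/build_extensions.sh
RUN bash /tmp/build_extensions.sh /ext-wheels

FROM python:3.12-slim AS vllm-base
# 1. Slow-changing dependencies first; cached until the requirements file changes.
COPY requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
# 2. Stable extras, also installed before anything that depends on the source tree.
RUN pip install accelerate bitsandbytes
# 3. Pre-built extensions from the parallel stage.
COPY --from=extensions-build /ext-wheels /tmp/ext-wheels
RUN pip install /tmp/ext-wheels/*.whl
# 4. The vLLM wheel last; a Python-only change invalidates only these layers.
COPY --from=build /dist /tmp/dist
RUN pip install --no-deps /tmp/dist/*.whl
```

In this sketch a Python-only change alters the wheel produced by the `build` stage, so only step 4 is rebuilt, while steps 1 through 3 are served from the layer cache as long as their inputs are unchanged.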

Performance:

Incremental builds with Python-only changes now complete in ~16 minutes (previously 35+ minutes).

Future work:

We considered building these base stages as separate images that could be built independently and baked into CI AMIs for maximum cache reuse. However, this would introduce the maintenance burden of extra pipelines and update strategies. The inline approach is simpler and can be optimized later by baking the main build image into AMIs daily to maximize layer cache reuse.

@mergify mergify bot added the ci/build label Dec 13, 2025
@amrmahdi amrmahdi force-pushed the amrh/base-images-inline branch from d98658a to d516ce4 Compare December 13, 2025 21:04
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request significantly refactors the Dockerfile to improve build efficiency and leverage caching more effectively. The introduction of a parallel extensions-build stage and the restructuring of the vllm-base stage to pre-install slow-changing dependencies are excellent changes that should substantially reduce incremental build times. The overall approach is well-thought-out and correctly implemented. I have one suggestion to further optimize Docker image layering, but the pull request is a great improvement overall.

@amrmahdi amrmahdi force-pushed the amrh/base-images-inline branch 3 times, most recently from 856d887 to 0267ae2 Compare December 13, 2025 21:08
mergify bot commented Dec 13, 2025

Hi @amrmahdi, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@amrmahdi amrmahdi mentioned this pull request Dec 13, 2025
5 tasks
@amrmahdi amrmahdi force-pushed the amrh/base-images-inline branch from 0267ae2 to a36d065 Compare December 13, 2025 21:59
mergify bot commented Dec 13, 2025

Documentation preview: https://vllm--30626.org.readthedocs.build/en/30626/

@mergify mergify bot added the documentation Improvements or additions to documentation label Dec 13, 2025
@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 13, 2025
@mgoin mgoin left a comment

LGTM overall, great work. Just a few quick nits to clean up.

… builds

Signed-off-by: Amr Mahdi <[email protected]>
@amrmahdi amrmahdi force-pushed the amrh/base-images-inline branch from 28530ae to 8645883 Compare December 15, 2025 20:32
@mgoin mgoin left a comment

LGTM, thanks for the nice work!

@vllm-bot vllm-bot merged commit ff21a0f into vllm-project:main Dec 16, 2025
87 of 92 checks passed
@amrmahdi amrmahdi deleted the amrh/base-images-inline branch December 16, 2025 02:54
weiyu0824 pushed a commit to weiyu0824/vllm that referenced this pull request Dec 16, 2025
TheCodeWrangler pushed a commit to TheCodeWrangler/vllm that referenced this pull request Dec 17, 2025