Commit acab8b5

Merge pull request #324 from docker/feat/vllm-prebuilt-wheels

Use vllm official wheels

2 parents cb40663 + f9cb21c

File tree: 3 files changed (+76, −4 lines)

.github/workflows/release.yml (3 additions, 1 deletion)

```diff
@@ -136,12 +136,14 @@ jobs:
         with:
           file: Dockerfile
           target: final-vllm
-          platforms: linux/amd64
+          platforms: linux/amd64, linux/arm64
           build-args: |
             "LLAMA_SERVER_VERSION=${{ inputs.llamaServerVersion }}"
             "LLAMA_SERVER_VARIANT=cuda"
             "BASE_IMAGE=nvidia/cuda:12.9.0-runtime-ubuntu24.04"
             "VLLM_VERSION=${{ inputs.vllmVersion }}"
+            "VLLM_CUDA_VERSION=cu129"
+            "VLLM_PYTHON_TAG=cp38-abi3"
           push: true
           sbom: true
           provenance: mode=max
```

Dockerfile (12 additions, 3 deletions)

```diff
@@ -79,7 +79,10 @@ ENTRYPOINT ["/app/model-runner"]
 # --- vLLM variant ---
 FROM llamacpp AS vllm
 
-ARG VLLM_VERSION
+ARG VLLM_VERSION=0.11.0
+ARG VLLM_CUDA_VERSION=cu129
+ARG VLLM_PYTHON_TAG=cp38-abi3
+ARG TARGETARCH
 
 USER root
 
@@ -89,10 +92,16 @@ RUN mkdir -p /opt/vllm-env && chown -R modelrunner:modelrunner /opt/vllm-env
 
 USER modelrunner
 
-# Install uv and vLLM as modelrunner user
+# Install uv and vLLM wheel as modelrunner user
 RUN curl -LsSf https://astral.sh/uv/install.sh | sh \
     && ~/.local/bin/uv venv --python /usr/bin/python3 /opt/vllm-env \
-    && ~/.local/bin/uv pip install --python /opt/vllm-env/bin/python "vllm==${VLLM_VERSION}"
+    && if [ "$TARGETARCH" = "amd64" ]; then \
+         WHEEL_ARCH="manylinux1_x86_64"; \
+       else \
+         WHEEL_ARCH="manylinux2014_aarch64"; \
+       fi \
+    && WHEEL_URL="https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}%2B${VLLM_CUDA_VERSION}-${VLLM_PYTHON_TAG}-${WHEEL_ARCH}.whl" \
+    && ~/.local/bin/uv pip install --python /opt/vllm-env/bin/python "$WHEEL_URL"
 
 RUN /opt/vllm-env/bin/python -c "import vllm; print(vllm.__version__)" > /opt/vllm-env/version
```
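The wheel URL assembled in the Dockerfile follows the standard wheel filename pattern (`{name}-{version}+{local}-{python tag}-{platform tag}.whl`), with the `+` before the CUDA suffix percent-encoded as `%2B` for the download URL. A minimal Python sketch of the same selection logic (the function name and arch mapping here are illustrative, mirroring the Dockerfile's `TARGETARCH` branch):

```python
# Sketch of the wheel-URL construction performed in the Dockerfile's RUN step.
# Mirrors the if/else on TARGETARCH: amd64 gets manylinux1_x86_64,
# everything else falls through to manylinux2014_aarch64.

WHEEL_ARCH = {
    "amd64": "manylinux1_x86_64",      # linux/amd64
    "arm64": "manylinux2014_aarch64",  # linux/arm64
}

def vllm_wheel_url(version="0.11.0", cuda="cu129",
                   py_tag="cp38-abi3", targetarch="amd64"):
    # Default to the aarch64 wheel for any non-amd64 arch, as the Dockerfile does.
    arch = WHEEL_ARCH.get(targetarch, "manylinux2014_aarch64")
    # The "+" in the local version (e.g. 0.11.0+cu129) is encoded as %2B in the URL.
    return (
        "https://github.com/vllm-project/vllm/releases/download/"
        f"v{version}/vllm-{version}%2B{cuda}-{py_tag}-{arch}.whl"
    )
```

For example, `vllm_wheel_url(targetarch="arm64")` yields the `manylinux2014_aarch64` wheel URL for vLLM 0.11.0.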

README.md (61 additions, 0 deletions)

````diff
@@ -228,6 +228,67 @@ Available variants:
 
 The binary path in the image follows this pattern: `/com.docker.llama-server.native.linux.${LLAMA_SERVER_VARIANT}.${TARGETARCH}`
 
+### vLLM integration
+
+The Docker image also supports vLLM as an alternative inference backend.
+
+#### Building the vLLM variant
+
+To build a Docker image with vLLM support:
+
+```sh
+# Build with default settings (vLLM 0.11.0)
+make docker-build DOCKER_TARGET=final-vllm BASE_IMAGE=nvidia/cuda:12.9.0-runtime-ubuntu24.04 LLAMA_SERVER_VARIANT=cuda
+
+# Build for a specific architecture
+docker buildx build \
+  --platform linux/amd64 \
+  --target final-vllm \
+  --build-arg BASE_IMAGE=nvidia/cuda:12.9.0-runtime-ubuntu24.04 \
+  --build-arg LLAMA_SERVER_VARIANT=cuda \
+  --build-arg VLLM_VERSION=0.11.0 \
+  -t docker/model-runner:vllm .
+```
+
+#### Build Arguments
+
+The vLLM variant supports the following build arguments:
+
+- **VLLM_VERSION**: The vLLM version to install (default: `0.11.0`)
+- **VLLM_CUDA_VERSION**: The CUDA version suffix for the wheel (default: `cu129`)
+- **VLLM_PYTHON_TAG**: The Python compatibility tag (default: `cp38-abi3`, compatible with Python 3.8+)
+
+#### Multi-Architecture Support
+
+The vLLM variant supports both x86_64 (amd64) and aarch64 (arm64) architectures. The build process automatically selects the appropriate prebuilt wheel:
+
+- **linux/amd64**: uses `manylinux1_x86_64` wheels
+- **linux/arm64**: uses `manylinux2014_aarch64` wheels
+
+To build for multiple architectures:
+
+```sh
+docker buildx build \
+  --platform linux/amd64,linux/arm64 \
+  --target final-vllm \
+  --build-arg BASE_IMAGE=nvidia/cuda:12.9.0-runtime-ubuntu24.04 \
+  --build-arg LLAMA_SERVER_VARIANT=cuda \
+  -t docker/model-runner:vllm .
+```
+
+#### Updating to a New vLLM Version
+
+To update to a new vLLM version:
+
+```sh
+docker buildx build \
+  --target final-vllm \
+  --build-arg VLLM_VERSION=0.11.1 \
+  -t docker/model-runner:vllm-0.11.1 .
+```
+
+The vLLM wheels are sourced from the official vLLM GitHub releases at `https://github.com/vllm-project/vllm/releases`, which provide prebuilt wheels for each release version.
+
 ## API Examples
 
 The Model Runner exposes a REST API that can be accessed via TCP port. You can interact with it using curl commands.
````
