Commit 0c8aacc (parent: d03aac3)

use vllm official wheels

File tree

3 files changed: +97 −4 lines

.github/workflows/release.yml

Lines changed: 9 additions & 1 deletion

```diff
@@ -24,6 +24,11 @@ on:
       required: false
       type: string
       default: "0.11.0"
+    vllmCommitSha:
+      description: 'vLLM commit SHA (from git rev-list -n 1 v{version})'
+      required: false
+      type: string
+      default: "b8b302cde434df8c9289a2b465406b47ebab1c2d"
 
 jobs:
   test:
@@ -124,12 +129,15 @@ jobs:
       with:
         file: Dockerfile
         target: final-vllm
-        platforms: linux/amd64
+        platforms: linux/amd64, linux/arm64
         build-args: |
           "LLAMA_SERVER_VERSION=${{ inputs.llamaServerVersion }}"
           "LLAMA_SERVER_VARIANT=cuda"
           "BASE_IMAGE=nvidia/cuda:12.9.0-runtime-ubuntu24.04"
           "VLLM_VERSION=${{ inputs.vllmVersion }}"
+          "VLLM_COMMIT_SHA=${{ inputs.vllmCommitSha }}"
+          "VLLM_CUDA_VERSION=cu129"
+          "VLLM_PYTHON_TAG=cp38-abi3"
         push: true
         sbom: true
         provenance: mode=max
```
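The `vllmCommitSha` workflow input is a free-form string with no validation in the workflow itself. A minimal pre-flight check a dispatcher could run before triggering the workflow might look like this (the helper name is hypothetical, not part of this repository):

```python
import re

# A full git commit SHA is exactly 40 lowercase hex characters; the
# wheel URL built from this input will 404 for anything else.
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")

def is_full_commit_sha(value: str) -> bool:
    """Return True if value looks like a full 40-character commit SHA."""
    return bool(FULL_SHA.fullmatch(value))
```

Note that an abbreviated SHA (as printed by `git log --oneline`) would be rejected; `git rev-list -n 1 v{version}` already prints the full form.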

Dockerfile

Lines changed: 13 additions & 3 deletions

```diff
@@ -79,7 +79,11 @@ ENTRYPOINT ["/app/model-runner"]
 # --- vLLM variant ---
 FROM llamacpp AS vllm
 
-ARG VLLM_VERSION
+ARG VLLM_VERSION=0.11.0
+ARG VLLM_COMMIT_SHA=b8b302cde434df8c9289a2b465406b47ebab1c2d
+ARG VLLM_CUDA_VERSION=cu129
+ARG VLLM_PYTHON_TAG=cp38-abi3
+ARG TARGETARCH
 
 USER root
 
@@ -89,10 +93,16 @@ RUN mkdir -p /opt/vllm-env && chown -R modelrunner:modelrunner /opt/vllm-env
 
 USER modelrunner
 
-# Install uv and vLLM as modelrunner user
+# Install uv and vLLM wheel as modelrunner user
 RUN curl -LsSf https://astral.sh/uv/install.sh | sh \
     && ~/.local/bin/uv venv --python /usr/bin/python3 /opt/vllm-env \
-    && ~/.local/bin/uv pip install --python /opt/vllm-env/bin/python "vllm==${VLLM_VERSION}"
+    && if [ "$TARGETARCH" = "amd64" ]; then \
+        WHEEL_ARCH="manylinux1_x86_64"; \
+    else \
+        WHEEL_ARCH="manylinux2014_aarch64"; \
+    fi \
+    && WHEEL_URL="https://wheels.vllm.ai/${VLLM_COMMIT_SHA}/vllm-${VLLM_VERSION}%2B${VLLM_CUDA_VERSION}-${VLLM_PYTHON_TAG}-${WHEEL_ARCH}.whl" \
+    && ~/.local/bin/uv pip install --python /opt/vllm-env/bin/python "$WHEEL_URL"
 
 RUN /opt/vllm-env/bin/python -c "import vllm; print(vllm.__version__)" > /opt/vllm-env/version
```

README.md

Lines changed: 75 additions & 0 deletions

````diff
@@ -228,6 +228,81 @@ Available variants:
 
 The binary path in the image follows this pattern: `/com.docker.llama-server.native.linux.${LLAMA_SERVER_VARIANT}.${TARGETARCH}`
 
+### vLLM integration
+
+The Docker image also supports vLLM as an alternative inference backend.
+
+#### Building the vLLM variant
+
+To build a Docker image with vLLM support:
+
+```sh
+# Build with default settings (vLLM 0.11.0)
+make docker-build DOCKER_TARGET=final-vllm BASE_IMAGE=nvidia/cuda:12.9.0-runtime-ubuntu24.04 LLAMA_SERVER_VARIANT=cuda
+
+# Build for specific architecture
+docker buildx build \
+  --platform linux/amd64 \
+  --target final-vllm \
+  --build-arg BASE_IMAGE=nvidia/cuda:12.9.0-runtime-ubuntu24.04 \
+  --build-arg LLAMA_SERVER_VARIANT=cuda \
+  --build-arg VLLM_VERSION=0.11.0 \
+  --build-arg VLLM_COMMIT_SHA=b8b302cde434df8c9289a2b465406b47ebab1c2d \
+  -t docker/model-runner:vllm .
+```
+
+#### Build Arguments
+
+The vLLM variant supports the following build arguments:
+
+- **VLLM_VERSION**: The vLLM version to install (default: `0.11.0`)
+- **VLLM_COMMIT_SHA**: The git commit SHA corresponding to the vLLM version (default: `b8b302cde434df8c9289a2b465406b47ebab1c2d` for v0.11.0)
+- **VLLM_CUDA_VERSION**: The CUDA version suffix for the wheel (default: `cu129`)
+- **VLLM_PYTHON_TAG**: The Python compatibility tag (default: `cp38-abi3`, compatible with Python 3.8+)
+
+#### Multi-Architecture Support
+
+The vLLM variant supports both x86_64 (amd64) and aarch64 (arm64) architectures. The build process automatically selects the appropriate prebuilt wheel:
+
+- **linux/amd64**: Uses `manylinux1_x86_64` wheels
+- **linux/arm64**: Uses `manylinux2014_aarch64` wheels
+
+To build for multiple architectures:
+
+```sh
+docker buildx build \
+  --platform linux/amd64,linux/arm64 \
+  --target final-vllm \
+  --build-arg BASE_IMAGE=nvidia/cuda:12.9.0-runtime-ubuntu24.04 \
+  --build-arg LLAMA_SERVER_VARIANT=cuda \
+  -t docker/model-runner:vllm .
+```
+
+#### Updating to a New vLLM Version
+
+To update to a new vLLM version, you need to:
+
+1. **Find the commit SHA for the version:**
+   ```sh
+   # Clone the vLLM repository (if not already cloned)
+   git clone https://github.com/vllm-project/vllm.git
+   cd vllm
+
+   # Get the commit SHA for a specific version
+   git rev-list -n 1 v0.11.1
+   ```
+
+2. **Build with the new version:**
+   ```sh
+   docker buildx build \
+     --target final-vllm \
+     --build-arg VLLM_VERSION=0.11.1 \
+     --build-arg VLLM_COMMIT_SHA=<commit-sha-from-step-1> \
+     -t docker/model-runner:vllm-0.11.1 .
+   ```
+
+The vLLM wheels are sourced from the official vLLM wheel repository at `https://wheels.vllm.ai/{commit_sha}/vllm/`, which provides prebuilt wheels for every commit.
+
 ## API Examples
 
 The Model Runner exposes a REST API that can be accessed via TCP port. You can interact with it using curl commands.
````
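One detail worth noting about the build arguments above: `VLLM_PYTHON_TAG=cp38-abi3` actually bundles two PEP 427 wheel-filename components, the Python tag (`cp38`) and the ABI tag (`abi3`). A small sketch that splits a wheel filename into its components (helper name is illustrative, not part of this repository):

```python
def parse_wheel_filename(filename: str) -> dict:
    """Split a simple wheel filename into its PEP 427 components.

    Assumes the basic {name}-{version}-{python}-{abi}-{platform}.whl
    shape with no optional build tag, which holds for the vLLM wheels
    used here.
    """
    stem = filename.removesuffix(".whl")
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,          # may carry a +cu129 local segment
        "python_tag": python_tag,    # cp38
        "abi_tag": abi_tag,          # abi3: stable ABI, hence Python 3.8+
        "platform_tag": platform_tag,
    }
```

The `abi3` stable-ABI tag is what lets a single wheel cover every CPython from 3.8 upward, as the Build Arguments list states.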
