Commit 8bd5844

authored by cberge908
correct LWS deployment yaml (#23104)
Signed-off-by: cberge908 <[email protected]>
1 parent ce30dca commit 8bd5844

File tree

2 files changed: +3 additions, −5 deletions

docs/deployment/frameworks/lws.md

Lines changed: 2 additions & 4 deletions

````diff
@@ -22,7 +22,7 @@ Deploy the following yaml file `lws.yaml`
 metadata:
   name: vllm
 spec:
-  replicas: 2
+  replicas: 1
   leaderWorkerTemplate:
     size: 2
     restartPolicy: RecreateGroupOnPodRestart
@@ -41,7 +41,7 @@ Deploy the following yaml file `lws.yaml`
             - sh
             - -c
             - "bash /vllm-workspace/examples/online_serving/multi-node-serving.sh leader --ray_cluster_size=$(LWS_GROUP_SIZE);
-              python3 -m vllm.entrypoints.openai.api_server --port 8080 --model meta-llama/Meta-Llama-3.1-405B-Instruct --tensor-parallel-size 8 --pipeline_parallel_size 2"
+              vllm serve meta-llama/Meta-Llama-3.1-405B-Instruct --port 8080 --tensor-parallel-size 8 --pipeline_parallel_size 2"
         resources:
           limits:
             nvidia.com/gpu: "8"
@@ -126,8 +126,6 @@ Should get an output similar to this:
 NAME       READY   STATUS    RESTARTS   AGE
 vllm-0     1/1     Running   0          2s
 vllm-0-1   1/1     Running   0          2s
-vllm-1     1/1     Running   0          2s
-vllm-1-1   1/1     Running   0          2s
 ```
 
 Verify that the distributed tensor-parallel inference works:
````
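To make the net effect of this change easier to see, here is a sketch of the relevant parts of `lws.yaml` after the fix. The surrounding structure (`apiVersion`, `kind`, `leaderTemplate`, container name, exact indentation) is assumed from standard LeaderWorkerSet conventions and is not copied verbatim from the repo file; only the changed fields come from this commit:

```yaml
# Sketch only: outer structure assumed, changed fields taken from this commit.
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: vllm
spec:
  replicas: 1                # was 2; a single replica group suffices for this example
  leaderWorkerTemplate:
    size: 2                  # one leader pod plus one worker pod per group
    restartPolicy: RecreateGroupOnPodRestart
    leaderTemplate:          # assumed field name; see the LWS API reference
      spec:
        containers:
          - name: vllm-leader   # hypothetical container name
            command:
              - sh
              - -c
              - "bash /vllm-workspace/examples/online_serving/multi-node-serving.sh leader --ray_cluster_size=$(LWS_GROUP_SIZE);
                 vllm serve meta-llama/Meta-Llama-3.1-405B-Instruct --port 8080 --tensor-parallel-size 8 --pipeline_parallel_size 2"
            resources:
              limits:
                nvidia.com/gpu: "8"
```

With `replicas: 1` and `size: 2`, a deployment produces exactly the two pods shown in the `kubectl` output above (`vllm-0` and `vllm-0-1`), which is why the stale `vllm-1*` rows were dropped from the docs.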

examples/online_serving/multi-node-serving.sh

Lines changed: 1 addition & 1 deletion

```diff
@@ -11,7 +11,7 @@
 # Example usage:
 # On the head node machine, start the Ray head node process and run a vLLM server.
 #   ./multi-node-serving.sh leader --ray_port=6379 --ray_cluster_size=<SIZE> [<extra ray args>] && \
-#   python3 -m vllm.entrypoints.openai.api_server --port 8080 --model meta-llama/Meta-Llama-3.1-405B-Instruct --tensor-parallel-size 8 --pipeline_parallel_size 2
+#   vllm serve meta-llama/Meta-Llama-3.1-405B-Instruct --port 8080 --tensor-parallel-size 8 --pipeline_parallel_size 2
 #
 # On each worker node, start the Ray worker node process.
 #   ./multi-node-serving.sh worker --ray_address=<HEAD_NODE_IP> --ray_port=6379 [<extra ray args>]
```
