2 files changed: +3 −5

docs/deployment/frameworks

@@ -22,7 +22,7 @@ Deploy the following yaml file `lws.yaml`
   metadata:
     name: vllm
   spec:
-    replicas: 2
+    replicas: 1
     leaderWorkerTemplate:
       size: 2
       restartPolicy: RecreateGroupOnPodRestart
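A quick sanity check on these numbers (an illustrative sketch, not part of the manifest): with tensor parallelism 8 and pipeline parallelism 2 the server needs 16 GPUs, and at 8 GPUs per pod that is exactly the leader-worker group `size: 2`; `replicas: 1` then deploys a single such group instead of two.

```shell
# Illustrative arithmetic behind the manifest values above.
tensor_parallel_size=8    # --tensor-parallel-size
pipeline_parallel_size=2  # --pipeline_parallel_size
gpus_per_pod=8            # nvidia.com/gpu limit per pod

total_gpus=$((tensor_parallel_size * pipeline_parallel_size))
group_size=$((total_gpus / gpus_per_pod))  # pods per leader-worker group

echo "$total_gpus"  # 16
echo "$group_size"  # 2, matching `size: 2` in leaderWorkerTemplate
```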
@@ -41,7 +41,7 @@ Deploy the following yaml file `lws.yaml`
           - sh
           - -c
           - "bash /vllm-workspace/examples/online_serving/multi-node-serving.sh leader --ray_cluster_size=$(LWS_GROUP_SIZE);
-            python3 -m vllm.entrypoints.openai.api_server --port 8080 --model meta-llama/Meta-Llama-3.1-405B-Instruct --tensor-parallel-size 8 --pipeline_parallel_size 2"
+            vllm serve meta-llama/Meta-Llama-3.1-405B-Instruct --port 8080 --tensor-parallel-size 8 --pipeline_parallel_size 2"
           resources:
             limits:
               nvidia.com/gpu: "8"
@@ -126,8 +126,6 @@ Should get an output similar to this:
 NAME       READY   STATUS    RESTARTS   AGE
 vllm-0     1/1     Running   0          2s
 vllm-0-1   1/1     Running   0          2s
-vllm-1     1/1     Running   0          2s
-vllm-1-1   1/1     Running   0          2s
 ```

 Verify that the distributed tensor-parallel inference works:
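A minimal sketch of that verification step, assuming the leader pod's port 8080 has been forwarded to localhost (the service name in the port-forward command is an assumption; the prompt and `max_tokens` values are illustrative):

```shell
# Assumption: forward the leader service locally first, e.g.
#   kubectl port-forward svc/vllm-leader 8080:8080
# Then query the OpenAI-compatible endpoint started by `vllm serve`:
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Meta-Llama-3.1-405B-Instruct",
        "prompt": "San Francisco is a",
        "max_tokens": 7,
        "temperature": 0
      }'
```

A JSON response containing a `choices` array indicates the distributed server is up and generating tokens.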
examples/online_serving/multi-node-serving.sh

@@ -11,7 +11,7 @@
 # Example usage:
 # On the head node machine, start the Ray head node process and run a vLLM server.
 # ./multi-node-serving.sh leader --ray_port=6379 --ray_cluster_size=<SIZE> [<extra ray args>] && \
-#   python3 -m vllm.entrypoints.openai.api_server --port 8080 --model meta-llama/Meta-Llama-3.1-405B-Instruct --tensor-parallel-size 8 --pipeline_parallel_size 2
+#   vllm serve meta-llama/Meta-Llama-3.1-405B-Instruct --port 8080 --tensor-parallel-size 8 --pipeline_parallel_size 2
 #
 # On each worker node, start the Ray worker node process.
 # ./multi-node-serving.sh worker --ray_address=<HEAD_NODE_IP> --ray_port=6379 [<extra ray args>]
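A concrete two-node version of the usage comments above (a sketch with the placeholders filled in illustratively: `<SIZE>` becomes 2 and `<HEAD_NODE_IP>` becomes 10.0.0.1):

```shell
# On the head node: start the Ray head process, then the vLLM server.
./multi-node-serving.sh leader --ray_port=6379 --ray_cluster_size=2 && \
  vllm serve meta-llama/Meta-Llama-3.1-405B-Instruct --port 8080 \
    --tensor-parallel-size 8 --pipeline_parallel_size 2

# On each worker node: join the Ray cluster (10.0.0.1 is a placeholder IP).
./multi-node-serving.sh worker --ray_address=10.0.0.1 --ray_port=6379
```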