
Commit 4638113

add elastic proxy
Signed-off-by: yuxinshan <[email protected]> Signed-off-by: CalvinXKY <[email protected]>
1 parent 84d7f5a commit 4638113

2 files changed

+516
-0
lines changed

examples/elastic_scaling/README.md

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
This example provides an elastic proxy demo that supports elastic scaling of prefill/decode (P/D) instances based on a KV pool.

Launch multiple vLLM instances (for example, 2 for prefill and 2 for decode), then start the proxy demo with:

```shell
export ADMIN_API_KEY=YOUR_ADMIN_API_KEY
python3 examples/elastic_scaling/elastic_proxy.py \
    --model $model_name \
    --prefill localhost:8100 localhost:8101 \
    --decode localhost:8200 localhost:8201 \
    --port 8000
```

### Supported API routes
* `/v1/completions`: forwards a completions request and returns the response.
* `/v1/chat/completions`: forwards a chat completions request and returns the response.
* `/status`: returns the current lists of prefill nodes and decode nodes.
* `/instances/add`: adds a prefill node or a decode node to the list.

Examples:
```shell
# /v1/completions
curl -X POST http://0.0.0.0:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "'$model_name'", "max_tokens": 50, "prompt": "hello"}'

# /v1/chat/completions
curl -X POST http://0.0.0.0:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "'$model_name'", "max_tokens": 50,
         "messages": [{
             "role": "user",
             "content": "hello"
         }]}'

# /status
curl -X POST http://0.0.0.0:8000/status

# /instances/add
curl -X POST http://0.0.0.0:8000/instances/add \
    -H "Content-Type: application/json" \
    -H "X-Api-Key: YOUR_ADMIN_API_KEY" \
    -d '{"type": "prefill", "instance": "0.0.0.0:8100"}'
```
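
For scripted scale-out, the `/instances/add` call shown above could also be issued from Python. The following is a minimal sketch using only the standard library. It assumes the proxy listens on `localhost:8000` and accepts the same JSON body and `X-Api-Key` header as the curl example; the helper name `build_add_instance_request` is hypothetical and is not part of `elastic_proxy.py`.

```python
import json
import urllib.request


def build_add_instance_request(proxy_url, node_type, instance, api_key):
    """Build (but do not send) a POST request for the proxy's /instances/add route.

    Hypothetical helper: the route, body fields, and header name mirror the
    curl example in this README; nothing else about the proxy is assumed.
    """
    if node_type not in ("prefill", "decode"):
        raise ValueError("node_type must be 'prefill' or 'decode'")
    body = json.dumps({"type": node_type, "instance": instance}).encode("utf-8")
    return urllib.request.Request(
        url=proxy_url.rstrip("/") + "/instances/add",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
    )


def send(request, timeout=10.0):
    """Send a prepared request and return (status code, response text)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

With a running proxy, `send(build_add_instance_request("http://localhost:8000", "decode", "localhost:8202", api_key))` would register a new decode node, where `api_key` is the value exported as `ADMIN_API_KEY` and `localhost:8202` is a placeholder address.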

### Supported functions

* Adding prefill nodes or decode nodes at any time.
    - If a prefill or decode server is already deployed, the proxy can register it at startup through `--prefill` or `--decode`.
    - If a prefill or decode server is deployed after the proxy, it can join through the `/instances/add` API. The server sends a request to the proxy, and the proxy then checks the node's status until the node is available.
* Removing a node when its prefill or decode server has failed more than a certain number of times.
* Elastic scaling: when the current node is unavailable, the proxy schedules the request to the next available node.
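
The three behaviors listed above (admit a node once it is available, evict it after repeated failures, fall through to the next available node) can be illustrated with a small in-memory pool. This is a sketch under stated assumptions, not the logic of `elastic_proxy.py`: the class `NodePool`, its method names, the cumulative failure counter, and the round-robin order are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class NodePool:
    """Hypothetical pool of prefill (or decode) nodes; not the proxy's real class."""
    max_failures: int = 3  # assumed threshold; the README only says "a certain number"
    nodes: List[str] = field(default_factory=list)
    failures: Dict[str, int] = field(default_factory=dict)
    _cursor: int = 0

    def add(self, node: str, is_healthy: Callable[[str], bool], max_checks: int = 5) -> bool:
        """Admit a node once a health probe succeeds, probing up to max_checks times.

        A real proxy would wait between probes; the sketch probes back to back.
        """
        if node in self.nodes:
            return True
        for _ in range(max_checks):
            if is_healthy(node):
                self.nodes.append(node)
                self.failures[node] = 0
                return True
        return False

    def report_failure(self, node: str) -> None:
        """Count a failure and evict the node once the count exceeds max_failures."""
        if node not in self.failures:
            return
        self.failures[node] += 1  # assumption: failures are counted cumulatively
        if self.failures[node] > self.max_failures:
            self.nodes.remove(node)
            del self.failures[node]

    def schedule(self, send: Callable[[str], bool]) -> str:
        """Try nodes in round-robin order and return the first one that succeeds."""
        candidates = list(self.nodes)  # snapshot, since eviction mutates self.nodes
        start = self._cursor % len(candidates) if candidates else 0
        for offset in range(len(candidates)):
            node = candidates[(start + offset) % len(candidates)]
            if send(node):
                self._cursor = start + offset + 1
                return node
            self.report_failure(node)
        raise RuntimeError("no available node")
```

Here `is_healthy` and `send` stand in for whatever health check and request forwarding the proxy actually performs; each returns `True` on success. Keeping them as injected callables makes the scheduling policy testable without any running vLLM instance.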
