[feat] add elastic proxy #4545
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces an elastic proxy for vLLM to support scaling of prefill and decode instances. The implementation uses FastAPI and provides several API endpoints for completions, status checks, and dynamic instance management. While this is a great feature addition, the current implementation has several critical issues that need to be addressed. These include a blocking call in the server's __init__, incorrect logic for handling instance removal and health checks, and fundamental flaws in handling streaming responses which will cause requests to fail. Additionally, there's a bug in route registration for adding new instances. These issues will prevent the proxy from functioning correctly.
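The blocking call in `__init__` called out above can be sidestepped by keeping construction cheap and performing the wait in an async task instead. Below is a minimal sketch using only stdlib `asyncio`; the `Proxy` and `wait_until_ready` names are illustrative and not the PR's actual code:

```python
import asyncio


class Proxy:
    """Illustrative proxy skeleton: __init__ stays non-blocking."""

    def __init__(self) -> None:
        self.ready = False
        # Do NOT poll backend instances here; just record initial state.

    async def wait_until_ready(self, probe, interval: float = 0.01) -> None:
        # Poll the async probe until it reports the backend is up.
        while not await probe():
            await asyncio.sleep(interval)
        self.ready = True


async def main() -> bool:
    attempts = {"n": 0}

    async def probe() -> bool:
        # Hypothetical health probe: reports healthy on the 3rd check.
        attempts["n"] += 1
        return attempts["n"] >= 3

    proxy = Proxy()
    await proxy.wait_until_ready(probe)
    return proxy.ready


print(asyncio.run(main()))  # True
```

In a FastAPI app, this kind of wait would typically be scheduled from a startup hook rather than the constructor, so the event loop is never blocked.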
Describe the relevant RFC that was added, and include the corresponding test output in the description.
[RFC]: Elastic Scaling Support for P/D Instances Based on KV Pool: #3380
What this PR does / why we need it?
This file provides an elastic proxy demo to support elastic scaling for P/D instances based on KV pool.
We can launch multiple vLLM instances (2 for prefill and 2 for decode), and then
launch this proxy demo through:
Supported API routes:

- `/v1/completions`: get completions request response.
- `/v1/chat/completions`: get chat request response.
- `/status`: get the supported prefill nodes and decode nodes list.
- `/instances/add`: add prefill nodes or decode nodes to the list.
- `/instances/remove`: remove prefill nodes or decode nodes from the list.

Supported functions:

- `/instances/add` API to join the proxy server. The prefill server or decode server sends a signal to the proxy server, and the proxy server will check the status of the node until the node is available.
- `/instances/remove` API to delete the node from the proxy server.

Does this PR introduce any user-facing change?
None
How was this patch tested?
Deploy the proxy server and check the request responses:

`/status`

`/instances/add`

Case 1: If the node is not available, the server waits for the node to become available:

Case 2: If the node is available, try to add the node to the server:
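The two add cases above can be modeled as a bounded poll against a health check. A hedged sketch; `wait_for_node`, the timeout values, and the pluggable `check` callable are illustrative and not from this PR:

```python
import time


def wait_for_node(check, timeout: float = 1.0, interval: float = 0.05) -> bool:
    """Poll check() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True  # node available: safe to add it to the list
        time.sleep(interval)
    return False  # node never became available within the timeout
```

For example, a node whose health check starts failing and later succeeds is eventually added, while a node that never responds makes `wait_for_node` give up once the timeout expires (the PR's demo, by contrast, appears to wait indefinitely).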
`/instances/remove`

Case 1: If the node is in the corresponding nodes list:

Case 2: If the node is not in the corresponding nodes list:
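The two removal cases above boil down to whether the address is present in the role's node list. A plain-Python sketch of such an in-memory registry (the names `REGISTRY`, `add_instance`, and `remove_instance` are assumptions, not the PR's actual code):

```python
from typing import Dict, List

# Hypothetical in-memory registry mirroring the proxy's prefill/decode lists.
REGISTRY: Dict[str, List[str]] = {"prefill": [], "decode": []}


def add_instance(role: str, addr: str) -> bool:
    """Add a node to the prefill or decode list (no duplicates)."""
    nodes = REGISTRY[role]
    if addr in nodes:
        return False
    nodes.append(addr)
    return True


def remove_instance(role: str, addr: str) -> bool:
    """Remove a node; returns False when it was not registered (Case 2)."""
    nodes = REGISTRY[role]
    if addr not in nodes:
        return False
    nodes.remove(addr)
    return True


def status() -> Dict[str, List[str]]:
    """Snapshot of the current prefill/decode node lists, as /status would report."""
    return {role: list(nodes) for role, nodes in REGISTRY.items()}
```

The boolean return values map directly onto the two cases: `True` when the node was in the corresponding list and was removed, `False` when it was not.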