### What this PR does / why we need it?
In the multi-node CI system, cluster resources must meet the expected
specifications before the multi-node interoperability tests start.
Otherwise, unexpected errors can occur. For example, a test might
assume that all nodes are ready and collect the IP addresses of every
node in the cluster; if some nodes are not actually ready at that
point, Python raises an exception. Therefore, the workflow needs to
wait until all resources meet the expected specifications.
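
The waiting logic described above can be illustrated with a minimal Python sketch. This is not the implementation in this PR, which does the waiting at the workflow level; the function name `wait_for_ready_nodes`, the `count_ready` callback, and the timeout and interval defaults are all hypothetical and chosen only for illustration. How the ready count is actually obtained from the cluster is left to the caller.

```python
import time
from typing import Callable


def wait_for_ready_nodes(
    count_ready: Callable[[], int],
    expected: int,
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Block until count_ready() reports at least `expected` nodes.

    count_ready is a hypothetical caller-supplied callback that returns
    the number of nodes currently ready. Raises TimeoutError if the
    expected count is not reached within timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        ready = count_ready()
        if ready >= expected:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"only {ready} of {expected} nodes ready after {timeout_s}s"
            )
        # Not ready yet: pause before polling again.
        sleep(interval_s)
```

Gating on this check before acquiring cluster IPs means a slow node surfaces as a clear timeout rather than as an exception partway through the test.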
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.11.0
- vLLM main:
vllm-project/vllm@2918c1b
---------
Signed-off-by: wangli <[email protected]>