vllm-stack-0.1.4
The stack deployment of vLLM
What's changed
- Adding support to route a request to a specific engine instance (#438) @dumb0002
- [Perf] Improve disaggregated prefill router performance (#440) @YuhanLiu11
- [Fix] Only the default namespace service monitor namespace (#447) @nicole-lihui
- Update install script kubectl command to find the kuberay-operator pod globally (#460) @googs1025
- [Doc] Adding documentation for disaggregated prefill (#477) @YuhanLiu11
- Optimize port conversion (#466) @learner0810
- [Misc] Making KV aware routing compatible with latest LMCache (#475) @YuhanLiu11
- fix(operator): fix CR status based on deployment replicas (#443) @googs1025
- [Misc] Update the request_id handling logic to align with vLLM (#473) @KevinCheung2259
- [CI/Build] Add env cleanup before run (#486) @Shaoting-Feng
- [BugFix] Fix v1/models in static discovery (#492) @zerofishnoodles
- Bugfix/482 helm rayspec fix (#483) @insukim1994