Release vllm-stack-0.1.4 · vllm-project/production-stack

The stack deployment of vLLM

What's changed

Adding support to route a request to a specific engine instance (#438) @dumb0002
[Perf] Improve disaggregated prefill router performance (#440) @YuhanLiu11
[Fix] Only the default namespace service monitor namespace (#447) @nicole-lihui
Update install script kubectl command to find kuberay-operator pod globally (#460) @googs1025
[Doc] Adding documentation for disaggregated prefill (#477) @YuhanLiu11
Optimize port conversion (#466) @learner0810
[Misc] Making KV aware routing compatible with latest LMCache (#475) @YuhanLiu11
fix(operator): fix cr status based on deployment replicas (#443) @googs1025
[Misc] Update the request_id handling logic to align with vLLM (#473) @KevinCheung2259
[CI/Build] Add env clean up before run (#486) @Shaoting-Feng
[BugFix] Fix v1/models in static discovery (#492) @zerofishnoodles
Bugfix/482 helm rayspec fix (#483) @insukim1994