The stack deployment of vLLM
What's Changed
- [Feat] Add GKE example for lmcache cpu ram + local disk offloading by @dannawang0221 in #678
- [Feat] Use the lmcache 0.3.5 for kvaware routing by @zerofishnoodles in #673
- [Feat]: add pull policy option to ray-cluster.yaml (helm chart) by @moriabs88 in #686
- [Feat] Add support for scaling down to zero in KEDA by @Romero027 in #679
- [bugfix] Small fix to observability tutorial by @Romero027 in #695
- [Feat][Router]: add vision model type by @max-wittig in #603
- Adding Support for Sleep Mode for vLLM Container without Command Args by @dumb0002 in #696
- [Bugfix] Increase liveness failure threshold for crd by @zerofishnoodles in #688
- [bugfix] Add close method for static discovery by @zerofishnoodles in #692
- [Bugfix][Router]: loop through model_names by @max-wittig in #694
- [Misc] bump up otel col version and use a simplified image by @JaredTan95 in #698
- [vllm-router] fall back to remote tokenizer as 2nd path by @panpan0000 in #702
- [Bugfix][Router]: do not filter by model label in transcription by @max-wittig in #712
- [CI] move e2e machine to self hosted by @zerofishnoodles in #716
- [Feat] Add Production-ready vLLM EKS terraform stack tutorial by @brokedba in #704
- [bugfix] Add annotation to pod after loading the lora adapter to trigger the modify event by @zerofishnoodles in #703
- [Feat] [Router] [Misc] [Doc] increased configurability of affinity and probes by @Garrukh in #715
- [Bugfix] fix pd client initialization issue by @zerofishnoodles in #717
- [Bugfix] Update aiohttp to resolve CVE-2024-23334 vulnerability by @ikaadil in #722
- [Bugfix/Feature] Support extraPorts in service-vllm by @NargiT in #725
- Update gateway-inference-extension.rst by @linsun in #728
- feat(helm): Use emptyDir as pvcStorage by @Jimmy-Newtron in #616
- [Bugfix] Support service discovery by service name: add missing role and rolebinding for #586 by @NargiT in #724
- Update doc 04-GCP-GKE-lmcache-local-disk.md by @dannawang0221 in #727
- [Feat] Enable MIG support for Ray Head Node using chart.resources helper by @shima8823 in #732
- [feat] Enable session key in request body by @zerofishnoodles in #741
- [Feat] Add basic integration path for semantic router by @zerofishnoodles in #740
- [Bugfix] Pod rolebindings are required even with k8s_discovery_mode=service-name by @NargiT in #744
- [Feat] allow annotation on router pod by @NargiT in #743
- [Integration]: Add Intelligent Semantic Routing with vLLM-SR by @Xunzhuo in #750
- [Integration]: Update Docs with vLLM-SR by @Xunzhuo in #752
- [Bugfix] kv aware routing for lmcache 0.3.9 by @zerofishnoodles in #697
- [Feat] Ability to add labels to model pvc by @NargiT in #754
- [Bugfix] Helm: Add security context support, fix #756 by @aplufr in #757
- [Bugfix] lmcache server points to wrong file in entrypoint by @Senne-Mennes in #730
- [Feat] Add per-model runtimeClassName configuration support by @HanFa in #755
- Bumping version to 0.1.8 by @YuhanLiu11 in #738
New Contributors
- @dannawang0221 made their first contribution in #678
- @moriabs88 made their first contribution in #686
- @JaredTan95 made their first contribution in #698
- @panpan0000 made their first contribution in #702
- @brokedba made their first contribution in #704
- @Garrukh made their first contribution in #715
- @NargiT made their first contribution in #725
- @linsun made their first contribution in #728
- @Jimmy-Newtron made their first contribution in #616
- @shima8823 made their first contribution in #732
- @aplufr made their first contribution in #757
- @Senne-Mennes made their first contribution in #730
- @HanFa made their first contribution in #755
Full Changelog: vllm-stack-0.1.7...vllm-stack-0.1.8