vllm-stack-0.1.2
The stack deployment of vLLM
What's Changed
- [Feat] Add support to turn the engine deployment on/off by @dumb0002 #311 (see the values sketch after this list)
- [Feat] Add nodeSelectorTerms for router & cache servers by @kinoute #314 (see the sketch after this list)
- [Bugfix] Update the logger handler to handle stdout/stderr properly by @corona10 #320
- [CI] Always upload logs of the Helm functionality checks by @pwuersch #321
- [CI/Build] Remove the sudo requirement in CI/CD by @Shaoting-Feng #325
- [Feat] Create multiple Services when multiple models are specified by @lucas-tucker #326
- [CI] Add coverage tracking by @zhuohangu #330
- [CLI/Doc] Update the GKE deployment docs with GPU quota guidance by @EaminC #334
- [Bugfix] Fix thread creation to pass parameters properly by @corona10 #336
- [Feat] Add an OpenTelemetry support example by @lucas-tucker #346
- [Feat] Add tool-calling support for MCP client integration by @YuhanLiu11 #352
- [Benchmark] Add an API key option by @Kimdongui #354
- [Bugfix] Fix the init container PVC volume mount by @zerofishnoodles #359
- [Feat] Enable the latency monitor and add average-latency computation logic by @insukim1994 #362
- [Feat] Add a tutorial for deploying the production stack on AMD GPUs by @insukim1994 #364
- [Bugfix] Deprecate the least-loaded routing logic by @insukim1994 #366
- [Bugfix] Add the model name to the deployment selector by @TamKej #367
- [Feat] helm: add a routerSpec.serviceType value by @marquiz #368 (see the sketch after this list)
- [Feat] Support multi-model deployment with enhanced vLLM configurations by @haitwang-cloud #371 (see the multi-model sketch after this list)
- [Bugfix] Fix issues with the engine Service labels by @dumb0002 #376
- [Bugfix] Declare the logger properly in protocols.py by @corona10 #381
- [Feat] Add a tutorial for using vLLM v1 in the production stack by @YuhanLiu11 #390
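
For #311, here is a minimal values.yaml sketch of turning the engine deployment off while keeping the router. The `enableEngine` key is an assumption based on the PR title; check the chart's values.yaml for the exact flag name.

```yaml
# Sketch only: the exact key name under servingEngineSpec is assumed, not confirmed.
servingEngineSpec:
  enableEngine: false   # assumed flag from #311; skips rendering the engine Deployment
routerSpec:
  replicaCount: 1       # the router can still run on its own
```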
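
For #314, a sketch of steering the router onto specific nodes with nodeSelectorTerms. The key name comes from the PR title; its placement under routerSpec is an assumption.

```yaml
# Sketch: standard Kubernetes nodeSelectorTerms syntax; placement under routerSpec assumed.
routerSpec:
  nodeSelectorTerms:
    - matchExpressions:
        - key: kubernetes.io/arch   # well-known node label
          operator: In
          values: ["amd64"]
```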
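
#368 adds a routerSpec.serviceType value (the key is named in the PR title), which lets the router Service be exposed outside the cluster, for example:

```yaml
routerSpec:
  serviceType: LoadBalancer   # instead of the usual ClusterIP default; NodePort also works
```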
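
For the multi-model support in #371 (and the per-model Services from #326), a sketch of a two-model deployment. The modelSpec field names follow the chart's usual convention but should be verified against the released values.yaml; the model URLs are placeholders.

```yaml
# Sketch: field names assumed from the chart's modelSpec convention.
servingEngineSpec:
  modelSpec:
    - name: "llama3"                               # each entry gets its own Deployment and Service
      repository: "vllm/vllm-openai"
      tag: "latest"
      modelURL: "meta-llama/Llama-3.1-8B-Instruct" # placeholder model
      replicaCount: 1
      requestGPU: 1
    - name: "qwen"
      repository: "vllm/vllm-openai"
      tag: "latest"
      modelURL: "Qwen/Qwen2.5-7B-Instruct"         # placeholder model
      replicaCount: 1
      requestGPU: 1
```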