From 18e3f5e46785fabf7335c34549496472c2f1f047 Mon Sep 17 00:00:00 2001
From: David Gao
Date: Sun, 7 Sep 2025 10:33:44 +0800
Subject: [PATCH 1/3] [Fix] Correct parameter in transcription API tutorial

The tutorial was using --static-model-types "transcription" but the
actual implementation requires --static-model-labels "transcription"
for proper routing of transcription requests.

This fixes the documentation mismatch where the tutorial didn't align
with the actual code implementation in the routing logic.

Signed-off-by: David Gao
---
 tutorials/23-whisper-api-transcription.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tutorials/23-whisper-api-transcription.md b/tutorials/23-whisper-api-transcription.md
index a09281916..9bf36344c 100644
--- a/tutorials/23-whisper-api-transcription.md
+++ b/tutorials/23-whisper-api-transcription.md
@@ -41,7 +41,7 @@ uv run python3 -m vllm_router.app \
   --service-discovery static \
   --static-backends "$2" \
   --static-models "openai/whisper-small" \
-  --static-model-types "transcription" \
+  --static-model-labels "transcription" \
   --routing-logic roundrobin \
   --log-stats \
   --engine-stats-interval 10 \

From 336db4d71a753c6eaf1ce7fb84ad978b2ab484d7 Mon Sep 17 00:00:00 2001
From: David Gao
Date: Sun, 7 Sep 2025 13:13:56 +0800
Subject: [PATCH 2/3] docs: add more comments to explain how to write a router
 script for whisper backend

Signed-off-by: David Gao
---
 tutorials/23-whisper-api-transcription.md | 24 +++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/tutorials/23-whisper-api-transcription.md b/tutorials/23-whisper-api-transcription.md
index 9bf36344c..39951dd2d 100644
--- a/tutorials/23-whisper-api-transcription.md
+++ b/tutorials/23-whisper-api-transcription.md
@@ -29,6 +29,8 @@ vllm serve \
 
 Create and run a router connected to the Whisper backend:
 
+run-router.sh:
+
 ```bash
 #!/bin/bash
 if [[ $# -ne 2 ]]; then
@@ -37,21 +39,23 @@ if [[ $# -ne 2 ]]; then
 fi
 
 uv run python3 -m vllm_router.app \
-  --host 0.0.0.0 --port "$1" \
-  --service-discovery static \
-  --static-backends "$2" \
-  --static-models "openai/whisper-small" \
-  --static-model-labels "transcription" \
-  --routing-logic roundrobin \
-  --log-stats \
-  --engine-stats-interval 10 \
-  --request-stats-window 10
+    --host 0.0.0.0 --port "$1" \
+    --service-discovery static \
+    --static-backends "$2" \
+    --static-models "openai/whisper-small" \
+    --static-model-labels "transcription" \
+    --routing-logic roundrobin \
+    --log-stats \
+    --log-level debug \ # log level: "debug", "info", "warning", "error", "critical"
+    --engine-stats-interval 10 \
+    --request-stats-window 10
+    --static-backend-health-checks # Enable this flag to make vllm-router check periodically if the models work by sending dummy requests to their endpoints.
 ```
 
 Example usage:
 
 ```bash
-./run-router.sh 8000 http://localhost:8002
+./run-router.sh 8000 http://0.0.0.0:8002
 ```
 
 ## 3. Sending a Transcription Request
From 5a1063f060d002e81b8c6c16c550cefc3dd5cb9b Mon Sep 17 00:00:00 2001
From: David Gao
Date: Thu, 25 Sep 2025 16:35:06 +0800
Subject: [PATCH 3/3] docs: using the correct `--static-model-types
 "transcription"` param for proper LM-Cache integration

Signed-off-by: David Gao
---
 tutorials/23-whisper-api-transcription.md | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/tutorials/23-whisper-api-transcription.md b/tutorials/23-whisper-api-transcription.md
index 39951dd2d..9e6ef4b59 100644
--- a/tutorials/23-whisper-api-transcription.md
+++ b/tutorials/23-whisper-api-transcription.md
@@ -43,15 +43,18 @@ uv run python3 -m vllm_router.app \
     --service-discovery static \
     --static-backends "$2" \
     --static-models "openai/whisper-small" \
-    --static-model-labels "transcription" \
+    --static-model-types "transcription" \
     --routing-logic roundrobin \
     --log-stats \
-    --log-level debug \ # log level: "debug", "info", "warning", "error", "critical"
+    --log-level debug \
     --engine-stats-interval 10 \
-    --request-stats-window 10
-    --static-backend-health-checks # Enable this flag to make vllm-router check periodically if the models work by sending dummy requests to their endpoints.
+    --request-stats-window 10 \
+    --static-backend-health-checks
 ```
 
+* `--log-level` options: "debug", "info", "warning", "error", "critical"
+* `--static-backend-health-checks`: Enable this flag to make vllm-router check periodically if the models work by sending dummy requests to their endpoints.
+
 Example usage:
 
 ```bash
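For reference, the "Sending a Transcription Request" step that follows in the tutorial reduces to one multipart upload against the router started above. A minimal sketch, assuming the router listens on port 8000 and exposes the OpenAI-compatible `/v1/audio/transcriptions` endpoint, with `audio.mp3` standing in for any local audio file:

```bash
# Send a local audio file to the router; it round-robins the request to a
# Whisper backend registered via --static-models "openai/whisper-small".
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.mp3 \
  -F model="openai/whisper-small"
```

With the default `response_format`, the endpoint returns a JSON object whose `text` field holds the transcription.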