
Commit b5f7a83

[Doc] Upgrade multi-node doc (#4365)
### What this PR does / why we need it?

When the `Ascend scheduler` is used, the parameter `max_num_batched_tokens` must be at least as large as `max_model_len`; otherwise the following error is raised:

```shell
Value error, Ascend scheduler is enabled without chunked prefill feature. Argument max_num_batched_tokens (4096) is smaller than max_model_len (32768). This effectively limits the maximum sequence length to max_num_batched_tokens and makes vLLM reject longer sequences. Please increase max_num_batched_tokens or decrease max_model_len. [type=value_error, input_value=ArgsKwargs((), {'model_co...g': {'enabled': True}}}), input_type=ArgsKwargs]
```

### Does this PR introduce _any_ user-facing change?

Users/developers who run the model according to the [tutorial](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/multi_node.html) can now specify the parameters correctly.

### How was this patch tested?

- vLLM version: v0.11.0
- vLLM main: vllm-project/vllm@2918c1b

---------

Signed-off-by: wangli <[email protected]>
1 parent b34f195 commit b5f7a83
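For reference, the constraint behind this fix is that `max_num_batched_tokens` must be at least as large as `max_model_len` whenever the Ascend scheduler runs without chunked prefill. The following is a minimal sketch of a consistent invocation, not the tutorial's full command; the `--additional-config` key used to enable the Ascend scheduler is assumed from the vllm-ascend documentation rather than taken from this commit:

```shell
# Minimal sketch (illustrative values): keep max-num-batched-tokens >= max-model-len
# when the Ascend scheduler is enabled and chunked prefill is disabled; otherwise
# vLLM rejects the configuration at startup with the value error quoted above.
vllm serve vllm-ascend/DeepSeek-V3.1-W8A8 \
  --max-model-len 8192 \
  --max-num-batched-tokens 8192 \
  --additional-config '{"ascend_scheduler_config": {"enabled": true}}'  # config key assumed, not part of this diff
```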

File tree: 3 files changed (+9, -9 lines changed)


.github/workflows/_e2e_nightly_multi_node.yaml

Lines changed: 1 addition & 1 deletion

```diff
@@ -216,7 +216,7 @@ jobs:
 
 # 1) check follower pods
 ALL_FOLLOWERS_READY=true
-for ((i=1; i<${SIZE}; i++)); do
+for ((i=1; i<SIZE; i++)); do
 POD="${POD_PREFIX}-${i}"
 PHASE=$(kubectl get pod "$POD" -n "$NAMESPACE" -o jsonpath='{.status.phase}' 2>/dev/null || echo "NotFound")
 READY=$(kubectl get pod "$POD" -n "$NAMESPACE" -o jsonpath='{.status.containerStatuses[*].ready}' 2>/dev/null)
```
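As an aside on the workflow change above: inside bash arithmetic contexts such as `for (( ... ))`, variables can be referenced by bare name, so `i<SIZE` behaves the same as `i<${SIZE}`. A small standalone sketch with illustrative `SIZE` and `POD_PREFIX` values:

```shell
# Illustrative only: bare variable names are dereferenced inside (( ... )),
# so this loop walks follower indices 1..SIZE-1 just like the workflow step does.
SIZE=3
POD_PREFIX="vllm-node"
for ((i=1; i<SIZE; i++)); do
  echo "would check pod ${POD_PREFIX}-${i}"
done
```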

docs/source/tutorials/multi_node.md

Lines changed: 4 additions & 4 deletions

```diff
@@ -131,9 +131,9 @@ vllm serve vllm-ascend/DeepSeek-V3.1-W8A8 \
 --served-model-name deepseek_v3.1 \
 --enable-expert-parallel \
 --max-num-seqs 16 \
---max-model-len 32768 \
+--max-model-len 8192 \
 --quantization ascend \
---max-num-batched-tokens 4096 \
+--max-num-batched-tokens 8192 \
 --trust-remote-code \
 --no-enable-prefix-caching \
 --gpu-memory-utilization 0.9 \
@@ -176,8 +176,8 @@ vllm serve vllm-ascend/DeepSeek-V3.1-W8A8 \
 --quantization ascend \
 --served-model-name deepseek_v3.1 \
 --max-num-seqs 16 \
---max-model-len 32768 \
---max-num-batched-tokens 4096 \
+--max-model-len 8192 \
+--max-num-batched-tokens 8192 \
 --enable-expert-parallel \
 --trust-remote-code \
 --no-enable-prefix-caching \
```

docs/source/tutorials/multi_node_kimi.md

Lines changed: 4 additions & 4 deletions

```diff
@@ -88,8 +88,8 @@ vllm serve /home/cache/weights/Kimi-K2-Instruct-W8A8 \
 --tensor-parallel-size 8 \
 --enable-expert-parallel \
 --max-num-seqs 16 \
---max-model-len 32768 \
---max-num-batched-tokens 4096 \
+--max-model-len 8192 \
+--max-num-batched-tokens 8192 \
 --trust-remote-code \
 --no-enable-prefix-caching \
 --gpu-memory-utilization 0.9 \
@@ -130,9 +130,9 @@ vllm serve /home/cache/weights/Kimi-K2-Instruct-W8A8 \
 --tensor-parallel-size 8 \
 --served-model-name kimi \
 --max-num-seqs 16 \
---max-model-len 32768 \
+--max-model-len 8192 \
 --quantization ascend \
---max-num-batched-tokens 4096 \
+--max-num-batched-tokens 8192 \
 --enable-expert-parallel \
 --trust-remote-code \
 --no-enable-prefix-caching \
```
