
Commit ec98320

correct bug to fix the value of max_num_tokens (#3933)

### What this PR does / why we need it?

Correct a bug in how max_num_tokens is computed: it was set to the tensor-parallel size instead of max_num_reqs * uniform_decode_query_len.

- vLLM version: v0.11.0
- vLLM main: vllm-project/vllm@83f478b

Signed-off-by: zouyida2052 <[email protected]>
1 parent 0b9b6d7 commit ec98320

1 file changed: +1, -1 lines changed

vllm_ascend/torchair/torchair_model_runner.py

Lines changed: 1 addition & 1 deletion
@@ -117,7 +117,7 @@ def _init_mc2_tokens_capacity(self):
         # NOTE: To be clear, we need to make sure that during graph capture, the number of
         # tokens is less than or equal to mc2_tokens_capacity. According to _set_cudagraph_sizes,
         # the max number of tokens in graph is min(max_num_seqs * uniform_decode_query_len, 512).
-        max_num_tokens = self.parallel_config.tensor_parallel_size
+        max_num_tokens = self.max_num_reqs * self.uniform_decode_query_len
         tp_size = self.parallel_config.tensor_parallel_size
         # Use integer arithmetic for ceiling division.
         max_graph_batch_size = self.calculate_new_torchair_graph_batch_size(
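
For context, here is a minimal standalone sketch of the corrected sizing logic. The helper `calculate_new_torchair_graph_batch_size` is not part of this diff, so the rounding step below (padding max_num_tokens up to a multiple of tp_size with integer ceiling division, following the in-line comment) is an assumption for illustration, not the actual vllm_ascend implementation.

```python
# Hypothetical sketch of the corrected capacity computation; the function name
# and the rounding behaviour of calculate_new_torchair_graph_batch_size are assumed.
def mc2_tokens_capacity_sketch(max_num_reqs: int,
                               uniform_decode_query_len: int,
                               tensor_parallel_size: int) -> int:
    # Fixed line: derive the token budget from the decode batch shape,
    # not from the tensor-parallel size.
    max_num_tokens = max_num_reqs * uniform_decode_query_len
    tp_size = tensor_parallel_size
    # Integer ceiling division, then scale back up so the capacity is a
    # multiple of tp_size (assumed behaviour of the real helper).
    return ((max_num_tokens + tp_size - 1) // tp_size) * tp_size


# Example: 24 decode requests, 1 token each, TP=16 -> capacity padded to 32.
assert mc2_tokens_capacity_sketch(24, 1, 16) == 32
```

With the old code, the capacity was tied to the tensor-parallel size alone, so it could undershoot the real token count captured in the graph; the fix ties it to the decode batch shape described in the NOTE comment.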
