Commit 9868268
[v0.11.0][Fix] Cap max tokens to prevent potential OOM (vllm-project#3720) (vllm-project#3744)
### What this PR does / why we need it?
Caps the calculated maximum number of tokens at 512.
This prevents allocating an excessively large buffer when a cudagraph
capture size is not specified, mitigating the risk of out-of-memory
errors.
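
As a rough illustration of the capping described above (not the code changed in this commit), here is a minimal sketch. The names `MAX_TOKENS_CAP` and `cap_max_tokens` are hypothetical. It assumes the cap is a simple upper bound applied to an already calculated token count.

```python
# Hypothetical sketch; these names do not come from the patched file.
MAX_TOKENS_CAP = 512  # cap value stated in the PR description


def cap_max_tokens(calculated_max_tokens: int, cap: int = MAX_TOKENS_CAP) -> int:
    """Return the calculated token count, limited to at most `cap`."""
    if calculated_max_tokens < 0:
        raise ValueError("calculated_max_tokens must be non-negative")
    # Assumption: a buffer is later sized from this value, so bounding it
    # here bounds the allocation as well.
    return min(calculated_max_tokens, cap)
```

With this sketch, a large calculated value such as 8192 is reduced to 512, while values at or below the cap pass through unchanged.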
### Does this PR introduce _any_ user-facing change?
None.
### How was this patch tested?
None.
Signed-off-by: Yizhou Liu <[email protected]>
1 parent 9191728 commit 9868268
1 file changed: 3 additions, 1 deletion (original line 546 replaced by new lines 546-548).