Commit 6d17974
[Perf] Optimize multi-token incremental detokenization
This commit optimizes the incremental detokenization process when
handling multiple tokens per update, which is increasingly common with
speculative decoding methods (EAGLE, Medusa, n-gram proposer, etc.).
**Problem:**
The original implementation in BaseIncrementalDetokenizer.update()
processed tokens one-by-one in a loop, calling decode_next() for each
token. This created inefficiency when speculative decoding generates
multiple tokens per step (up to 128 tokens in MAX_SPEC_LEN scenarios).
For SlowIncrementalDetokenizer this was particularly inefficient: each
decode_next() call invoked detokenize_incrementally() with the full
token list, incurring O(n) work per token and O(n^2) work in total for
a batch of n tokens.
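For concreteness, a minimal sketch of the pre-change per-token flow; the class and method names come from the commit message, but the bodies are illustrative assumptions rather than the actual vLLM code:

```python
# Sketch of the pre-change flow (assumed shape, not verbatim vLLM code).
class BaseIncrementalDetokenizer:
    def __init__(self) -> None:
        self.token_ids: list[int] = []
        self.output_text = ""

    def decode_next(self, token_id: int) -> str:
        # Subclasses decode one token; the slow path re-runs
        # detokenize_incrementally() over the full token list on each call.
        raise NotImplementedError

    def update(self, new_token_ids: list[int]) -> None:
        # One decode_next() call per new token: a speculative batch of
        # n tokens costs O(n) work per token on the slow path, O(n^2) total.
        for token_id in new_token_ids:
            self.token_ids.append(token_id)
            self.output_text += self.decode_next(token_id)
```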
**Solution:**
1. Refactored BaseIncrementalDetokenizer.update() to batch-process
tokens when possible, using a new _decode_tokens_batch() method
(sketched after this list).
2. Special handling for the min_tokens edge case: when a batch crosses
the min_tokens threshold, update() falls back to one-by-one processing
so that stop_check_offset is tracked accurately for stop string
detection.
3. Added SlowIncrementalDetokenizer._decode_tokens_batch() override
that processes tokens more efficiently while maintaining correct
incremental state updates.
4. FastIncrementalDetokenizer continues to use the default
implementation (calling decode_next per token) since DecodeStream
requires per-token state updates.
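The flow the four points above describe could look roughly like the following. BaseIncrementalDetokenizer, update(), _decode_tokens_batch(), decode_next(), min_tokens, and stop_check_offset are named in the commit; the constructor, signatures, and the threshold bookkeeping here are illustrative assumptions, not the actual vLLM code:

```python
class BaseIncrementalDetokenizer:
    def __init__(self, min_tokens: int = 0) -> None:
        self.token_ids: list[int] = []
        self.output_text = ""
        self.min_tokens = min_tokens
        # Offset into output_text from which stop strings may be checked.
        self.stop_check_offset = 0

    def decode_next(self, token_id: int) -> str:
        raise NotImplementedError

    def _decode_tokens_batch(self, token_ids: list[int]) -> str:
        # Default: still one decode_next() per token. FastIncrementalDetokenizer
        # keeps this path because DecodeStream needs per-token state updates;
        # SlowIncrementalDetokenizer overrides it to advance its incremental
        # state once for the whole batch.
        return "".join(self.decode_next(t) for t in token_ids)

    def update(self, new_token_ids: list[int]) -> None:
        start = len(self.token_ids)
        crosses_min = start < self.min_tokens <= start + len(new_token_ids)
        if crosses_min:
            # The min_tokens boundary falls inside this batch: decode
            # one-by-one so stop_check_offset lands exactly at the threshold.
            for token_id in new_token_ids:
                self.token_ids.append(token_id)
                self.output_text += self.decode_next(token_id)
                if len(self.token_ids) == self.min_tokens:
                    self.stop_check_offset = len(self.output_text)
        else:
            # Common case: hand the whole batch to a single call.
            self.token_ids.extend(new_token_ids)
            self.output_text += self._decode_tokens_batch(new_token_ids)


# Toy usage with a hypothetical subclass that maps token ids to letters.
class _ToyDetokenizer(BaseIncrementalDetokenizer):
    def decode_next(self, token_id: int) -> str:
        return chr(ord("a") + token_id % 26)


d = _ToyDetokenizer(min_tokens=3)
d.update([0, 1, 2, 3, 4])   # crosses min_tokens -> per-token fallback
d.update([5, 6, 7, 8])      # already past min_tokens -> one batched call
print(d.output_text, d.stop_check_offset)  # abcdefghi 3
```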
Fixes TODO in vllm/v1/engine/detokenizer.py:115-116
Signed-off-by: Mohammad Othman <[email protected]>
1 file changed: vllm/v1/engine/detokenizer.py, 66 insertions(+), 6 deletions(-)