perf: optimize LogprobsProcessor._update_prompt_logprobs iteration

Mohammad Othman · Mohammad Othman · commit 0d2ed9422ab1 · 2025-11-16T01:08:32.000+02:00
Optimize the loop in LogprobsProcessor._update_prompt_logprobs() by removing redundant list indexing and unneeded offset arithmetic.

**Problem:**
The original implementation used range-based iteration with repeated list indexing:
- token_ids[pos], prompt_logprobs[pos], prompt_token_ranks[pos] required 3 index operations per iteration
- offset and offset_end were calculated even when decoded_tokens was None
- The index-based loop body was harder to read than direct iteration over the three lists

**Solution:**
1. Use enumerate(zip(...)) to iterate through lists directly, eliminating repeated indexing
2. Only calculate offset when decoded_tokens is not None
3. Remove unnecessary offset_end variable
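
The refactor pattern can be sketched outside vLLM as below. This is a minimal illustration, not the code in logprobs.py: rows_indexed, rows_zipped, NUM_LOGPROBS and the list-based NONES are hypothetical stand-ins, and plain Python lists replace the values produced by tolist().

```python
# Hypothetical stand-ins for illustration only; not the vLLM definitions.
NUM_LOGPROBS = 2
NONES = [None] * NUM_LOGPROBS


def rows_indexed(token_ids, logprobs, ranks, decoded_tokens):
    """Range-based form: three index operations per position."""
    rows = []
    for pos in range(len(token_ids)):
        # offset and offset_end are computed even if decoded_tokens is None.
        offset = pos * NUM_LOGPROBS
        offset_end = offset + NUM_LOGPROBS
        decoded = NONES if decoded_tokens is None else decoded_tokens[offset:offset_end]
        rows.append((token_ids[pos], logprobs[pos], decoded, ranks[pos]))
    return rows


def rows_zipped(token_ids, logprobs, ranks, decoded_tokens):
    """enumerate(zip(...)) form: items are unpacked once per position."""
    rows = []
    for pos, (tok, lp, rank) in enumerate(zip(token_ids, logprobs, ranks)):
        if decoded_tokens is None:
            decoded = NONES
        else:
            # offset is only needed when there are decoded tokens to slice.
            offset = pos * NUM_LOGPROBS
            decoded = decoded_tokens[offset : offset + NUM_LOGPROBS]
        rows.append((tok, lp, decoded, rank))
    return rows
```

Both forms produce the same rows only if the three lists have the same length. zip() stops at the shortest input, whereas the range-based loop would raise IndexError on a short list, so the rewrite assumes all three lists hold num_prompt_tokens entries.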

**Performance Impact:**
- Eliminates 3 list index operations per iteration
- Skips the offset multiplication and addition when decoded_tokens is None
- The saving per iteration is small, but it scales linearly with num_prompt_tokens
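
No measurements are included in this commit. One way to check the claim locally is a small timeit harness like the sketch below. It compares only the two iteration styles on plain lists of integers; sum_indexed, sum_zipped and measure are hypothetical helpers, and the result will vary by machine and Python version.

```python
import timeit


def sum_indexed(a, b, c):
    """Range-based loop with three index operations per iteration."""
    total = 0
    for pos in range(len(a)):
        total += a[pos] + b[pos] + c[pos]
    return total


def sum_zipped(a, b, c):
    """zip-based loop that unpacks the three items directly."""
    total = 0
    for x, y, z in zip(a, b, c):
        total += x + y + z
    return total


def measure(n, repeats):
    """Return total seconds spent in each loop style over `repeats` calls."""
    a = list(range(n))
    b = list(range(n))
    c = list(range(n))
    return {
        "indexed": timeit.timeit(lambda: sum_indexed(a, b, c), number=repeats),
        "zipped": timeit.timeit(lambda: sum_zipped(a, b, c), number=repeats),
    }


print(measure(n=10_000, repeats=20))
```

The harness isolates the iteration overhead only. In the real method, each iteration also calls append_logprobs_for_next_position, so the end-to-end gain may be smaller than this comparison suggests.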

Signed-off-by: Mohammad Othman <Mo@MohammadOthman.com>
diff --git a/vllm/v1/engine/logprobs.py b/vllm/v1/engine/logprobs.py
@@ -138,21 +138,23 @@ def _update_prompt_logprobs(
         token_ids = token_ids.tolist()
 
         # Make Logprob for each position.
-        for pos in range(num_prompt_tokens):
+        for pos, (token_ids_at_pos, logprobs_at_pos, ranks_at_pos) in enumerate(
+            zip(token_ids, prompt_logprobs, prompt_token_ranks)
+        ):
             # Handle flattening.
-            offset = pos * num_logprobs
-            offset_end = offset + num_logprobs
-            decoded_tokens_for_pos = (
-                NONES if decoded_tokens is None else decoded_tokens[offset:offset_end]
-            )
+            if decoded_tokens is None:
+                decoded_tokens_for_pos = NONES
+            else:
+                offset = pos * num_logprobs
+                decoded_tokens_for_pos = decoded_tokens[offset : offset + num_logprobs]
 
             # Update with the Logprob container for this pos.
             append_logprobs_for_next_position(
                 self.prompt_logprobs,
-                token_ids[pos],
-                prompt_logprobs[pos],
+                token_ids_at_pos,
+                logprobs_at_pos,
                 decoded_tokens_for_pos,
-                prompt_token_ranks[pos],
+                ranks_at_pos,
                 self.num_prompt_logprobs,
             )
 
@@ -179,4 +181,4 @@ def update_from_output(self, output: EngineCoreOutput) -> None:
         if output.new_logprobs is not None:
             self._update_sample_logprobs(output.new_logprobs)
         if output.new_prompt_logprobs_tensors is not None:
-            self._update_prompt_logprobs(output.new_prompt_logprobs_tensors)
+            self._update_prompt_logprobs(output.new_prompt_logprobs_tensors)