Conversation

@Flink-ddd (Contributor) commented Oct 23, 2025

Purpose

Fixes #26081

This PR addresses a bug where llm.chat() pollutes the scheduler queue if a batch of requests fails validation mid-way.

The Bug:
When a batch (e.g., 4 requests) is sent to llm.chat() and one request (#3) is invalid (e.g., its prompt is too long), the engine enqueues the valid requests that precede it (#1, #2) and then raises an exception for #3. The already-enqueued requests (#1, #2) become "orphaned" and are never cleaned up.

The Consequence:
The next call to llm.chat() (for example, in recursive retry logic) picks up these orphaned requests, producing more outputs than inputs and breaking the API contract.
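
For illustration, a retry wrapper like the following exposes the bug (a hypothetical sketch of the pattern described above, not the exact infer() script from #26081; is_too_long() is a placeholder filter, and the validation error is assumed to surface as a ValueError):

# Hypothetical sketch of the recursive retry pattern described above.
from vllm import LLM, SamplingParams

def is_too_long(conversation, limit: int = 4096) -> bool:
    # Placeholder validity check (not a vLLM API): crude word-count estimate.
    return sum(len(m["content"].split()) for m in conversation) > limit

def infer(llm: LLM, conversations: list, params: SamplingParams):
    try:
        return llm.chat(conversations, sampling_params=params)
    except ValueError:
        # Drop prompts that fail the local check and retry. Before this fix,
        # the retry also drained requests orphaned by the failed call, so
        # len(outputs) could exceed len(conversations).
        remaining = [c for c in conversations if not is_too_long(c)]
        return infer(llm, remaining, params)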

The Fix:
This PR makes the batch addition in _validate_and_add_requests transactional.

  1. It wraps the request-adding loop in a try...except block.
  2. It tracks the request_id of each successfully enqueued request from the current batch.
  3. If an exception occurs, the except block aborts the tracked "orphan" requests via self.llm_engine.abort_request() before re-raising the exception (see the sketch below).
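
In code, the pattern looks roughly like this (an illustrative, self-contained sketch, not the exact vLLM method; engine is a stand-in object, not the real LLMEngine class):

# Illustrative sketch of the transactional batch-add pattern described above.
from typing import Any, Sequence

def add_batch_transactionally(engine: Any, prompts: Sequence[Any], params: Any) -> list[str]:
    added_request_ids: list[str] = []
    try:
        for prompt in prompts:
            # May raise on invalid input (e.g., a prompt that is too long).
            request_id = engine.add_request(prompt, params)
            added_request_ids.append(request_id)
    except Exception:
        # Roll back: abort the requests from this batch that were already
        # enqueued, then re-raise so the caller still sees the original error.
        if added_request_ids:
            engine.abort_request(added_request_ids)
        raise
    return added_request_ids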

Test Plan

  1. Manual Reproduction (OP's Example): Used the recursive infer() function from issue #26081 ([Bug]: .chat() does not clean up in case of validation failure) to demonstrate the bug (orphaned requests causing incorrect output counts) and to verify the fix (correct output counts). See results below.
  2. Automated Regression Test: Added test_chat_batch_failure_cleanup to tests/entrypoints/llm/test_chat.py. This test (a) triggers a batch failure with an invalid prompt, (b) immediately sends a new, valid batch, and (c) asserts that the output count for the new batch is correct, confirming the queue was cleaned (an outline is sketched below).
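
An outline of such a test might look like this (a hypothetical sketch; the real test is test_chat_batch_failure_cleanup in tests/entrypoints/llm/test_chat.py, and the model name, prompts, and expected exception type are placeholders):

# Hypothetical outline of the regression test described in item 2 above.
import pytest
from vllm import LLM

def test_chat_batch_failure_cleanup():
    llm = LLM(model="facebook/opt-125m", max_model_len=128)
    valid = [{"role": "user", "content": "Hello"}]
    too_long = [{"role": "user", "content": "word " * 10_000}]

    # (a) A batch containing an over-long prompt should fail validation.
    with pytest.raises(ValueError):
        llm.chat([valid, too_long])

    # (b) A follow-up valid batch must not pick up orphaned requests,
    # (c) so the output count matches the input count exactly.
    outputs = llm.chat([valid, valid])
    assert len(outputs) == 2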

Test Result

The fix is confirmed by both the manual reproduction script and the new automated test.

1. Manual Reproduction (Before vs. After)

Before Fix (on main branch): Bug Reproduced
The recursive call processed 5 real outputs instead of the expected 3, due to 2 orphaned requests.

[Screenshot: 2025-10-23 22:39:38]

After Fix (on this branch): Fix Verified
The recursive call now correctly processes 3 real outputs and 1 error, as the orphaned requests were successfully aborted.

[Screenshot: 2025-10-23 22:41:21]

2. Automated Regression Test (Pytest)

The new pytest case passes, ensuring this bug does not regress.

Command:
pytest tests/entrypoints/llm/test_chat.py -k test_chat_batch_failure_cleanup

[Screenshot: 2025-10-23 22:29:14]

@mergify bot added the frontend label on Oct 23, 2025.
@Flink-ddd force-pushed the fix/26081-chat-batch-cleanup branch 4 times, most recently from 31a6f39 to bd01420, on October 23, 2025 at 15:43.
@Flink-ddd marked this pull request as ready for review on November 2, 2025 at 05:04.

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines 1565 to 1579
try:
    for i, prompt in enumerate(it):
        if isinstance(prompt, dict):
            self._validate_mm_data_and_uuids(
                prompt.get("multi_modal_data"), prompt.get("multi_modal_uuids")
            )
            request_id = self._add_request(
                prompt,
                params[i] if isinstance(params, Sequence) else params,
                lora_request=lora_request[i]
                if isinstance(lora_request, Sequence)
                else lora_request,
                priority=priority[i] if priority else 0,
            )
            added_request_ids.append(request_id)

P0: Skip adding non-dict prompts in _validate_and_add_requests

The new try/except only calls _add_request inside the isinstance(prompt, dict) branch. Any prompt that is a string, list of tokens, etc. now bypasses _add_request entirely. For LLM.generate, LLM.embed, LLM.encode, and any other path that passes plain text prompts, this method returns without enqueuing anything, so _run_engine sees zero requests and the call produces no output. This is a regression from previous behavior and breaks the core API for all non-dict prompts.
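
To illustrate the flagged regression (hypothetical example; the model name is a placeholder): with _add_request gated behind isinstance(prompt, dict), a plain-string prompt would never be enqueued, so a call like the following would return zero outputs instead of one.

# Hypothetical illustration of the flagged regression; model name is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
outputs = llm.generate("Hello, world", SamplingParams(max_tokens=8))
assert len(outputs) == 1  # would fail (len(outputs) == 0) with the regression in place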

@DarkLight1337 (Member) commented:

/gemini review

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a fix to prevent orphaned requests in the scheduler queue when a batch of requests to llm.chat() fails mid-validation. The solution makes the batch addition transactional by wrapping the request-adding loop in a try...except block and aborting any successfully enqueued requests from the batch if an exception occurs. While the fix is logically sound and is accompanied by a good regression test, it introduces a critical regression. The change incorrectly moves the request addition logic inside a conditional block that only handles dictionary-based prompts, causing non-dictionary prompts (like simple strings) to be silently ignored.

Comment on lines 1567 to 1579
if isinstance(prompt, dict):
    self._validate_mm_data_and_uuids(
        prompt.get("multi_modal_data"), prompt.get("multi_modal_uuids")
    )
    request_id = self._add_request(
        prompt,
        params[i] if isinstance(params, Sequence) else params,
        lora_request=lora_request[i]
        if isinstance(lora_request, Sequence)
        else lora_request,
        priority=priority[i] if priority else 0,
    )
    added_request_ids.append(request_id)

critical

This change introduces a regression. The call to _add_request and the append to added_request_ids have been moved inside the if isinstance(prompt, dict): block. This means that prompts that are not dictionaries (e.g., simple strings passed to llm.generate()) will be silently ignored instead of being added to the engine, breaking existing functionality.

The _add_request call and the append to added_request_ids should be moved outside the if block to process all prompt types, restoring the original behavior while keeping the transactional logic.

Suggested change

Current:

if isinstance(prompt, dict):
    self._validate_mm_data_and_uuids(
        prompt.get("multi_modal_data"), prompt.get("multi_modal_uuids")
    )
    request_id = self._add_request(
        prompt,
        params[i] if isinstance(params, Sequence) else params,
        lora_request=lora_request[i]
        if isinstance(lora_request, Sequence)
        else lora_request,
        priority=priority[i] if priority else 0,
    )
    added_request_ids.append(request_id)

Suggested:

if isinstance(prompt, dict):
    self._validate_mm_data_and_uuids(
        prompt.get("multi_modal_data"), prompt.get("multi_modal_uuids")
    )
request_id = self._add_request(
    prompt,
    params[i] if isinstance(params, Sequence) else params,
    lora_request=lora_request[i]
    if isinstance(lora_request, Sequence)
    else lora_request,
    priority=priority[i] if priority else 0,
)
added_request_ids.append(request_id)

Member commented:

Please address this

Contributor Author commented:

Hi @DarkLight1337, I just saw your message. I'm making the changes now; once they're tested on the GPU and working, I'll push the code.

Contributor Author commented:

Hi @DarkLight1337, I just updated the code logic and pushed it. The screenshots below show the execution of several methods, including llm.chat, llm.generate, and llm.embed.

[Screenshots: 2025-11-02 19:59:25 and 2025-11-02 20:06:31]

@Flink-ddd force-pushed the fix/26081-chat-batch-cleanup branch from bd01420 to b2114f6 on November 2, 2025 at 12:07.

@DarkLight1337 (Member) commented:

/gemini review

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a crucial fix to prevent orphaned requests from polluting the scheduler queue when a batch operation in llm.chat() fails. The approach of making the batch addition transactional by tracking added requests and aborting them on failure is sound and well-implemented. The accompanying regression test is also well-designed and effectively verifies the fix.

I've identified one area for improvement in the new exception handling logic to make it more robust, ensuring that the original exception is not lost if the cleanup operation itself fails. Overall, this is a great contribution that improves the reliability of batch processing.

Comment on lines +1580 to +1583
except Exception as e:
    if added_request_ids:
        self.llm_engine.abort_request(added_request_ids)
    raise e

high

The current exception handling in this except block has a potential issue. If self.llm_engine.abort_request(added_request_ids) itself raises an exception, the original exception e that triggered this cleanup logic will be lost. The new exception from abort_request will propagate, masking the root cause of the failure.

To make the error handling more robust, it's better to wrap the cleanup call in its own try...except block. This ensures that even if the cleanup fails, the original, more important exception is the one that gets re-raised. The cleanup failure can be logged for debugging purposes.

Also, it's a common Python best practice to use a bare raise statement inside an except block to re-raise the original exception, as it preserves the full traceback.

Suggested change

Current:

except Exception as e:
    if added_request_ids:
        self.llm_engine.abort_request(added_request_ids)
    raise e

Suggested:

except Exception:
    if added_request_ids:
        try:
            self.llm_engine.abort_request(added_request_ids)
        except Exception as abort_exc:
            logger.warning(
                "Failed to abort orphaned requests after an error during "
                "batch addition. The original error will be raised.",
                exc_info=abort_exc)
    raise

@DarkLight1337 (Member) left a comment

LGTM now, thanks for fixing!

@DarkLight1337 enabled auto-merge (squash) on November 2, 2025 at 12:19.
@github-actions bot added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) on Nov 2, 2025.

@Flink-ddd (Contributor, Author) commented:

Hi @DarkLight1337, thanks for your approval and help.

@DarkLight1337 merged commit 0ce743f into vllm-project:main on Nov 2, 2025. 47 checks passed.