Conversation

Member

@njhill njhill commented Oct 30, 2025

It is tricky to test this from the outside in a deterministic way that doesn't rely on timing.

Example occurrence: https://buildkite.com/vllm/ci/builds/36825#019a3209-d6cb-4ae2-94a2-4aab39752d23

Signed-off-by: Nick Hill <[email protected]>
@mergify mergify bot added the v1 label Oct 30, 2025
@njhill njhill added the ready label Oct 30, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to fix a flaky test by increasing a sleep duration. While this may reduce the frequency of the flake, it's not a deterministic solution and can slow down the test suite. My review provides a suggestion for a more robust, event-based synchronization mechanism to replace the fixed sleep, which would eliminate the race condition and improve test reliability. This approach could also be applied to other tests in the file that rely on asyncio.sleep for synchronization.


  # Let requests start
- await asyncio.sleep(0.5)
+ await asyncio.sleep(1.0)
Contributor

Severity: high

While increasing the sleep duration might mitigate the flakiness, it's not a deterministic fix and can make the test suite slower. A more robust approach would be to wait for a signal that the requests to be aborted have actually started processing in the engine. This would avoid race conditions and make the test more reliable.

One way to achieve this is to modify the generate test helper to signal an asyncio.Event after it receives the first output chunk for a request. The test can then wait on these events before calling abort.

Here's an example of how you could modify the generate function and this test:

1. Modify generate to accept and signal an event:

async def generate(
    # ... other args
    started_event: asyncio.Event | None = None,
) -> tuple[int, str]:
    # ...
    first_chunk = True
    async for out in engine.generate(...):
        if first_chunk:
            if started_event:
                started_event.set()
            first_chunk = False
        # ... rest of the function

2. Use the event in test_multi_abort:

        # ...
        tasks: list[asyncio.Task] = []
        started_events: list[asyncio.Event] = []
        abort_request_indices = set(REQUEST_IDS_TO_ABORT)
        for idx, request_id in enumerate(request_ids):
            # ...
            event = None
            if idx in abort_request_indices:
                event = asyncio.Event()
                started_events.append(event)

            tasks.append(
                asyncio.create_task(
                    generate(
                        # ... args
                        started_event=event
                    )
                )
            )

        # Wait for requests to start instead of sleeping
        await asyncio.gather(*(event.wait() for event in started_events))

        # Use multi-abort to abort multiple requests at once
        # ...

This approach ensures that we only attempt to abort requests that are confirmed to be running, making the test more reliable and independent of timing. Since this is a common pattern in the test file, applying this change to other tests that use asyncio.sleep for synchronization (like test_abort_final_output) would also improve their robustness.
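
For reference, here is a minimal, self-contained sketch of the same idea that runs outside the vllm test helpers (the worker, started, and cancelled names are illustrative, not from the repo). The aborting side waits on an asyncio.Event set by the task instead of sleeping for a fixed duration and hoping the task has started:

import asyncio
import contextlib


async def worker(started: asyncio.Event, cancelled: asyncio.Event) -> None:
    # Signal that this "request" is running, then block until aborted.
    started.set()
    try:
        await asyncio.sleep(3600)  # stand-in for streaming generation
    except asyncio.CancelledError:
        cancelled.set()
        raise


async def main() -> None:
    started = asyncio.Event()
    cancelled = asyncio.Event()
    task = asyncio.create_task(worker(started, cancelled))

    # Deterministic: wait for the worker to report that it has started,
    # rather than sleeping for a fixed duration.
    await started.wait()

    task.cancel()  # stand-in for aborting the request
    with contextlib.suppress(asyncio.CancelledError):
        await task
    assert cancelled.is_set()


asyncio.run(main())

Applied to the actual tests, the Event would be set inside the generate helper on the first output chunk, as sketched above, rather than in a standalone worker coroutine.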

@njhill njhill marked this pull request as draft October 31, 2025 04:33
