Skip to content
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion tests/v1/engine/test_async_llm.py
Original file line number Diff line number Diff line change
Expand Up @@ -249,7 +249,7 @@ async def test_multi_abort(output_kind: RequestOutputKind):
)

# Let requests start
await asyncio.sleep(0.5)
await asyncio.sleep(1.0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

While increasing the sleep duration might mitigate the flakiness, it's not a deterministic fix and can make the test suite slower. A more robust approach would be to wait for a signal that the requests to be aborted have actually started processing in the engine. This would avoid race conditions and make the test more reliable.

One way to achieve this is to modify the generate test helper to signal an asyncio.Event after it receives the first output chunk for a request. The test can then wait on these events before calling abort.

Here's an example of how you could modify the generate function and this test:

1. Modify generate to accept and signal an event:

async def generate(
    # ... other args
    started_event: asyncio.Event | None = None,
) -> tuple[int, str]:
    # ...
    first_chunk = True
    async for out in engine.generate(...):
        if first_chunk:
            if started_event:
                started_event.set()
            first_chunk = False
        # ... rest of the function

2. Use the event in test_multi_abort:

        # ...
        tasks: list[asyncio.Task] = []
        started_events = []
        abort_request_indices = set(REQUEST_IDS_TO_ABORT)
        for idx, request_id in enumerate(request_ids):
            # ...
            event = None
            if idx in abort_request_indices:
                event = asyncio.Event()
                started_events.append(event)

            tasks.append(
                asyncio.create_task(
                    generate(
                        # ... args
                        started_event=event
                    )
                )
            )

        # Wait for requests to start instead of sleeping
        await asyncio.gather(*(event.wait() for event in started_events))

        # Use multi-abort to abort multiple requests at once
        # ...

This approach ensures that we only attempt to abort requests that are confirmed to be running, making the test more reliable and independent of timing. Since this is a common pattern in the test file, applying this change to other tests that use asyncio.sleep for synchronization (like test_abort_final_output) would also improve their robustness.


# Use multi-abort to abort multiple requests at once
abort_request_ids = [request_ids[i] for i in REQUEST_IDS_TO_ABORT]
Expand Down
Loading