
Conversation

@AzizCode92
Contributor

@AzizCode92 AzizCode92 commented Aug 26, 2025

Purpose

Previously, tests would repeatedly make HTTP calls to the Hugging Face Hub from different test modules, even when the LoRA model and modules were already downloaded.
Since this slows down our CI pipeline and increases the risk of network failures, I propose this PR.

This refactoring introduces a session-scoped pytest fixture that downloads and caches the required models and LoRAs a single time. All tests now rely on this fixture, which eliminates redundant HTTP calls and makes the CI pipeline faster and more reliable.
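
A minimal sketch of the approach (the fixture and repo names below are illustrative, not necessarily the exact ones in this PR):

```python
# conftest.py (sketch)
import pytest
from huggingface_hub import snapshot_download


@pytest.fixture(scope="session")
def zephyr_lora_files():
    """Download the LoRA adapter once per test session and return the local path."""
    # The repo ID here is illustrative of the adapter used by these tests.
    return snapshot_download(repo_id="typeof/zephyr-7b-beta-lora")
```

Tests that need the adapter take `zephyr_lora_files` as an argument and receive the cached directory, so `snapshot_download()` runs at most once per session.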

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Previously, tests would repeatedly download assets from the Hugging Face Hub, slowing down CI and increasing the risk of network failures.

This refactoring introduces a session-scoped pytest fixture that downloads and caches the required models and LoRAs a single time. All tests now rely on this fixture, which eliminates redundant downloads and makes the CI pipeline faster and more reliable.

Signed-off-by: AzizCode92 <[email protected]>
Signed-off-by: AzizCode92 <[email protected]>
@AzizCode92 AzizCode92 marked this pull request as ready for review August 26, 2025 11:19
@njhill
Member

njhill commented Aug 26, 2025

Thanks @AzizCode92! Are you in vllm slack? We have a channel #ci-sprint where we've just started to coordinate these efforts.

@njhill
Member

njhill commented Aug 26, 2025

@AzizCode92 we also want to assess whether we really need to use LoRA in all of the tests. It would be good to swap out the non-LoRA model for a tiny one if possible, see #23667

@jeejeelee
Collaborator

jeejeelee commented Aug 27, 2025

@AzizCode92 we also want to assess whether we really need to use LoRA in all of the tests. It would be good to swap out the non-LoRA model for a tiny one if possible, see #23667

Makes sense

@AzizCode92
Contributor Author

Thanks @AzizCode92! Are you in vllm slack? We have a channel #ci-sprint where we've just started to coordinate these efforts.

Hi @njhill, no I just sent a request to join it.

@AzizCode92 we also want to assess whether we really need to use LoRA in all of the tests. It would be good to swap out the non-LoRA model for a tiny one if possible, see #23667

Thanks for clarifying. Just to confirm my understanding, you're suggesting we consolidate LoRA-specific tests into tests/entrypoints/openai/test_lora_adapters.py and remove them from other test files to avoid redundancy.

For instance, tests/entrypoints/openai/test_chat.py currently tests the HuggingFaceH4/zephyr-7b-beta model with LoRA modules. Since this functionality is already covered in the dedicated test_lora_adapters.py suite, I can proceed with removing that specific test case from test_chat.py and, of course, replacing HuggingFaceH4/zephyr-7b-beta with a tiny model from #23456 (comment).

Does this align with what you had in mind?

@robertgshaw2-redhat
Collaborator

Thanks @AzizCode92! Are you in vllm slack? We have a channel #ci-sprint where we've just started to coordinate these efforts.

Hi @njhill, no I just sent a request to join it.

@AzizCode92 we also want to assess whether we really need to use LoRA in all of the tests. It would be good to swap out the non-LoRA model for a tiny one if possible, see #23667

Thanks for clarifying. Just to confirm my understanding, you're suggesting we consolidate LoRA-specific tests into tests/entrypoints/openai/test_lora_adapters.py and remove them from other test files to avoid redundancy.

For instance, tests/entrypoints/openai/test_chat.py currently tests the HuggingFaceH4/zephyr-7b-beta model with LoRA modules. Since this functionality is already covered in the dedicated test_lora_adapters.py suite, I can proceed with removing that specific test case from test_chat.py and, of course, replacing HuggingFaceH4/zephyr-7b-beta with a tiny model from #23456 (comment).

Does this align with what you had in mind?

yep --- we need to be careful that we aren't removing some coverage, but ideally there should be a single file for LoRA. Then we can have a single shared common fixture for the openai tests that uses a tiny model. Removing LoRA should help a lot with the sharing across the other test groups.
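
Something along those lines, assuming the existing RemoteOpenAIServer helper from tests/utils.py and a placeholder tiny model (import path and model name are illustrative):

```python
import pytest

from ...utils import RemoteOpenAIServer  # existing test helper; import path may differ

# Placeholder tiny model; the actual candidate is being discussed in #23456.
TINY_MODEL = "hf-internal-testing/tiny-random-LlamaForCausalLM"


@pytest.fixture(scope="session")
def tiny_model_server():
    """One shared server for the openai entrypoint tests, without LoRA."""
    args = ["--enforce-eager", "--max-model-len", "2048"]
    with RemoteOpenAIServer(TINY_MODEL, args) as server:
        yield server
```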

@jeejeelee
Collaborator

Previously, tests would repeatedly download assets from the Hugging Face Hub, slowing down CI and increasing the risk of network failures.
This refactoring introduces a session-scoped pytest fixture that downloads and caches the required models and LoRAs a single time. All tests now rely on this fixture, which eliminates redundant downloads and makes the CI pipeline faster and more reliable.

IIUC, LoRA weights are handled the same way as the model weights: if they're already downloaded, they won't be downloaded again, so this PR won't work. See: https://buildkite.com/vllm/ci/builds/28468#0198e9bd-43ae-42d6-8710-12986ab9b173/210

@DarkLight1337 If I am wrong, please correct me

@DarkLight1337
Member

DarkLight1337 commented Aug 27, 2025

I think this PR can still help by reducing the number of HTTP requests to check the updated status of each file, even though weights won't be re-downloaded

@AzizCode92
Contributor Author

AzizCode92 commented Aug 27, 2025

Previously, tests would repeatedly download assets from the Hugging Face Hub, slowing down CI and increasing the risk of network failures.
This refactoring introduces a session-scoped pytest fixture that downloads and caches the required models and LoRAs a single time. All tests now rely on this fixture, which eliminates redundant downloads and makes the CI pipeline faster and more reliable.

IIUC, LoRA weight should be the same as the model - if it's already downloaded, it won't download again repeatedly, so this PR won't work. See : https://buildkite.com/vllm/ci/builds/28468#0198e9bd-43ae-42d6-8710-12986ab9b173/210

@DarkLight1337 If I am wrong, please correct me

Thanks for the feedback.
As @DarkLight1337 pointed out, this PR minimizes the HTTP calls made by our tests. It's a minor improvement, but I still think it's worthwhile for the speed of the CI pipeline.
snapshot_download() makes these HTTP calls even when the files are already cached, so the idea is to drop the call from the module level and use a session-scoped fixture so it happens once for all tests.
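
Roughly, the module-level change looks like this (names are illustrative):

```python
# Before: resolved at import/module scope in every test module that needs the
# adapter, which triggers HTTP metadata calls per module even with a warm cache.
# zephyr_lora_files = snapshot_download(repo_id="typeof/zephyr-7b-beta-lora")

# After: the module just requests the session-scoped fixture from conftest.py,
# so snapshot_download() runs once for the whole pytest session.
def test_lora_adapter(zephyr_lora_files):
    assert zephyr_lora_files  # path to the cached local snapshot
```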

@njhill
Member

njhill commented Aug 27, 2025

I think this PR can still help by reducing the number of HTTP requests to check the updated status of each file, even though weights won't be re-downloaded

We are hoping to eliminate all of the HTTP calls globally anyhow, see #23451.

@AzizCode92
Contributor Author

Once the tiny LoRA model is ready, it will replace the zephyr model and its LoRA modules.
#23456 (comment)


@pytest.fixture(scope="session")
def pa_files():
"""Download PA files once per test session."""
Member

Do we still use the prompt adapter files? This feature was removed a while ago.

Contributor Author

Absolutely! I will remove it.

@jeejeelee
Collaborator

jeejeelee commented Aug 28, 2025

Previously, tests would repeatedly download assets from the Hugging Face Hub, slowing down CI and increasing the risk of network failures.
This refactoring introduces a session-scoped pytest fixture that downloads and caches the required models and LoRAs a single time. All tests now rely on this fixture, which eliminates redundant downloads and makes the CI pipeline faster and more reliable.

IIUC, LoRA weight should be the same as the model - if it's already downloaded, it won't download again repeatedly, so this PR won't work. See : https://buildkite.com/vllm/ci/builds/28468#0198e9bd-43ae-42d6-8710-12986ab9b173/210
@DarkLight1337 If I am wrong, please correct me

Thanks for the feedback. As @DarkLight1337 pointed, this PR will minimize the HTTP calls for our tests. Although its a minor improvement but I think still necessary to improve the speed of the CI pipeline. snapshot_download() will make these HTTP calls even when the files are already cached. So I thought we get rid of this call from the modules level and make a session fixture for it to make it happen once for all tests.

That makes sense, but what if you set local_files_only=True?
If it's to reduce HTTP calls, please modify your PR title.
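
For reference, roughly what that looks like (repo ID illustrative):

```python
from huggingface_hub import snapshot_download
from huggingface_hub.utils import LocalEntryNotFoundError

try:
    # Resolve entirely from the local cache; no HTTP metadata requests are made.
    path = snapshot_download(repo_id="typeof/zephyr-7b-beta-lora",
                             local_files_only=True)
except LocalEntryNotFoundError:
    # Cold cache: fall back to a normal (networked) download.
    path = snapshot_download(repo_id="typeof/zephyr-7b-beta-lora")
```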

@AzizCode92 AzizCode92 changed the title [test]: download model and their loras only once per test session [CI]: reduce HTTP calls inside entrypoints openai tests Aug 28, 2025
@jeejeelee
Collaborator

I have removed tests/entrypoints/llm/test_generate_multiple_loras.py, please resolve the branch conflict, then overall LGTM.

Collaborator

@jeejeelee jeejeelee left a comment

overall LGTM

@DarkLight1337 @hmellor please take another look, thank you

Member

@hmellor hmellor left a comment

Can we also remove the instances of LORA_NAME in all the files where zephyr_lora_files has been removed? It doesn't appear to be used anywhere but could confuse future maintainers.

AzizCode92 and others added 2 commits September 1, 2025 20:48
Fix: Re-enable remote model download for tests

Co-authored-by: Harry Mellor <[email protected]>
Signed-off-by: Aziz <[email protected]>
@hmellor hmellor enabled auto-merge (squash) September 2, 2025 09:01
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 2, 2025
@hmellor hmellor merged commit ce30dca into vllm-project:main Sep 2, 2025
25 checks passed
@AzizCode92 AzizCode92 deleted the optimize-ci-lora-downloads branch September 2, 2025 13:55
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

Labels

ready ONLY add when PR is ready to merge/full CI is needed
