
Conversation

@AzizCode92
Contributor

@AzizCode92 AzizCode92 commented Aug 26, 2025

Purpose

Previously, tests would repeatedly make HTTP calls to the Hugging Face Hub from different test modules, even when the LoRA model and modules were already downloaded.
Since this slows down our CI pipeline and increases the risk of network failures, I propose this PR.

This refactoring introduces a session-scoped pytest fixture that downloads and caches the required models and LoRAs a single time. All tests now rely on this fixture, which eliminates redundant HTTP calls and makes the CI pipeline faster and more reliable.
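
A minimal sketch of the approach (the fixture and repo names below are illustrative, not necessarily the exact ones in this PR):

```python
# conftest.py (sketch)
import pytest
from huggingface_hub import snapshot_download


@pytest.fixture(scope="session")
def zephyr_lora_files():
    """Download the LoRA adapter once per test session and return the local path."""
    # The repo ID here is illustrative of the adapter used by these tests.
    return snapshot_download(repo_id="typeof/zephyr-7b-beta-lora")
```

Tests that need the adapter take `zephyr_lora_files` as an argument and receive the cached directory, so `snapshot_download()` runs at most once per session.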

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Previously, tests would repeatedly download assets from the Hugging Face Hub, slowing down CI and increasing the risk of network failures.

This refactoring introduces a session-scoped pytest fixture that downloads and caches the required models and LoRAs a single time. All tests now rely on this fixture, which eliminates redundant downloads and makes the CI pipeline faster and more reliable.

Signed-off-by: AzizCode92 <[email protected]>
Signed-off-by: AzizCode92 <[email protected]>
@AzizCode92 AzizCode92 marked this pull request as ready for review August 26, 2025 11:19
@njhill
Member

njhill commented Aug 26, 2025

Thanks @AzizCode92! Are you in vllm slack? We have a channel #ci-sprint where we've just started to coordinate these efforts.

@njhill
Member

njhill commented Aug 26, 2025

@AzizCode92 we also want to assess whether we really need to use LoRA in all of the tests. It would be good to swap out the non-LoRA model for a tiny one if possible, see #23667

@jeejeelee
Collaborator

jeejeelee commented Aug 27, 2025

@AzizCode92 we also want to assess whether we really need to use LoRA in all of the tests. It would be good to swap out the non-LoRA model for a tiny one if possible, see #23667

Makes sense

@AzizCode92
Contributor Author

Thanks @AzizCode92! Are you in vllm slack? We have a channel #ci-sprint where we've just started to coordinate these efforts.

Hi @njhill, no I just sent a request to join it.

@AzizCode92 we also want to assess whether we really need to use LoRA in all of the tests. It would be good to swap out the non-LoRA model for a tiny one if possible, see #23667

Thanks for clarifying. Just to confirm my understanding, you're suggesting we consolidate LoRA-specific tests into tests/entrypoints/openai/test_lora_adapters.py and remove them from other test files to avoid redundancy.

For instance, tests/entrypoints/openai/test_chat.py currently tests the HuggingFaceH4/zephyr-7b-beta model with LoRA modules. Since this functionality is already covered in the dedicated test_lora_adapters.py suite, I can proceed with removing that specific test case from test_chat.py and, of course, replacing HuggingFaceH4/zephyr-7b-beta with a tiny model from #23456 (comment).

Does this align with what you had in mind?

@robertgshaw2-redhat
Collaborator

Thanks @AzizCode92! Are you in vllm slack? We have a channel #ci-sprint where we've just started to coordinate these efforts.

Hi @njhill, no I just sent a request to join it.

@AzizCode92 we also want to assess whether we really need to use LoRA in all of the tests. It would be good to swap out the non-LoRA model for a tiny one if possible, see #23667

Thanks for clarifying. Just to confirm my understanding, you're suggesting we consolidate LoRA-specific tests into tests/entrypoints/openai/test_lora_adapters.py and remove them from other test files to avoid redundancy.

For instance, tests/entrypoints/openai/test_chat.py currently tests the HuggingFaceH4/zephyr-7b-beta model with LoRA modules. Since this functionality is already covered in the dedicated test_lora_adapters.py suite, I can proceed with removing that specific test case from test_chat.py and, of course, replacing HuggingFaceH4/zephyr-7b-beta with a tiny model from #23456 (comment).

Does this align with what you had in mind?

yep --- we need to be careful that we aren't removing some coverage, but ideally there should be a single file for LoRA. Then we can have a single shared common fixture for the openai tests that uses a tiny model. Removing LoRA should help a lot with the sharing across the other test groups.
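
Something along those lines, assuming the existing RemoteOpenAIServer helper from tests/utils.py and a placeholder tiny model (import path and model name are illustrative):

```python
import pytest

from ...utils import RemoteOpenAIServer  # existing test helper; import path may differ

# Placeholder tiny model; the actual candidate is being discussed in #23456.
TINY_MODEL = "hf-internal-testing/tiny-random-LlamaForCausalLM"


@pytest.fixture(scope="session")
def tiny_model_server():
    """One shared server for the openai entrypoint tests, without LoRA."""
    args = ["--enforce-eager", "--max-model-len", "2048"]
    with RemoteOpenAIServer(TINY_MODEL, args) as server:
        yield server
```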

@jeejeelee
Collaborator

Previously, tests would repeatedly download assets from the Hugging Face Hub, slowing down CI and increasing the risk of network failures.
This refactoring introduces a session-scoped pytest fixture that downloads and caches the required models and LoRAs a single time. All tests now rely on this fixture, which eliminates redundant downloads and makes the CI pipeline faster and more reliable.

IIUC, LoRA weights are handled the same way as the model weights: if they're already downloaded, they won't be downloaded again, so this PR won't work. See: https://buildkite.com/vllm/ci/builds/28468#0198e9bd-43ae-42d6-8710-12986ab9b173/210

@DarkLight1337 If I am wrong, please correct me

@DarkLight1337
Member

DarkLight1337 commented Aug 27, 2025

I think this PR can still help by reducing the number of HTTP requests to check the updated status of each file, even though weights won't be re-downloaded

@AzizCode92
Contributor Author

AzizCode92 commented Aug 27, 2025

Previously, tests would repeatedly download assets from the Hugging Face Hub, slowing down CI and increasing the risk of network failures.
This refactoring introduces a session-scoped pytest fixture that downloads and caches the required models and LoRAs a single time. All tests now rely on this fixture, which eliminates redundant downloads and makes the CI pipeline faster and more reliable.

IIUC, LoRA weight should be the same as the model - if it's already downloaded, it won't download again repeatedly, so this PR won't work. See : https://buildkite.com/vllm/ci/builds/28468#0198e9bd-43ae-42d6-8710-12986ab9b173/210

@DarkLight1337 If I am wrong, please correct me

Thanks for the feedback.
As @DarkLight1337 pointed out, this PR minimizes the HTTP calls made by our tests. It's a minor improvement, but I still think it's worthwhile for the speed of the CI pipeline.
snapshot_download() makes these HTTP calls even when the files are already cached, so the idea is to drop the call from the module level and use a session-scoped fixture so it happens once for all tests.
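
Roughly, the module-level change looks like this (names are illustrative):

```python
# Before: resolved at import/module scope in every test module that needs the
# adapter, which triggers HTTP metadata calls per module even with a warm cache.
# zephyr_lora_files = snapshot_download(repo_id="typeof/zephyr-7b-beta-lora")

# After: the module just requests the session-scoped fixture from conftest.py,
# so snapshot_download() runs once for the whole pytest session.
def test_lora_adapter(zephyr_lora_files):
    assert zephyr_lora_files  # path to the cached local snapshot
```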

@njhill
Member

njhill commented Aug 27, 2025

I think this PR can still help by reducing the number of HTTP requests to check the updated status of each file, even though weights won't be re-downloaded

We are hoping to eliminate all of the HTTP calls globally anyhow, see #23451.

@AzizCode92
Contributor Author

Once the tiny LoRA model is ready, it will replace the zephyr model and its LoRA modules.
#23456 (comment)


@pytest.fixture(scope="session")
def pa_files():
"""Download PA files once per test session."""
Member

Do we still use the prompt adapter files? This feature was removed a while ago.

Contributor Author

Absolutely! I will remove it.

@jeejeelee
Collaborator

jeejeelee commented Aug 28, 2025

Previously, tests would repeatedly download assets from the Hugging Face Hub, slowing down CI and increasing the risk of network failures.
This refactoring introduces a session-scoped pytest fixture that downloads and caches the required models and LoRAs a single time. All tests now rely on this fixture, which eliminates redundant downloads and makes the CI pipeline faster and more reliable.

IIUC, LoRA weight should be the same as the model - if it's already downloaded, it won't download again repeatedly, so this PR won't work. See : https://buildkite.com/vllm/ci/builds/28468#0198e9bd-43ae-42d6-8710-12986ab9b173/210
@DarkLight1337 If I am wrong, please correct me

Thanks for the feedback. As @DarkLight1337 pointed, this PR will minimize the HTTP calls for our tests. Although its a minor improvement but I think still necessary to improve the speed of the CI pipeline. snapshot_download() will make these HTTP calls even when the files are already cached. So I thought we get rid of this call from the modules level and make a session fixture for it to make it happen once for all tests.

That makes sense, but what if you set local_files_only=True?
If it's to reduce HTTP calls, please modify your PR title.
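
For reference, roughly what that looks like (repo ID illustrative):

```python
from huggingface_hub import snapshot_download
from huggingface_hub.utils import LocalEntryNotFoundError

try:
    # Resolve entirely from the local cache; no HTTP metadata requests are made.
    path = snapshot_download(repo_id="typeof/zephyr-7b-beta-lora",
                             local_files_only=True)
except LocalEntryNotFoundError:
    # Cold cache: fall back to a normal (networked) download.
    path = snapshot_download(repo_id="typeof/zephyr-7b-beta-lora")
```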

@AzizCode92 AzizCode92 changed the title [test]: download model and their loras only once per test session [CI]: reduce HTTP calls inside entrypoints openai tests Aug 28, 2025
@jeejeelee
Collaborator

I have removed tests/entrypoints/llm/test_generate_multiple_loras.py, please resolve the branch conflict, then overall LGTM.

Collaborator

@jeejeelee jeejeelee left a comment

overall LGTM

@DarkLight1337 @hmellor please take another look, thank you

Member

@hmellor hmellor left a comment

Can we also remove the instances of LORA_NAME in all the files where zephyr_lora_files has been removed? It doesn't appear to be used anywhere but could confuse future maintainers.

AzizCode92 and others added 2 commits September 1, 2025 20:48
Fix: Re-enable remote model download for tests

Co-authored-by: Harry Mellor <[email protected]>
Signed-off-by: Aziz <[email protected]>
@hmellor hmellor enabled auto-merge (squash) September 2, 2025 09:01
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 2, 2025
@hmellor hmellor merged commit ce30dca into vllm-project:main Sep 2, 2025
25 checks passed
@AzizCode92 AzizCode92 deleted the optimize-ci-lora-downloads branch September 2, 2025 13:55
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

Labels

ready ONLY add when PR is ready to merge/full CI is needed
