[CI]: reduce HTTP calls inside entrypoints openai tests #23646
Conversation
Previously, tests would repeatedly download assets from the Hugging Face Hub, slowing down CI and increasing the risk of network failures. This refactoring introduces a session-scoped pytest fixture that downloads and caches the required models and LoRAs a single time. All tests now rely on this fixture, which eliminates redundant downloads and makes the CI pipeline faster and more reliable. Signed-off-by: AzizCode92 <[email protected]>
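The shared fixture itself is not quoted in this thread; a minimal sketch of what such a session-scoped download fixture could look like is below (the fixture name and repo id are illustrative, not necessarily the exact ones used in the PR):

```python
import pytest


@pytest.fixture(scope="session")
def zephyr_lora_files():
    """Download the LoRA adapter once per test session; pytest reuses the
    returned path for every test instead of re-resolving it per module."""
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id="typeof/zephyr-7b-beta-lora")
```

Because the scope is "session", `snapshot_download` runs at most once per pytest invocation, and later runs on the same machine resolve against the local Hugging Face cache.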
Thanks @AzizCode92! Are you in vllm slack? We have a channel
@AzizCode92 we also want to assess whether we really need to use LoRA in all of the tests. It would be good to swap out the non-LoRA model for a tiny one if possible, see #23667
Makes sense
Hi @njhill, no I just sent a request to join it.
Thanks for clarifying. Just to confirm my understanding: you're suggesting we consolidate LoRA-specific tests into tests/entrypoints/openai/test_lora_adapters.py and remove them from the other test files to avoid redundancy. Does this align with what you had in mind?
Yep. We need to be careful that we aren't removing some coverage, but ideally there should be a single file for LoRA. Then we can have a single shared common fixture for the openai tests that uses a tiny model. Removing LoRA should help a lot with the sharing across the other test groups.
IIUC, the LoRA weights behave the same as the model: if they're already downloaded, they won't be downloaded again, so this PR won't help. See: https://buildkite.com/vllm/ci/builds/28468#0198e9bd-43ae-42d6-8710-12986ab9b173/210 @DarkLight1337 If I am wrong, please correct me.
I think this PR can still help by reducing the number of HTTP requests to check the updated status of each file, even though weights won't be re-downloaded |
Thanks for the feedback. |
We are hoping to eliminate all of the http calls globally anyhow #23451 |
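For reference, huggingface_hub already ships a global offline switch that would achieve this; a minimal sketch using the documented HF_HUB_OFFLINE environment variable (it must be set before huggingface_hub is first imported, since the flag is read at import time):

```python
import os

# With HF_HUB_OFFLINE=1, huggingface_hub serves everything from the local
# cache and raises instead of issuing any HTTP request on a cache miss.
os.environ["HF_HUB_OFFLINE"] = "1"
```

In CI this would typically be exported in the job environment rather than set from Python, so that every test process inherits it.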
Once the tiny lora model is ready, it will replace the zephyr model and its lora modules. |
tests/entrypoints/conftest.py
Outdated
    @pytest.fixture(scope="session")
    def pa_files():
        """Download PA files once per test session."""
Do we still use the prompt adapter files? This feature was removed a while ago.
Absolutely! I will remove it.
That makes sense, but what if you set local_files_only=True?
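A cache-first lookup along the lines suggested here could look like the following sketch (the helper name is made up for illustration):

```python
def cached_snapshot(repo_id: str) -> str:
    """Return the local snapshot path, only touching the network on a cache miss."""
    from huggingface_hub import snapshot_download
    try:
        # local_files_only=True resolves purely from the local cache and
        # issues no HTTP requests at all.
        return snapshot_download(repo_id=repo_id, local_files_only=True)
    except Exception:
        # Cache miss (huggingface_hub raises LocalEntryNotFoundError here):
        # fall back to a normal network download.
        return snapshot_download(repo_id=repo_id)
```

This keeps the happy path (weights already cached) completely offline while still working on a fresh machine.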
Signed-off-by: AzizCode92 <[email protected]>
I have removed it.
Signed-off-by: AzizCode92 <[email protected]>
jeejeelee
left a comment
overall LGTM
@DarkLight1337 @hmellor please take another look, thank you
hmellor
left a comment
Can we also remove the instances of LORA_NAME in all the files where zephyr_lora_files has been removed? It doesn't appear to be used anywhere but could confuse future maintainers.
Signed-off-by: AzizCode92 <[email protected]>
Fix: Re-enable remote model download for tests Co-authored-by: Harry Mellor <[email protected]> Signed-off-by: Aziz <[email protected]>
…#23646) Signed-off-by: AzizCode92 <[email protected]> Signed-off-by: Aziz <[email protected]> Co-authored-by: Harry Mellor <[email protected]>
Purpose
Previously, tests would repeatedly make HTTP calls to the Hugging Face Hub from different test modules, even when the LoRA model and modules had already been downloaded.
Since this slows down our CI pipeline and increases the risk of network failures, I propose this PR.
This refactoring introduces a session-scoped pytest fixture that downloads and caches the required models and LoRAs a single time. All tests now rely on this fixture, which eliminates redundant HTTP calls and makes the CI pipeline faster and more reliable.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.