-
Notifications
You must be signed in to change notification settings - Fork 624
[perf]Allow setting cudagraph_capture_sizes in VllmRunner #4694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: MrZ20 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces the cudagraph_capture_sizes parameter to VllmRunner initializations across numerous test files, setting it to [4]. This is a good step towards standardizing test configurations for CUDA graph capturing.
However, I've identified several instances where cudagraph_capture_sizes is defined both as a direct argument to VllmRunner and within the compilation_config dictionary. In these cases, the value inside compilation_config takes precedence, rendering the newly added direct argument ineffective. This could lead to confusion and incorrect test behavior. I've left specific comments with suggestions to remove the redundant definitions to ensure clarity and correctness.
| with VllmRunner(model, | ||
| max_model_len=1024, | ||
| tensor_parallel_size=2, | ||
| cudagraph_capture_sizes=[4], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| with VllmRunner(model, | ||
| max_model_len=1024, | ||
| tensor_parallel_size=2, | ||
| cudagraph_capture_sizes=[4], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| model, | ||
| max_model_len=1024, | ||
| tensor_parallel_size=2, | ||
| cudagraph_capture_sizes=[4], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| with VllmRunner( | ||
| model, | ||
| max_model_len=1024, | ||
| cudagraph_capture_sizes=[4], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The added cudagraph_capture_sizes=[4] argument is ineffective because compilation_config on line 176 also defines cudagraph_capture_sizes. The value from compilation_config takes precedence, which can lead to confusion. Please remove this redundant argument and rely on the one in compilation_config.
|
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according Contributing and Testing. |
What this PR does / why we need it?
Allow setting cudagraph_capture_sizes in VllmRunner
Does this PR introduce any user-facing change?
How was this patch tested?