
Conversation

@MrZ20 (Contributor) commented Dec 4, 2025

What this PR does / why we need it?

Allow setting cudagraph_capture_sizes in VllmRunner
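
For illustration, a minimal sketch of how the new argument might be used in a test; the import path, model name, and test body below are assumptions for illustration, not taken from this PR:

# Hypothetical usage sketch, assuming VllmRunner is the project's test helper
# and forwards cudagraph_capture_sizes to the engine configuration.
from tests.conftest import VllmRunner  # assumed import path

def test_with_cudagraph_capture_sizes():
    with VllmRunner("Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model
                    max_model_len=1024,
                    cudagraph_capture_sizes=[4]) as vllm_model:
        # Run a small workload here so the captured graph size is exercised.
        ...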

Does this PR introduce any user-facing change?

How was this patch tested?

@gemini-code-assist bot left a comment

Code Review

This pull request introduces the cudagraph_capture_sizes parameter to VllmRunner initializations across numerous test files, setting it to [4]. This is a good step towards standardizing test configurations for CUDA graph capturing.

However, I've identified several instances where cudagraph_capture_sizes is defined both as a direct argument to VllmRunner and within the compilation_config dictionary. In these cases, the value inside compilation_config takes precedence, rendering the newly added direct argument ineffective. This could lead to confusion and incorrect test behavior. I've left specific comments with suggestions to remove the redundant definitions to ensure clarity and correctness.

with VllmRunner(model,
max_model_len=1024,
tensor_parallel_size=2,
cudagraph_capture_sizes=[4],

Severity: high

The added cudagraph_capture_sizes=[4] argument is ineffective because compilation_config on line 46 also defines cudagraph_capture_sizes. The value from compilation_config takes precedence, which can lead to confusion. Please remove this redundant argument and rely on the one in compilation_config.

with VllmRunner(model,
max_model_len=1024,
tensor_parallel_size=2,
cudagraph_capture_sizes=[4],

Severity: high

The added cudagraph_capture_sizes=[4] argument is ineffective because compilation_config on line 94 also defines cudagraph_capture_sizes. The value from compilation_config takes precedence, which can lead to confusion. Please remove this redundant argument and rely on the one in compilation_config.

model,
max_model_len=1024,
tensor_parallel_size=2,
cudagraph_capture_sizes=[4],

Severity: high

The added cudagraph_capture_sizes=[4] argument is ineffective because compilation_config on line 58 also defines cudagraph_capture_sizes. The value from compilation_config takes precedence, which can lead to confusion. Please remove this redundant argument and rely on the one in compilation_config.

with VllmRunner(
model,
max_model_len=1024,
cudagraph_capture_sizes=[4],

Severity: high

The added cudagraph_capture_sizes=[4] argument is ineffective because compilation_config on line 176 also defines cudagraph_capture_sizes. The value from compilation_config takes precedence, which can lead to confusion. Please remove this redundant argument and rely on the one in compilation_config.
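
To make the suggested fix concrete, here is a minimal before/after sketch. It is a fragment: VllmRunner and model are as in the quoted test code above, the surrounding arguments are illustrative, and only the names quoted in this review are taken from the diff. Per the review, the value inside compilation_config takes precedence, so the direct argument should be dropped where both are present:

# Before: cudagraph_capture_sizes is defined twice; the direct argument is ignored.
with VllmRunner(model,
                max_model_len=1024,
                tensor_parallel_size=2,
                cudagraph_capture_sizes=[4],  # redundant, has no effect here
                compilation_config={"cudagraph_capture_sizes": [4]}) as vllm_model:
    ...

# After: keep a single source of truth in compilation_config.
with VllmRunner(model,
                max_model_len=1024,
                tensor_parallel_size=2,
                compilation_config={"cudagraph_capture_sizes": [4]}) as vllm_model:
    ...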

@github-actions bot commented Dec 4, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, so that reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

