@chris668899 commented Apr 28, 2025

What this PR does / why we need it?

This PR adds a new capability: npugraph_batch_sizes is now adjusted dynamically for each model. Before this PR, the npugraph_batch_sizes passed from vllm to vllm-ascend was always too large, which could cause an error when running different models, with the message: "The resources are insufficient".
With this PR, the code adjusts npugraph_batch_sizes dynamically based on the model's hidden_layer_nums and the parallel config. For example:
a. for Qwen2.5-7B, npugraph_batch_sizes has 33 entries in total;
b. for Qwen2.5-72B, npugraph_batch_sizes has 11 entries in total.
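
The adjustment described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names, the budget constant `ASSUMED_CAPTURE_BUDGET = 1920`, the parallel factor rule, and the even-spacing subsampling are all assumptions made for the example. The layer counts used in the usage lines (28 for Qwen2.5-7B, 80 for Qwen2.5-72B) come from the published model configs, and the input list of 67 candidate sizes is likewise only an example.

```python
import math

# ASSUMPTION: a fixed per-device budget of capturable graph resources.
# The value is illustrative and is not stated in the PR description.
ASSUMED_CAPTURE_BUDGET = 1920


def max_graph_batch_sizes(num_hidden_layers, tensor_parallel_size=1,
                          data_parallel_size=1,
                          budget=ASSUMED_CAPTURE_BUDGET):
    """Hypothetical helper: how many batch sizes fit in the budget.

    Deeper models and parallel execution each consume more of the
    budget per captured batch size, so the allowed count shrinks.
    """
    # ASSUMPTION: each parallel dimension that is actually used (> 1)
    # adds one to the cost factor.
    parallel_factor = 1 + sum(
        size > 1 for size in (tensor_parallel_size, data_parallel_size))
    count = math.floor(budget / (num_hidden_layers + 1) / parallel_factor)
    return max(1, count)


def shrink_batch_sizes(sizes, limit):
    """Hypothetical helper: keep at most `limit` evenly spaced sizes,
    always retaining the smallest and the largest."""
    sizes = sorted(sizes)
    if len(sizes) <= limit:
        return sizes
    if limit == 1:
        return [sizes[-1]]
    step = (len(sizes) - 1) / (limit - 1)
    indices = [round(i * step) for i in range(limit)]
    return [sizes[i] for i in indices]


# Example candidate list (an assumption): 1, 2, 4, then multiples of 8 up to 512.
candidate_sizes = [1, 2, 4] + [8 * i for i in range(1, 65)]

limit_7b = max_graph_batch_sizes(28, tensor_parallel_size=2)
limit_72b = max_graph_batch_sizes(80, tensor_parallel_size=2)
sizes_7b = shrink_batch_sizes(candidate_sizes, limit_7b)
sizes_72b = shrink_batch_sizes(candidate_sizes, limit_72b)
```

Under these assumed values and a tensor parallel size of 2, the sketch yields 33 entries for a 28-layer model and 11 entries for an 80-layer model, which lines up with the counts quoted above. The parallel config behind the quoted numbers is not stated in this description, so treat the match as illustrative only.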
