[Feature] Support multi graphs for torchair #4757
base: main
Conversation
Signed-off-by: Jade Zheng <[email protected]>
Code Review
This pull request enables support for multiple Torchair graphs with MTP by removing a hardcoded graph size configuration and refining the graph size alignment logic. The changes look good overall, but I've identified an inconsistency in the alignment logic that should be addressed. The new alignment method introduced in _align_graph_size_divisible_by_tp_size is an improvement, but the old, less optimal logic is still used elsewhere, which could lead to issues. My review comment provides details on how to resolve this.
cur_graph_batch_size = (graph_batch_size + lcm_size -
                        1) // lcm_size * lcm_size
The alignment logic for cur_graph_batch_size has been updated here to round up to the nearest multiple of lcm(tp_size, self.decode_token_per_req). This is a good improvement as it can result in smaller, more appropriate batch sizes compared to the previous lcm(tp_size, graph_batch_size) logic, especially when graph_batch_size is large.
However, the old logic is still being used in _init_mc2_tokens_capacity via calculate_new_torchair_graph_batch_size. This inconsistency could lead to an unnecessarily large mc2_tokens_capacity, potentially causing memory issues or hitting device limits.
To ensure consistency and correctness, _init_mc2_tokens_capacity should be updated to use the same alignment logic. After this change, calculate_new_torchair_graph_batch_size would become dead code and could be removed for better code hygiene.
For example, _init_mc2_tokens_capacity could be updated as follows:
def _init_mc2_tokens_capacity(self):
    # NOTE: To be clear, we need to make sure that during graph capture, the number of
    # tokens is less than or equal to mc2_tokens_capacity. According to _set_cudagraph_sizes,
    # the max number of tokens in graph is min(max_num_seqs * uniform_decode_query_len, 512).
    max_num_tokens = self.max_num_reqs * self.uniform_decode_query_len
    tp_size = self.parallel_config.tensor_parallel_size
    # Use the new alignment logic
    lcm_size = math.lcm(tp_size, self.uniform_decode_query_len)
    max_graph_batch_size = (max_num_tokens + lcm_size -
                            1) // lcm_size * lcm_size
    self.mc2_tokens_capacity = max_graph_batch_size
    if get_ascend_device_type(
    ) == AscendDeviceType._910_93 and self.mc2_tokens_capacity > 512:
        logger.error(
            f"A3: the max number of tokens must be smaller than 512, but it is now {self.mc2_tokens_capacity}"
        )
    if get_ascend_device_type(
    ) == AscendDeviceType._910B and self.mc2_tokens_capacity > 256:
        logger.error(
            f"A2: the max number of tokens must be smaller than 256, but it is now {self.mc2_tokens_capacity}"
        )
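To make the difference concrete, here is a small standalone sketch (not part of the PR) comparing the two alignment strategies. The numeric values for tp_size, decode_token_per_req, and graph_batch_size are made up purely for illustration:

```python
import math

# Illustrative values only: TP degree 4 and MTP with one speculative token,
# i.e. 2 decode tokens per request, and a requested graph batch size of 30.
tp_size = 4
decode_token_per_req = 2
graph_batch_size = 30

# Old behaviour: align to lcm(tp_size, graph_batch_size).
old_aligned = math.lcm(tp_size, graph_batch_size)  # 60

# New behaviour: round up to the nearest multiple of
# lcm(tp_size, decode_token_per_req).
lcm_size = math.lcm(tp_size, decode_token_per_req)  # 4
new_aligned = (graph_batch_size + lcm_size - 1) // lcm_size * lcm_size  # 32

print(old_aligned, new_aligned)  # 60 32
```

With large requested sizes the old rule can inflate the captured graph well beyond what is needed, which is exactly the concern raised for mc2_tokens_capacity above.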
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
What this PR does / why we need it?
This PR supports configuring multiple Torchair graphs when MTP is enabled.
Does this PR introduce any user-facing change?
When MTP is enabled, users can configure multiple Torchair graph sizes; the configured sizes may still be adjusted automatically based on the runtime environment.
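As a rough illustration (not taken from the PR itself), requesting multiple graph sizes under MTP might look like the sketch below. The `torchair_graph_config` / `graph_batch_sizes` keys and the `speculative_config` values follow vllm-ascend's documented `additional_config` convention, but they are assumptions here and should be verified against the current docs:

```python
from vllm import LLM

# Hedged sketch: multiple Torchair graph sizes with MTP speculative decoding.
llm = LLM(
    model="path/to/model",  # placeholder model path
    tensor_parallel_size=4,
    speculative_config={
        "method": "deepseek_mtp",      # assumption: MTP-style speculation
        "num_speculative_tokens": 1,
    },
    additional_config={
        "torchair_graph_config": {
            "enabled": True,
            # Multiple graph sizes; they may be realigned automatically,
            # e.g. to multiples of lcm(tp_size, decode_token_per_req).
            "graph_batch_sizes": [4, 8, 16, 32],
        },
    },
)
```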
How was this patch tested?