
Conversation

Contributor

@anon189Ty anon189Ty commented Oct 30, 2025

What this PR does / why we need it?

Currently, the MTP model still runs in eager mode even when full graph mode is enabled. This PR adapts the MTP model to full graph capture and execution. When the graph mode is set to "FULL_DECODE_ONLY", the MTP model runs as a full graph to improve performance.

The changes include:

  1. Add _mtp_graph_params in acl_graph.py to isolate the MTP graph data from the main model's graph data.
  2. Pad some metadata in mla_v1.py when in full graph mode.
  3. Fix the essential data addresses that are used in model.forward.
  4. Adapt to the aclgraph capture framework:
    1). Rebuild the MTP model with ACLGraphWrapper.
    2). Add common attn metadata when capture starts in the MTP dummy_run.
    3). Add common attn metadata updates in MTP.
    4). Adapt the data update when num_speculative_tokens > 1.

Does this PR introduce any user-facing change?

How was this patch tested?

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for running MTP models in full graph mode on Ascend hardware, which should improve performance. The changes adapt the MTP model with full graph capture and execution. The implementation introduces a new set of graph parameters and functions for MTP, which are duplicates of existing ones. My main feedback is to refactor this duplicated code to improve maintainability.

Comment on lines 344 to 370
@dataclass
class MTPGraphParams:
    events: dict[int, list[torch.npu.ExternalEvent]]
    workspaces: dict[int, torch.Tensor]
    handles: dict[int, list[torch_npu._C._NPUTaskGroupHandle]]
    attn_params: dict[int, list[tuple]]


_mtp_graph_params: Optional[MTPGraphParams] = None


def set_mtp_graph_params(aclgraph_capture_sizes: set[int]):
    global _mtp_graph_params
    if _mtp_graph_params is not None:
        raise ValueError("MTPGraph parameters have already been set!")
    _mtp_graph_params = MTPGraphParams(
        {size: [] for size in aclgraph_capture_sizes},
        {size: None for size in aclgraph_capture_sizes},
        {size: [] for size in aclgraph_capture_sizes},
        {size: [] for size in aclgraph_capture_sizes},
    )


def update_mtp_graph_params_workspaces(num_tokens: int, workspace: Any):
    global _mtp_graph_params
    if _mtp_graph_params is not None:
        _mtp_graph_params.workspaces[num_tokens] = workspace


def get_mtp_graph_params():
    return _mtp_graph_params
Contributor


Severity: high

The MTPGraphParams class and its associated functions (set_mtp_graph_params, update_mtp_graph_params_workspaces, get_mtp_graph_params) are duplicates of GraphParams and its functions. This introduces significant code duplication, which can lead to maintenance issues and potential bugs if one version is updated and the other is forgotten.

Consider refactoring to avoid this duplication. You could, for example, use a single set of functions and a dictionary to manage parameters for different graph types (e.g., DEFAULT and MTP). This would involve:

  1. Removing MTPGraphParams and its related functions. GraphParams can be used for both.
  2. Using a dictionary to store GraphParams instances for different graph types, keyed by an enum.
  3. Modifying the set/update/get functions to accept a graph_type parameter to operate on the correct GraphParams instance.
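The suggested refactor could be sketched along these lines (a minimal sketch, not the actual vLLM Ascend code: GraphType, the registry dictionary, and the torch-free field types are illustrative stand-ins):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class GraphType(Enum):
    DEFAULT = "default"
    MTP = "mtp"


@dataclass
class GraphParams:
    events: dict[int, list]
    workspaces: dict[int, Any]
    handles: dict[int, list]
    attn_params: dict[int, list]


# One registry for all graph types, instead of parallel module-level globals.
_graph_params: dict[GraphType, GraphParams] = {}


def set_graph_params(graph_type: GraphType,
                     aclgraph_capture_sizes: set[int]) -> None:
    if graph_type in _graph_params:
        raise ValueError(
            f"Graph parameters for {graph_type} have already been set!")
    _graph_params[graph_type] = GraphParams(
        {size: [] for size in aclgraph_capture_sizes},
        {size: None for size in aclgraph_capture_sizes},
        {size: [] for size in aclgraph_capture_sizes},
        {size: [] for size in aclgraph_capture_sizes},
    )


def update_graph_params_workspaces(graph_type: GraphType, num_tokens: int,
                                   workspace: Any) -> None:
    params = _graph_params.get(graph_type)
    if params is not None:
        params.workspaces[num_tokens] = workspace


def get_graph_params(
        graph_type: GraphType = GraphType.DEFAULT) -> Optional[GraphParams]:
    return _graph_params.get(graph_type)
```

With this shape, the MTP path calls the same three functions with GraphType.MTP, so any future fix to the set/update/get logic applies to both graph types at once.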

@github-actions

github-actions bot commented Nov 3, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

self.use_sparse = hasattr(vllm_config.model_config.hf_config,
                          "index_topk")

self.query_start_loc = torch.zeros(
Collaborator


Maybe these pinned tensors can also reuse the corresponding tensors in the model runner via a reference to self.runner?
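As a sketch of that suggestion (class and attribute names are illustrative stand-ins, and a plain list stands in for the pinned tensor), the MTP metadata builder could hold the runner's buffer by reference instead of allocating its own:

```python
class ModelRunner:
    """Stand-in for the model runner that owns the pinned buffers."""

    def __init__(self, max_num_reqs: int):
        # In the real code this would be a pinned tensor, e.g.
        # torch.zeros(max_num_reqs + 1, dtype=torch.int32, pin_memory=True).
        self.query_start_loc = [0] * (max_num_reqs + 1)


class MTPMetadataBuilder:
    """Stand-in for the MTP attention metadata builder."""

    def __init__(self, runner: ModelRunner):
        self.runner = runner
        # Reuse the runner's buffer by reference rather than allocating a
        # second copy; writes on either side are visible to both, which also
        # keeps the data address stable for graph replay.
        self.query_start_loc = runner.query_start_loc
```

The trade-off is that the builder must never reassign or resize the shared buffer, only write into it in place, or the aliasing silently breaks.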

Contributor

JC-ut0 commented Nov 7, 2025

Please add UT tests for the FULL_DECODE_ONLY aclgraph.

@wangxiyuan
Collaborator

Has this change been merged into main? Which commit?

@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions

github-actions bot commented Dec 1, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Collaborator

yiz-liu commented Dec 8, 2025

This PR isn’t needed anymore since it’s already implemented on the main branch, and no new features should be added to the v0.11.0-dev release.

@yiz-liu yiz-liu closed this Dec 8, 2025


5 participants