
Conversation

@yiz-liu yiz-liu (Collaborator) commented Oct 27, 2025

What this PR does / why we need it?

The cache for MLA decode graph parameters was holding strong references to tensors, preventing them from being garbage collected and leading to increased memory usage.

This change wraps the cached tensors in weak references, allowing them to be deallocated when no longer in use and reducing overall memory pressure.
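
To make the failure mode concrete, here is a minimal, hypothetical sketch (the class and method names are invented; it uses Python's `weakref` module for illustration, whereas the actual change uses vLLM's `weak_ref_tensors` helper, see the review below):

```python
import weakref

class DecodeGraphParamCache:
    """Illustration only: a cache that stores strong references
    co-owns every cached tensor, so the allocator can never reclaim
    that memory while the cache is alive."""

    def __init__(self):
        self._entries = {}

    def put_strong(self, key, tensor):
        # Leaky variant: the cache keeps the tensor alive forever.
        self._entries[key] = tensor

    def put_weak(self, key, tensor):
        # Fixed variant: the cache holds a weak reference, so the
        # tensor can be freed once its real owner drops it.
        self._entries[key] = weakref.ref(tensor)

    def get(self, key):
        entry = self._entries.get(key)
        if isinstance(entry, weakref.ref):
            return entry()  # None if already garbage collected
        return entry
```

The same principle applies to device memory: the cache should record the tensors without becoming an owner of them.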

Does this PR introduce any user-facing change?

None.

How was this patch tested?

None.


- vLLM version: v0.11.0rc3
- vLLM main: vllm-project/vllm@c9461e0

Signed-off-by: Yizhou Liu <[email protected]>
@github-actions bot commented

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description, to help reviewers and future developers understand.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@yiz-liu yiz-liu changed the title from [Fix] Prevent memory leak in MLA decode graph (#3743) to [v0.11.0][Fix] Prevent memory leak in MLA decode graph (#3743) Oct 27, 2025
@gemini-code-assist gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request effectively addresses a memory leak in the MLA decode graph by introducing weak references for cached tensors. The changes are consistently applied across attention_v1.py and mla_v1.py, ensuring that temporary tensors and workspace buffers are wrapped in weak_ref_tensors before being cached. This allows for proper garbage collection and reduces memory pressure. The related graph update logic in acl_graph.py has been correctly adjusted to accommodate these changes. The correction to remove the weak reference from block_tables is also appropriate, as it is a per-request input that needs to remain accessible. Overall, the changes are well-implemented and solve the stated problem.
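
As a rough sketch of the pattern the review describes, not the actual diff (the function, its parameters, and the import path are assumptions; `weak_ref_tensors` is the helper named in the review, and in vLLM it returns tensors that share storage with the originals without keeping them alive):

```python
from vllm.utils import weak_ref_tensors  # import path assumed

def cache_decode_graph_params(graph_params: dict, key,
                              workspace, attn_output):
    """Hypothetical cache-update step: wrap tensors in weak
    references before storing them, so the cache aliases the same
    device memory without pinning it."""
    graph_params[key] = (
        weak_ref_tensors(workspace),
        weak_ref_tensors(attn_output),
    )
```

Graph replay still sees the same storage while the graph runner holds the real references; once those are dropped, the memory becomes reclaimable.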

@yiz-liu yiz-liu added the ready (read for review) and ready-for-test (start test by label for PR) labels Oct 27, 2025
@yiz-liu yiz-liu merged commit 43276fd into vllm-project:v0.11.0-dev Oct 27, 2025
37 of 38 checks passed
@yiz-liu yiz-liu deleted the v0.11.0-reduce-mem branch October 27, 2025 08:00
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 28, 2025
…3743) (vllm-project#3774)

Clorist33 pushed a commit to Clorist33/vllm-ascend that referenced this pull request Dec 9, 2025
…3743) (vllm-project#3774)


Labels

module:core · ready (read for review) · ready-for-test (start test by label for PR)

Projects

None yet
