
Conversation

@yiz-liu yiz-liu (Collaborator) commented Oct 27, 2025

What this PR does / why we need it?

The cache for MLA decode graph parameters was holding strong references to tensors, preventing them from being garbage collected and leading to increased memory usage.

This change wraps the cached tensors in weak references, allowing them to be deallocated when no longer in use and reducing overall memory pressure.
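
To make the failure mode concrete, here is a minimal, hypothetical sketch (the class and method names are invented; it uses Python's `weakref` module for illustration, whereas the actual change uses vLLM's `weak_ref_tensors` helper, see the review below):

```python
import weakref

class DecodeGraphParamCache:
    """Illustration only: a cache that stores strong references
    co-owns every cached tensor, so the allocator can never reclaim
    that memory while the cache is alive."""

    def __init__(self):
        self._entries = {}

    def put_strong(self, key, tensor):
        # Leaky variant: the cache keeps the tensor alive forever.
        self._entries[key] = tensor

    def put_weak(self, key, tensor):
        # Fixed variant: the cache holds a weak reference, so the
        # tensor can be freed once its real owner drops it.
        self._entries[key] = weakref.ref(tensor)

    def get(self, key):
        entry = self._entries.get(key)
        if isinstance(entry, weakref.ref):
            return entry()  # None if already garbage collected
        return entry
```

The same principle applies to device memory: the cache should record the tensors without becoming an owner of them.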

Does this PR introduce any user-facing change?

None.

How was this patch tested?

None.


- vLLM version: v0.11.0rc3
- vLLM main: vllm-project/vllm@c9461e0

Signed-off-by: Yizhou Liu <[email protected]>
@github-actions bot commented

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description, to help reviewers and future developers understand.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@yiz-liu yiz-liu changed the title from [Fix] Prevent memory leak in MLA decode graph (#3743) to [v0.11.0][Fix] Prevent memory leak in MLA decode graph (#3743) Oct 27, 2025
@gemini-code-assist gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request effectively addresses a memory leak in the MLA decode graph by introducing weak references for cached tensors. The changes are consistently applied across attention_v1.py and mla_v1.py, ensuring that temporary tensors and workspace buffers are wrapped in weak_ref_tensors before being cached. This allows for proper garbage collection and reduces memory pressure. The related graph update logic in acl_graph.py has been correctly adjusted to accommodate these changes. The correction to remove the weak reference from block_tables is also appropriate, as it is a per-request input that needs to remain accessible. Overall, the changes are well-implemented and solve the stated problem.
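
As a rough sketch of the pattern the review describes, not the actual diff (the function, its parameters, and the import path are assumptions; `weak_ref_tensors` is the helper named in the review, and in vLLM it returns tensors that share storage with the originals without keeping them alive):

```python
from vllm.utils import weak_ref_tensors  # import path assumed

def cache_decode_graph_params(graph_params: dict, key,
                              workspace, attn_output):
    """Hypothetical cache-update step: wrap tensors in weak
    references before storing them, so the cache aliases the same
    device memory without pinning it."""
    graph_params[key] = (
        weak_ref_tensors(workspace),
        weak_ref_tensors(attn_output),
    )
```

Graph replay still sees the same storage while the graph runner holds the real references; once those are dropped, the memory becomes reclaimable.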

@yiz-liu yiz-liu added the ready (read for review) and ready-for-test (start test by label for PR) labels Oct 27, 2025
@yiz-liu yiz-liu merged commit 43276fd into vllm-project:v0.11.0-dev Oct 27, 2025
37 of 38 checks passed
@yiz-liu yiz-liu deleted the v0.11.0-reduce-mem branch October 27, 2025 08:00
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 28, 2025
…3743) (vllm-project#3774)

Clorist33 pushed a commit to Clorist33/vllm-ascend that referenced this pull request Dec 9, 2025
…3743) (vllm-project#3774)


Labels

module:core · ready (read for review) · ready-for-test (start test by label for PR)

Projects

None yet
