feat(sfa): Integrate fused mla_preprocess operator #4533
+131
−14
What this PR does / why we need it?
Replaces several discrete operations in the SFA forward pass with a single call to the fused `mla_preprocess` custom operator, which combines the Q/K/V projection, RoPE application, and KV cache update into one kernel.

A new weight-processing method transforms and pre-packs the model weights into the specific layout required by the fused operator. The change aims to improve performance by reducing kernel launch overhead; a sketch of the idea follows.
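A minimal, self-contained sketch of the fused path in plain `torch`. `fuse_weights`, `apply_rope`, and the `mla_preprocess` stand-in below are illustrative assumptions for this description, not the actual Ascend kernel or its signature (the real op also handles paged caches and applies RoPE only to the positional sub-dimensions):

```python
import torch

def fuse_weights(wq: torch.Tensor, wkv: torch.Tensor) -> torch.Tensor:
    # Done once at weight-load time: pack the Q and KV projection weights
    # into the single contiguous layout the fused kernel consumes.
    return torch.cat([wq, wkv], dim=1).contiguous()

def apply_rope(x, cos, sin):
    # Simplified rotary position embedding over the last dimension.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat([x1 * cos - x2 * sin, x2 * cos + x1 * sin], dim=-1)

def mla_preprocess(h, w_fused, cos, sin, kv_cache, slots, q_dim):
    # Stand-in for the fused operator: projection + RoPE + KV-cache update
    # in one call (the real kernel avoids the intermediate tensors and the
    # extra kernel launches this Python version still incurs).
    qkv = h @ w_fused
    q, kv = qkv.split([q_dim, qkv.shape[-1] - q_dim], dim=-1)
    q = apply_rope(q, cos, sin)
    kv = apply_rope(kv, cos, sin)
    kv_cache.index_copy_(0, slots, kv)  # in-place cache update
    return q

# Usage: pack weights once, then call the fused path per forward step.
torch.manual_seed(0)
h = torch.randn(4, 32)                       # [num_tokens, hidden]
wq, wkv = torch.randn(32, 16), torch.randn(32, 16)
cos, sin = torch.randn(4, 8), torch.randn(4, 8)
cache = torch.zeros(128, 16)
slots = torch.tensor([3, 7, 11, 15])
q = mla_preprocess(h, fuse_weights(wq, wkv), cos, sin, cache, slots, q_dim=16)
```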
Additionally, the condition for allocating RoPE caches is relaxed to support MLA in modes other than just full decode.
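A hedged sketch of the relaxed allocation check; the helper and flag names (`should_allocate_rope_cache`, `use_mla`, `is_full_decode`) are hypothetical, not taken from the code:

```python
def should_allocate_rope_cache(use_mla: bool, is_full_decode: bool) -> bool:
    # Before this PR the caches were only allocated for full-decode MLA:
    #     return use_mla and is_full_decode
    # After: allocate whenever MLA is active, so non-decode modes can also
    # take the fused mla_preprocess path (is_full_decode no longer gates it).
    return use_mla
```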
Does this PR introduce any user-facing change?
No.
How was this patch tested?
None