
Conversation

@zhenwenqi2024 (Contributor) commented Dec 16, 2025

What this PR does / why we need it?

Currently, _dummy_run in npu_model_runner differs from gpu_model_runner in important features such as graph_dispatch and build_atten_metadata. This PR updates it to be consistent with gpu_model_runner.

Does this PR introduce any user-facing change?

NA

How was this patch tested?
vLLM version: v0.12.0
vLLM main: vllm-project/vllm@ad32e3e

@gemini-code-assist (bot) left a comment


Code Review

This pull request refactors the _dummy_run method in NPUModelRunner to align it more closely with the GPUModelRunner implementation. This is a significant and positive change for maintainability. However, I've identified a critical bug and a high-severity issue that need to be addressed.

  1. A premature return statement within a loop in the new _build_attention_metadata method will cause it to exit early, leading to incomplete metadata and likely runtime errors.
  2. In the refactored profile_run method, the in_profile_run flag is not set for a _dummy_run call, which can lead to incorrect behavior during profiling.

I've provided specific comments and suggestions to fix these issues.

spec_decode_common_attn_metadata = (
    spec_decode_common_attn_metadata.unpadded(
        num_tokens, num_reqs))
return attn_metadata, spec_decode_common_attn_metadata

critical

The return statement is inside the for loop that iterates over kv_cache_groups. This will cause the function to exit after processing only the first KV cache group, potentially leaving attn_metadata incomplete if there are multiple groups. The return statement should be moved outside the loop to ensure all KV cache groups are processed.
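A minimal sketch of the intended control flow, assuming a loop over kv_cache_groups inside _build_attention_metadata; the build_metadata_for_group helper and the loop variables are hypothetical and used only for illustration:

    attn_metadata = {}
    for group_id, kv_cache_group in enumerate(kv_cache_groups):
        # Build metadata for every KV cache group, not just the first one.
        attn_metadata[group_id] = build_metadata_for_group(kv_cache_group)

    # Return only after the loop has processed all groups.
    if spec_decode_common_attn_metadata is not None:
        spec_decode_common_attn_metadata = (
            spec_decode_common_attn_metadata.unpadded(
                num_tokens, num_reqs))
    return attn_metadata, spec_decode_common_attn_metadata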

mc2_tokens_capacity = get_mc2_tokens_capacity()
if self.max_num_tokens > mc2_tokens_capacity and \
        select_moe_comm_method(mc2_tokens_capacity, self.vllm_config) == MoECommType.MC2:
    self._dummy_run(mc2_tokens_capacity, is_profile=True)

high

The in_profile_run attribute is not set for the initial _dummy_run call within profile_run. This can lead to incorrect behavior during profiling, as some logic (e.g., _skip_all_reduce_acorss_dp_group) depends on this flag to distinguish profiling runs from regular execution. The in_profile_run flag should be set to True before this _dummy_run call and reset to False after, similar to how it's handled in the base GPUModelRunner.profile_run and the old implementation.

Suggested change
-    self._dummy_run(mc2_tokens_capacity, is_profile=True)
+    self.in_profile_run = True
+    self._dummy_run(mc2_tokens_capacity, is_profile=True)
+    self.in_profile_run = False
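A related sketch, not part of the suggestion itself: wrapping the call in try/finally guarantees the flag is reset even if the dummy run raises, while keeping the same intent as the suggested change above:

    self.in_profile_run = True
    try:
        self._dummy_run(mc2_tokens_capacity, is_profile=True)
    finally:
        # Reset even on failure so later execution is not treated as a profiling run.
        self.in_profile_run = False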

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@github-actions

This pull request has conflicts; please resolve them before we can evaluate it.
