[Feat] Eagle adapt vllm's speculative_config.enforce_eager #4698

anon189Ty · 2025-12-04T06:21:07Z

What this PR does / why we need it?

There is a new command "enforce_eager" in vllm's SpeculativeConfig. When it open, spec model should only run in eager mode and no impact the main model. This PR will let the eagle in vllm-ascend can use this command.
You can see this command in vllm/vllm/config/speculative.py

Does this PR introduce any user-facing change?

If you want to make eagle model always run in eager mode, you should add the '"enforce_eager": true' command when you set the speculative_config in your start sever command.
For example:
"""
vllm server /weights/Qwen3
--speculative-config '{"enforce_eager": true, "method": "eagle3", "model": "/weights/Eagle-Qwen3", "num_speculative_tpkens": 1}'
"""

How was this patch tested?

vLLM version: v0.12.0
vLLM main: https://github.com/vllm-project/vllm/commit/v0.12.0

Signed-off-by: anon189Ty <[email protected]>

gemini-code-assist

Code Review

The pull request successfully integrates the speculative_config.enforce_eager flag into the EagleProposer, ensuring that CUDA graph usage is correctly disabled when eager mode is enforced. Key changes include updating the use_cuda_graph condition, passing the device to AttentionMaskBuilder, and initializing aclgraph_runtime_mode and batch_descriptor appropriately. However, there are a couple of areas where the code can be improved for clarity and to remove redundancy.

gemini-code-assist · 2025-12-04T06:24:22Z

vllm_ascend/spec_decode/eagle_proposer.py

        self.positions[:num_tokens] = target_positions.to(device)
        self.hidden_states[:num_tokens] = target_hidden_states
        attn_metadata.block_tables = block_table.to(device)
+        # Make sure the uspeculative_config.enforce_eager validated to be compatible with eagle full graph.


There is a typo in this comment. It says "uspeculative_config.enforce_eager" instead of "speculative_config.enforce_eager". Please correct it for better readability.

Suggested change

# Make sure the uspeculative_config.enforce_eager validated to be compatible with eagle full graph.

# Make sure the speculative_config.enforce_eager validated to be compatible with eagle full graph.

gemini-code-assist · 2025-12-04T06:24:23Z

vllm_ascend/spec_decode/eagle_proposer.py

+        if not self.use_cuda_graph:
+            aclgraph_runtime_mode = CUDAGraphMode.NONE
+            batch_descriptor = None


This block of code is redundant. aclgraph_runtime_mode is already initialized to CUDAGraphMode.NONE on line 462, and batch_descriptor is initialized to None on line 463. If self.use_cuda_graph is false, these variables would already hold their intended values. Removing this redundant check will make the code cleaner and easier to maintain.

github-actions · 2025-12-04T06:31:47Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

A PR should do only one thing, smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

[Feat] Eagle adapt vllm's speculative_config.enforce_eager

69785f6

Signed-off-by: anon189Ty <[email protected]>

gemini-code-assist bot reviewed Dec 4, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Feat] Eagle adapt vllm's speculative_config.enforce_eager #4698

[Feat] Eagle adapt vllm's speculative_config.enforce_eager #4698

Uh oh!

anon189Ty commented Dec 4, 2025 •

edited by github-actions bot

Loading

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Dec 4, 2025

Uh oh!

gemini-code-assist bot Dec 4, 2025

Uh oh!

github-actions bot commented Dec 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	# Make sure the uspeculative_config.enforce_eager validated to be compatible with eagle full graph.
	# Make sure the speculative_config.enforce_eager validated to be compatible with eagle full graph.

[Feat] Eagle adapt vllm's speculative_config.enforce_eager #4698

Are you sure you want to change the base?

[Feat] Eagle adapt vllm's speculative_config.enforce_eager #4698

Uh oh!

Conversation

anon189Ty commented Dec 4, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Dec 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

anon189Ty commented Dec 4, 2025 •

edited by github-actions bot

Loading