-
Notifications
You must be signed in to change notification settings - Fork 657
[bugfix][refactor] fix recompute_scheduler break with vllm 0.12.0 & support async scheduling & refactor recompute_scheduler.py #4895
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
The pull request introduces significant refactoring to the scheduler components, aiming to fix recompute issues, support async scheduling, and align with vLLM 0.12.0. A new AsyncRecomputeScheduler is added, and the core RecomputeScheduler is refactored to inherit from the base vLLM Scheduler class. However, there are critical issues related to class definitions and naming consistency that need to be addressed to ensure the code is functional.
| scheduler_cls: Union[str, Type[object]] = ( | ||
| "vllm_ascend.core.recompute_scheduler.RecomputeScheduler") | ||
|
|
||
| @classmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The @dataclass decorator and the class definition for RecomputeSchedulerConfig appear to be missing. This will lead to a syntax error or NameError when the initialize_from_config method is called, as it's currently defined as a standalone method without a class context. The class definition, likely inheriting from SchedulerConfig, needs to be restored.
@dataclass
class RecomputeSchedulerConfig(SchedulerConfig):
pass
@classmethod08c4581 to
b968591
Compare
|
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according Contributing and Testing. |
…upport async scheduling & refactor recompute_scheduler.py Signed-off-by: linfeng-yuan <[email protected]>
b968591 to
fce5b5d
Compare
Signed-off-by: linfeng-yuan <[email protected]>
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces an AsyncRecomputeScheduler by inheriting from AsyncScheduler and RecomputeScheduler, and refactors the RecomputeScheduler to align with upstream vLLM changes. The RecomputeScheduler now inherits from vllm.v1.core.sched.scheduler.Scheduler, significantly reducing its __init__ method and moving much of its core logic to the base class. Key changes include updating RecomputeSchedulerConfig to accept VllmConfig and dynamically select between synchronous and asynchronous recompute schedulers, optimizing scheduling for async operations by avoiding extra steps, refining preemption logic to restore token budgets and related data, and adjusting speculative decoding token calculations. Additionally, the handling of encoder inputs and KV connector metadata is updated, and KV cache event collection and publishing are moved to the update_from_output method. The RecomputeSchedulerOutput dataclass now inherits from SchedulerOutput, simplifying its structure.
Signed-off-by: linfeng-yuan <[email protected]>
ad094a0 to
a78a835
Compare
Signed-off-by: linfeng-yuan <[email protected]>
Signed-off-by: linfeng-yuan <[email protected]>
adcc6cf to
90b77d3
Compare
Signed-off-by: linfeng-yuan <[email protected]>
What this PR does / why we need it?
Currently, the initialization and fundamental functions of RecomputeScheduler are broken with
vLLM v0.12.0. This PR fixes the conflicts ofRecomputeSchedulerand refactor its implementations by inheriting originalSchedulerof vLLM. Meanwhile, this PR also supports async cheduling with recompute scheduler by implementingAsyncRecomputeSchedulerwhich is simply inheritedAsncySchedulerof vLLM andRecomputeSchedulerof vLLM-Ascend with python MRO.Does this PR introduce any user-facing change?
No. The switch naming is the same as v0.11.0 :
recompute_scheduler_enableHow was this patch tested?
E2E serving with 2P1D dsv3.1 passed. The performance was the same as original vllm scheduler with
async_schedulingand preempted requests in D Nodes are successfully transfered to Proxy and further to P Node. This significantly improves the performance and robustness of PD disaggregation deployments.