
Commit b1a853b

Upgrade vllm commit hash to 1216 (#5053)
### What this PR does / why we need it?

Upstream vLLM PR vllm-project/vllm#30212 refactored the attention backend selection interface. This PR adapts vllm-ascend's `get_attn_backend_cls` to align with the new upstream standard, ensuring compatibility and reducing maintenance overhead.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

co-author: leo-pony ([email protected])

- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e

---------

Signed-off-by: zxwang <[email protected]>
Signed-off-by: leo-pony <[email protected]>
Co-authored-by: leo-pony <[email protected]>
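The sketch below is a rough, standalone illustration of the adaptation described above, not the actual vllm-ascend code: the real method is a `@classmethod` on the Ascend platform class with a larger `backend_map` (see the `vllm_ascend/platform.py` diff further down), and the `SimpleNamespace` stand-in plus the fallback return value are illustrative assumptions. It shows how the new consolidated `attn_selector_config` object and the older positional/keyword arguments can both be resolved to `use_mla`/`use_sparse`.

```python
from types import SimpleNamespace


def get_attn_backend_cls(selected_backend, *args, **kwargs):
    """Resolve use_mla/use_sparse from either calling convention."""
    if "attn_selector_config" in kwargs:
        # New upstream interface: a single config object carries the flags.
        cfg = kwargs["attn_selector_config"]
        use_mla, use_sparse = cfg.use_mla, cfg.use_sparse
    else:
        # Older interface: flags may arrive positionally (use_mla as the 5th
        # extra positional argument, use_sparse as the 7th) or as keywords.
        use_mla = kwargs.get("use_mla", args[4] if len(args) >= 5 else None)
        use_sparse = kwargs.get("use_sparse", args[6] if len(args) >= 7 else None)

    # Abbreviated mapping keyed on (use_mla, use_sparse); the real table in
    # vllm_ascend/platform.py has more entries, and this fallback is illustrative.
    backend_map = {
        (True, False): "vllm_ascend.attention.mla_v1.AscendMLABackend",
    }
    return backend_map.get((use_mla, use_sparse), selected_backend)


# New-style call: flags bundled in a config object (SimpleNamespace stands in
# for vLLM's real attention-selector config here).
print(get_attn_backend_cls(
    None, attn_selector_config=SimpleNamespace(use_mla=True, use_sparse=False)))

# Old-style call: head_size, dtype, kv_cache_dtype, block_size, use_mla,
# has_sink, use_sparse passed positionally after selected_backend.
print(get_attn_backend_cls(None, 128, "bfloat16", "auto", 16, True, False, False))
```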
1 parent: eb4c08f · commit: b1a853b


5 files changed: +17 additions, -27 deletions


.github/workflows/pr_test_full.yaml

Lines changed: 1 addition & 1 deletion
@@ -74,7 +74,7 @@ jobs:
    name: e2e-full
    strategy:
      matrix:
-        vllm_version: [4429d934de3c5cc327b0d7aec8e473aeba38db90, v0.12.0]
+        vllm_version: [releases/v0.13.0, v0.12.0]
    needs: [changes]
    if: ${{ needs.changes.outputs.e2e_tracker == 'true' }}
    uses: ./.github/workflows/_e2e_test.yaml

.github/workflows/pr_test_light.yaml

Lines changed: 3 additions & 3 deletions
@@ -42,7 +42,7 @@ jobs:
  lint:
    uses: ./.github/workflows/_pre_commit.yml
    with:
-      vllm: 4429d934de3c5cc327b0d7aec8e473aeba38db90
+      vllm: releases/v0.13.0
  changes:
    runs-on: linux-aarch64-a2-0
    outputs:
@@ -90,7 +90,7 @@ jobs:
      SOC_VERSION: ascend910b1
    strategy:
      matrix:
-        vllm_version: [4429d934de3c5cc327b0d7aec8e473aeba38db90, v0.12.0]
+        vllm_version: [releases/v0.13.0, v0.12.0]

    steps:
      - name: Free up disk space
@@ -154,7 +154,7 @@ jobs:
    name: e2e-light
    strategy:
      matrix:
-        vllm_version: [4429d934de3c5cc327b0d7aec8e473aeba38db90, v0.12.0]
+        vllm_version: [releases/v0.13.0, v0.12.0]
    # Note (yikun): If CI resource are limited we can split job into two chain jobs
    needs: [lint, changes]
    # only trigger e2e test after lint passed and the change is e2e related with pull request.

docs/source/community/versioning_policy.md

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ If you're using v0.7.3, don't forget to install [mindie-turbo](https://pypi.org/
For main branch of vLLM Ascend, we usually make it compatible with the latest vLLM release and a newer commit hash of vLLM. Please note that this table is usually updated. Please check it regularly.
| vLLM Ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu |
|-------------|--------------|------------------|-------------|--------------------|
-| main | 4429d934de3c5cc327b0d7aec8e473aeba38db90, v0.12.0 tag | >= 3.10, < 3.12 | 8.3.RC2 | 2.8.0 / 2.8.0 |
+| main | releases/v0.13.0, v0.12.0 tag | >= 3.10, < 3.12 | 8.3.RC2 | 2.8.0 / 2.8.0 |

## Release cadence

tests/e2e/multicard/test_offline_inference_distributed.py

Lines changed: 3 additions & 5 deletions
@@ -113,11 +113,9 @@ def test_sp_for_qwen3_moe() -> None:
            dtype="auto",
            tensor_parallel_size=2,
            distributed_executor_backend="mp",
-            compilation_config={
-                "pass_config": {
-                    "enable_sequence_parallelism": True
-                }
-            },
+            compilation_config={"pass_config": {
+                "enable_sp": True
+            }},
            enable_expert_parallel=True,
            enforce_eager=True) as vllm_model:
        vllm_model.generate(example_prompts, sampling_params)

vllm_ascend/platform.py

Lines changed: 9 additions & 17 deletions
@@ -355,23 +355,15 @@ def import_kernels(cls) -> None:
            CUSTOM_OP_REGISTERED = True

    @classmethod
-    def get_attn_backend_cls(
-        cls,
-        selected_backend,
-        head_size,
-        dtype,
-        kv_cache_dtype,
-        block_size,
-        use_mla,
-        has_sink=False,
-        use_sparse=False,
-        # NOTE: Please pay special attention to the order of these parameters.
-        # Although we are only using some of them so far
-        # vllm passes them in sequence when using this interface.
-        use_mm_prefix: bool = False,
-        attn_type: str | None = None,
-    ):
-        # choose attention backend based on use_mla
+    def get_attn_backend_cls(cls, selected_backend, *args, **kwargs):
+        if "attn_selector_config" in kwargs:
+            use_mla = kwargs["attn_selector_config"].use_mla
+            use_sparse = kwargs["attn_selector_config"].use_sparse
+        else:
+            use_mla = kwargs.get("use_mla",
+                                 args[4] if len(args) >= 5 else None)
+            use_sparse = kwargs.get("use_sparse",
+                                    args[6] if len(args) >= 7 else None)
        backend_map = {
            (True, False): "vllm_ascend.attention.mla_v1.AscendMLABackend",
            (False, False):
