
Commit 452b4a0

cherry-pick CANN 8.3

Signed-off-by: wangxiyuan <[email protected]>
1 parent 7cc6208 commit 452b4a0

22 files changed (+182 −174 lines)

Dockerfile

Lines changed: 1 addition & 1 deletion

```diff
@@ -15,7 +15,7 @@
 # This file is a part of the vllm-ascend project.
 #

-FROM quay.io/ascend/cann:8.2.rc1-910b-ubuntu22.04-py3.11
+FROM quay.io/ascend/cann:8.3.rc1-910b-ubuntu22.04-py3.11

 ARG PIP_INDEX_URL="https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple"
 ARG COMPILE_CUSTOM_KERNELS=1
```

Dockerfile.310p

Lines changed: 1 addition & 1 deletion

```diff
@@ -15,7 +15,7 @@
 # This file is a part of the vllm-ascend project.
 #

-FROM quay.io/ascend/cann:8.2.rc1-310p-ubuntu22.04-py3.11
+FROM quay.io/ascend/cann:8.3.rc1-310p-ubuntu22.04-py3.11

 ARG PIP_INDEX_URL="https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple"
 ARG COMPILE_CUSTOM_KERNELS=1
```

Dockerfile.310p.openEuler

Lines changed: 1 addition & 1 deletion

```diff
@@ -15,7 +15,7 @@
 # This file is a part of the vllm-ascend project.
 #

-FROM quay.io/ascend/cann:8.2.rc1-310p-openeuler24.03-py3.11
+FROM quay.io/ascend/cann:8.3.rc1-310p-openeuler24.03-py3.11

 ARG PIP_INDEX_URL="https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple"
 ARG COMPILE_CUSTOM_KERNELS=1
```

Dockerfile.a3

Lines changed: 1 addition & 1 deletion

```diff
@@ -15,7 +15,7 @@
 # This file is a part of the vllm-ascend project.
 #

-FROM quay.io/ascend/cann:8.2.rc1-a3-ubuntu22.04-py3.11
+FROM quay.io/ascend/cann:8.3.rc1-a3-ubuntu22.04-py3.11

 ARG PIP_INDEX_URL="https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple"
 ARG COMPILE_CUSTOM_KERNELS=1
```

Dockerfile.a3.openEuler

Lines changed: 1 addition & 1 deletion

```diff
@@ -15,7 +15,7 @@
 # This file is a part of the vllm-ascend project.
 #

-FROM quay.io/ascend/cann:8.2.rc1-a3-openeuler24.03-py3.11
+FROM quay.io/ascend/cann:8.3.rc1-a3-openeuler24.03-py3.11

 ARG PIP_INDEX_URL="https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple"
 ARG COMPILE_CUSTOM_KERNELS=1
```

Dockerfile.openEuler

Lines changed: 1 addition & 1 deletion

```diff
@@ -15,7 +15,7 @@
 # This file is a part of the vllm-ascend project.
 #

-FROM quay.io/ascend/cann:8.2.rc1-910b-openeuler24.03-py3.11
+FROM quay.io/ascend/cann:8.3.rc1-910b-openeuler24.03-py3.11

 ARG PIP_INDEX_URL="https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple"
 ARG COMPILE_CUSTOM_KERNELS=1
```

README.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -42,7 +42,7 @@ By using vLLM Ascend plugin, popular open-source models, including Transformer-l
 - OS: Linux
 - Software:
   * Python >= 3.9, < 3.12
-  * CANN >= 8.2.rc1 (Ascend HDK version refers to [here](https://www.hiascend.com/document/detail/zh/canncommercial/82RC1/releasenote/releasenote_0000.html))
+  * CANN >= 8.3.rc1 (Ascend HDK version refers to [here](https://www.hiascend.com/document/detail/zh/canncommercial/83RC1/releasenote/releasenote_0000.html))
   * PyTorch == 2.7.1, torch-npu == 2.7.1
   * vLLM (the same version as vllm-ascend)
```

README.zh.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -43,7 +43,7 @@ The vLLM Ascend plugin (`vllm-ascend`) is a community-maintained backend that lets vLLM run on Ascend NP
 - OS: Linux
 - Software:
   * Python >= 3.9, < 3.12
-  * CANN >= 8.2.rc1 (for the matching Ascend HDK version, see [here](https://www.hiascend.com/document/detail/zh/canncommercial/82RC1/releasenote/releasenote_0000.html))
+  * CANN >= 8.3.rc1 (for the matching Ascend HDK version, see [here](https://www.hiascend.com/document/detail/zh/canncommercial/83RC1/releasenote/releasenote_0000.html))
   * PyTorch == 2.7.1, torch-npu == 2.7.1
   * vLLM (same version as vllm-ascend)
```

docs/source/conf.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -75,7 +75,7 @@
     'pip_vllm_ascend_version': "0.11.0rc0",
     'pip_vllm_version': "0.11.0",
     # CANN image tag
-    'cann_image_tag': "8.2.rc1-910b-ubuntu22.04-py3.11",
+    'cann_image_tag': "8.3.rc1-910b-ubuntu22.04-py3.11",
     # vllm version in ci
     'ci_vllm_version': 'v0.11.0rc3',
 }
```
Lines changed: 99 additions & 0 deletions

# Multi Node Test

Multi-node CI is designed to test distributed scenarios for very large models, e.g. disaggregated prefill with multiple DP ranks across multiple nodes.

## How it works

The following picture shows the basic deployment view of the multi-node CI mechanism: how the GitHub Action interacts with [lws](https://lws.sigs.k8s.io/docs/overview/) (a kind of Kubernetes CRD resource).

![alt text](../../assets/deployment.png)

From the workflow perspective, we can see how the final test script is executed. The key pieces are [lws.yaml and run.sh](https://github.com/vllm-project/vllm-ascend/tree/main/tests/e2e/nightly/multi_node/scripts): the former defines how our k8s cluster is brought up, and the latter is the entry script run when each pod starts. Each node executes different logic according to the [LWS_WORKER_INDEX](https://lws.sigs.k8s.io/docs/reference/labels-annotations-and-environment-variables/) environment variable, so that multiple nodes can form a distributed cluster to perform tasks.
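To make the per-node dispatch idea concrete, here is a minimal illustrative sketch (not the actual run.sh logic; only the `LWS_WORKER_INDEX` variable name comes from the lws docs) of how a pod can pick its role from that variable:

```python
import os

# lws injects LWS_WORKER_INDEX into every pod of a replicated group;
# index 0 is conventionally the leader. Illustrative sketch only.
index = int(os.environ.get("LWS_WORKER_INDEX", "0"))

if index == 0:
    role = "leader"   # e.g. the node that exposes the API server
else:
    role = "worker"   # e.g. a node that only joins the distributed cluster

print(f"node {index} runs as: {role}")
```

The same pattern extends to any number of roles: the script reads one integer from the environment and branches, so every pod can run the identical entry script.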
![alt text](../../assets/workflow.png)

## How to contribute

1. Upload custom weights

If you need customized weights (for example, you quantized a W8A8 weight for DeepSeek-V3 and want it to run in CI), you are welcome to upload them to ModelScope's [vllm-ascend](https://www.modelscope.cn/organization/vllm-ascend) organization. If you do not have permission to upload, please contact @Potabk.

2. Add a config YAML

As the entrypoint script [run.sh](https://github.com/vllm-project/vllm-ascend/blob/0bf3f21a987aede366ec4629ad0ffec8e32fe90d/tests/e2e/nightly/multi_node/scripts/run.sh#L106) shows, on startup each k8s pod traverses all *.yaml files in the [config directory](https://github.com/vllm-project/vllm-ascend/tree/main/tests/e2e/nightly/multi_node/config/models), reading each one and executing according to its configuration. So all we need to do is add a YAML file like [DeepSeek-V3.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/nightly/multi_node/config/models/DeepSeek-V3.yaml).
Suppose you have **2 nodes** running a 1P1D setup (1 Prefiller + 1 Decoder). You might add a config file that looks like:

```yaml
test_name: "test DeepSeek-V3 disaggregated_prefill"
# the model being tested
model: "vllm-ascend/DeepSeek-V3-W8A8"
# how large the cluster is
num_nodes: 2
npu_per_node: 16
# add every env var you need here
env_common:
  VLLM_USE_MODELSCOPE: true
  OMP_PROC_BIND: false
  OMP_NUM_THREADS: 100
  HCCL_BUFFSIZE: 1024
  SERVER_PORT: 8080
disaggregated_prefill:
  enabled: true
  # node indices (a list) that meet all of these conditions:
  # - prefiller
  # - not headless (has an API server)
  prefiller_host_index: [0]
  # node indices (a list) that meet all of these conditions:
  # - decoder
  # - not headless (has an API server)
  decoder_host_index: [1]

# add each node's vllm serve CLI command, just as you would run it locally
deployment:
  - server_cmd: >
      vllm serve ...
  - server_cmd: >
      vllm serve ...
benchmarks:
  perf:
    # fill with performance test kwargs
  acc:
    # fill with accuracy test kwargs
```
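To clarify what the `disaggregated_prefill` fields mean for each node, here is a small illustrative Python sketch (not the actual run.sh parsing code) that mirrors the config above as a dict and maps a node index to the role it would play:

```python
# Illustrative only: mirrors the YAML config above as a Python dict.
# The real dispatch lives in run.sh; names below follow the YAML keys.
config = {
    "num_nodes": 2,
    "disaggregated_prefill": {
        "enabled": True,
        "prefiller_host_index": [0],  # prefiller nodes with an API server
        "decoder_host_index": [1],    # decoder nodes with an API server
    },
}

def node_role(cfg: dict, index: int) -> str:
    dp = cfg.get("disaggregated_prefill", {})
    if not dp.get("enabled"):
        return "worker"  # plain multi-node case, no prefill/decode split
    if index in dp.get("prefiller_host_index", []):
        return "prefiller"
    if index in dp.get("decoder_host_index", []):
        return "decoder"
    return "headless"  # joins the cluster but exposes no API server

print([node_role(config, i) for i in range(config["num_nodes"])])
```

With the 1P1D config above this yields one prefiller (node 0) and one decoder (node 1); any node index listed in neither list would be treated as headless.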
3. Add the case to the nightly workflow

Currently, the multi-node test workflow is defined in [vllm_ascend_test_nightly_a2/a3.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test_nightly_a3.yaml):

```yaml
multi-node-tests:
  needs: single-node-tests
  if: always() && (github.event_name == 'schedule' || github.event_name == 'workflow_dispatch')
  strategy:
    fail-fast: false
    max-parallel: 1
    matrix:
      test_config:
        - name: multi-node-deepseek-pd
          config_file_path: tests/e2e/nightly/multi_node/config/models/DeepSeek-V3.yaml
          size: 2
        - name: multi-node-qwen3-dp
          config_file_path: tests/e2e/nightly/multi_node/config/models/Qwen3-235B-A3B.yaml
          size: 2
        - name: multi-node-dpsk-4node-pd
          config_file_path: tests/e2e/nightly/multi_node/config/models/DeepSeek-R1-W8A8.yaml
          size: 4
  uses: ./.github/workflows/_e2e_nightly_multi_node.yaml
  with:
    soc_version: a3
    image: m.daocloud.io/quay.io/ascend/cann:8.3.rc1-a3-ubuntu22.04-py3.11
    replicas: 1
    size: ${{ matrix.test_config.size }}
    config_file_path: ${{ matrix.test_config.config_file_path }}
```

The matrix above defines all the parameters required to add a multi-node use case. If you are adding a new use case, the parameters to pay attention to are `size` and `config_file_path`: the former is the number of nodes your case requires, and the latter is the path to the configuration file you completed in step 2.
