Commit 7f0473f

Merge branch 'main' into fix_bmm_cann
2 parents b7fc674 + a78f49e commit 7f0473f

3 files changed: +1011 -832 lines changed

docs/source/tutorials/single_node_pd_disaggregation_llmdatadist.md

Lines changed: 9 additions & 1 deletion
@@ -45,7 +45,7 @@ bash gen_ranktable.sh --ips 192.0.0.1 \
 --npus-per-node 2 --network-card-name eth0 --prefill-device-cnt 1 --decode-device-cnt 1
 ```

-The rank table will be generated at /vllm-workspace/vllm-ascend/examples/disaggregate_prefill_v1/ranktable.json
+If you want to run "2P1D", please set npus-per-node to 3 and prefill-device-cnt to 2. The rank table will be generated at /vllm-workspace/vllm-ascend/examples/disaggregate_prefill_v1/ranktable.json

 |Parameter | Meaning |
 | --- | --- |
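
For reference, the "2P1D" note in this hunk could translate into the following rank-table invocation. This is a minimal sketch, not part of the commit: it reuses the IP address and network card name from the tutorial's existing example, and it assumes that `--npus-per-node` is the sum of the prefill and decode device counts (the commit states only the values 3 and 2). The script prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Sketch: build the gen_ranktable.sh arguments for a "2P1D" layout.
set -euo pipefail

prefill_cnt=2   # two prefill (P) devices, as stated in the commit
decode_cnt=1    # one decode (D) device, unchanged from the 1P1D example
# Assumption: NPUs per node = prefill devices + decode devices (3 here).
npus_per_node=$((prefill_cnt + decode_cnt))

cmd=(bash gen_ranktable.sh --ips 192.0.0.1
     --npus-per-node "$npus_per_node" --network-card-name eth0
     --prefill-device-cnt "$prefill_cnt" --decode-device-cnt "$decode_cnt")

# Print the command for review instead of executing it.
echo "${cmd[*]}"
```

To actually generate the rank table, run `"${cmd[@]}"` from the directory that contains `gen_ranktable.sh`.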
@@ -137,6 +137,8 @@ vllm serve /model/Qwen2.5-VL-7B-Instruct \

 :::::

+If you want to run "2P1D", please set ASCEND_RT_VISIBLE_DEVICES, VLLM_ASCEND_LLMDD_RPC_PORT and port to different values for each P process.
+
 ## Example Proxy for Deployment

 Run a proxy server on the same node with the prefiller service instance. You can get the proxy program in the repository's examples: [load\_balance\_proxy\_server\_example.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)
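
The per-process requirement added in this hunk (distinct `ASCEND_RT_VISIBLE_DEVICES`, `VLLM_ASCEND_LLMDD_RPC_PORT` and port for each P process) could be sketched as below. The base port numbers and the helper name `prefill_env` are hypothetical, chosen only for illustration; "port" is assumed to mean the `--port` option of `vllm serve`. The script only prints the derived values and does not start any server.

```shell
#!/usr/bin/env bash
# Sketch: derive distinct settings for each prefill (P) process in "2P1D".
set -euo pipefail

base_rpc_port=5559     # hypothetical base for VLLM_ASCEND_LLMDD_RPC_PORT
base_serve_port=13800  # hypothetical base for the `vllm serve --port` value

# Hypothetical helper: print the settings for the P process with index $1.
prefill_env() {
  local idx=$1
  echo "ASCEND_RT_VISIBLE_DEVICES=${idx} VLLM_ASCEND_LLMDD_RPC_PORT=$((base_rpc_port + idx)) port=$((base_serve_port + idx))"
}

# Two P processes: each gets its own device, RPC port and serving port.
for idx in 0 1; do
  prefill_env "$idx"
done
```

Offsetting every value by the process index is one simple way to keep the two P processes from colliding; any assignment works as long as the three values differ between processes and do not clash with the decoder's settings.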
@@ -151,6 +153,12 @@ python load_balance_proxy_server_example.py \
 --decoder-ports 13701
 ```

+|Parameter | Meaning |
+| --- | --- |
+| --port | Port of proxy |
+| --prefiller-port | All ports of prefill |
+| --decoder-ports | All ports of decoder |
+
 ## Verification

 Check service health using the proxy server endpoint.
