Commit cee66a9

update doc to cann 8.3
Signed-off-by: Pz1116 <[email protected]>
1 parent 20b8fa9 commit cee66a9

1 file changed: +39 -31 lines changed

docs/source/user_guide/feature_guide/kv_pool_mooncake.md

Lines changed: 39 additions & 31 deletions
@@ -4,18 +4,27 @@
 
 * Software:
 * Python >= 3.9, < 3.12
-* CANN >= 8.2.rc1
-* PyTorch == 2.7.1, torch-npu == 2.7.1
+* CANN >= 8.3.rc1
+* PyTorch >= 2.7.1, torch-npu >= 2.7.1.dev20250724
 * vLLM:main branch
 * vLLM-Ascend:main branch
-* Mooncake:[AscendTransport/Mooncake at pooling-async-memcpy](https://github.com/AscendTransport/Mooncake/tree/pooling-async-memcpy)(Currently available branch code, continuously updated.)
-Installation and Compilation Guide:https://github.com/AscendTransport/Mooncake/tree/pooling-async-memcpy?tab=readme-ov-file#build-and-use-binaries
+* Mooncake:main branch
+
+Installation and Compilation Guide:https://github.com/kvcache-ai/Mooncake?tab=readme-ov-file#build-and-use-binaries
+
+Make sure to build with `-DUSE_ASCEND_DIRECT=ON` to enable the ADXL engine.
+
+An example command for compiling ADXL:
+
+`rm -rf build && mkdir -p build && cd build && cmake .. -DCMAKE_INSTALL_PREFIX=/opt/transfer-engine/ -DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DUSE_ASCEND_DIRECT=ON -DBUILD_SHARED_LIBS=ON -DBUILD_UNIT_TESTS=OFF && make -j && make install`
+
+Also, add the directory that contains the compiled libraries to `LD_LIBRARY_PATH`, for example `export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib64/python3.11/site-packages/mooncake`, or copy the .so files to the `/usr/local/lib64` directory after compilation.
 
 ### KV Pooling Parameter Description
-**kv_connector_extra_config**:Additional Configurable Parameters for Pooling
-**mooncake_rpc_port**:Port for RPC Communication Between Pooling Scheduler Process and Worker Process: Each Instance Requires a Unique Port Configuration.
-**load_async**:Whether to Enable Asynchronous Loading. The default value is false.
-**register_buffer**:Whether to Register Video Memory with the Backend. Registration is Not Required When Used with MooncakeConnectorV1; It is Required in All Other Cases. The Default Value is false.
+**kv_connector_extra_config**:Additional configurable parameters for pooling.
+**mooncake_rpc_port**:Port for RPC communication between the pooling scheduler process and the worker process. Each instance requires a unique port.
+**load_async**:Whether to enable asynchronous loading. The default value is false.
+**register_buffer**:Whether to register video memory with the backend. Registration is not required when used with MooncakeConnectorV1; it is required in all other cases. The default value is false.
 
 ## run mooncake master
 
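The `LD_LIBRARY_PATH` step added in the hunk above can be checked before launching vLLM. The following is a minimal Python sketch and is not part of the commit: `library_dir_on_path` is a hypothetical helper name, and the directory is the one quoted in the added documentation line, which may differ on other installations.

```python
import os


def library_dir_on_path(directory: str, env_value: str) -> bool:
    """Return True if `directory` is an entry of an LD_LIBRARY_PATH-style string."""
    wanted = os.path.normpath(directory)
    # LD_LIBRARY_PATH entries are separated by ':' on Linux; skip empty entries.
    entries = [entry for entry in env_value.split(":") if entry]
    return any(os.path.normpath(entry) == wanted for entry in entries)


if __name__ == "__main__":
    # Directory taken from the documentation line above; adjust to the local install.
    mooncake_dir = "/usr/local/lib64/python3.11/site-packages/mooncake"
    current = os.environ.get("LD_LIBRARY_PATH", "")
    if library_dir_on_path(mooncake_dir, current):
        print("mooncake directory is on LD_LIBRARY_PATH")
    else:
        print("mooncake directory is missing from LD_LIBRARY_PATH")
```

The check only compares path strings; it does not verify that the .so files actually exist in that directory.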
@@ -29,26 +38,32 @@ The environment variable **MOONCAKE_CONFIG_PATH** is configured to the full path
 "metadata_server": "P2PHANDSHAKE",
 "protocol": "ascend",
 "device_name": "",
+"use_ascend_direct": true,
+"alloc_in_same_node": true,
 "master_server_address": "xx.xx.xx.xx:50088",
 "global_segment_size": 30000000000
 }
 ```
 
-**local_hostname**: Configured as the IP address of the current master node,
-**metadata_server**: Configured as **P2PHANDSHAKE**,
-**protocol:** Configured for Ascend to use Mooncake's HCCL communication,
-**device_name**: ""
-**master_server_address**: Configured with the IP and port of the master service
-**global_segment_size**: Expands the kvcache size registered by the PD node to the master
+**local_hostname**: Configured as the IP address of the current master node.
+**metadata_server**: Configured as **P2PHANDSHAKE**.
+**protocol**: Set to `ascend` to use Mooncake's HCCL communication.
+**device_name**: ""
+**use_ascend_direct**: Whether to use the ADXL engine.
+**alloc_in_same_node**: Whether to prefer allocating buffers on the local node.
+**master_server_address**: Configured with the IP and port of the master service.
+**global_segment_size**: Sets the KV cache size that the PD node registers with the master.
 
 ### 2. Start mooncake_master
 
 Under the mooncake folder:
 
 ```
-mooncake_master --port 50088
+mooncake_master --port 50088 --eviction_high_watermark_ratio 0.95 --eviction_ratio 0.05
 ```
 
+`eviction_high_watermark_ratio` sets the watermark at which Mooncake Store performs eviction, and `eviction_ratio` sets the portion of stored objects that is evicted.
+
 ## Pooling and Prefill Decode Disaggregate Scenario
 
 ### 1.Run `prefill` Node and `decode` Node
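The configuration fields described in the hunk above can also be assembled programmatically. The following is a minimal Python sketch and is not part of the commit: `build_mooncake_config` and `write_config` are hypothetical helper names, the placeholder address `xx.xx.xx.xx` is kept exactly as the documentation gives it, and the output location (the system temporary directory) is an assumption chosen only for illustration.

```python
import json
import os
import tempfile


def build_mooncake_config(local_ip: str, master_address: str,
                          global_segment_size: int = 30000000000) -> dict:
    """Assemble the fields listed in the documentation into one dictionary."""
    return {
        "local_hostname": local_ip,
        "metadata_server": "P2PHANDSHAKE",
        "protocol": "ascend",
        "device_name": "",
        "use_ascend_direct": True,    # field added by this commit
        "alloc_in_same_node": True,   # field added by this commit
        "master_server_address": master_address,
        "global_segment_size": global_segment_size,
    }


def write_config(config: dict, path: str) -> str:
    """Write the configuration as JSON and return the path that was written."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=4)
    return path


if __name__ == "__main__":
    # Placeholder addresses as in the documentation; replace with real values.
    cfg = build_mooncake_config("xx.xx.xx.xx", "xx.xx.xx.xx:50088")
    target = write_config(cfg, os.path.join(tempfile.gettempdir(), "mooncake.json"))
    print(f'export MOONCAKE_CONFIG_PATH="{target}"')
```

Generating the file from one function keeps the two new flags consistent across the prefill and decode nodes, which both read the file through `MOONCAKE_CONFIG_PATH`.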
@@ -69,11 +84,9 @@ export PYTHONPATH=$PYTHONPATH:/xxxxx/vllm
 export MOONCAKE_CONFIG_PATH="/xxxxxx/mooncake.json"
 export VLLM_USE_V1=1
 export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3
-export ASCEND_TRANSPORT_PRINT=1
 export ACL_OP_INIT_MODE=1
-# The upper boundary environment variable for memory swap logging is set to mooncake, where 1 indicates enabled and 0 indicates disabled.
-export ASCEND_AGGREGATE_ENABLE=1
-# The upper-level environment variable is the switch for enabling the mooncake aggregation function, where 1 means on and 0 means off.
+export ASCEND_BUFFER_POOL=4:8
+# ASCEND_BUFFER_POOL configures the number and size of the buffers on the NPU device used for aggregation and KV transfer. The value 4:8 allocates 4 buffers of 8 MB each.
 
 python3 -m vllm.entrypoints.openai.api_server \
 --model /xxxxx/Qwen2.5-7B-Instruct \
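The `ASCEND_BUFFER_POOL` value introduced in the hunk above packs two numbers into one string. The following is a minimal Python sketch of how such a value could be validated before exporting it. It is not part of the commit: the `count:size_in_MB` layout is inferred only from the single `4:8` example in the added comment, and `parse_buffer_pool` and `total_buffer_mb` are hypothetical helper names.

```python
def parse_buffer_pool(value: str) -> tuple[int, int]:
    """Split a 'count:size' string into (buffer_count, buffer_size_mb).

    Assumes the layout suggested by the documented example '4:8',
    that is, 4 buffers of 8 MB each.
    """
    count_text, size_text = value.split(":")  # raises ValueError if not exactly two parts
    count, size_mb = int(count_text), int(size_text)
    if count <= 0 or size_mb <= 0:
        raise ValueError(f"buffer count and size must be positive, got {value!r}")
    return count, size_mb


def total_buffer_mb(value: str) -> int:
    """Total NPU memory, in MB, that the pool described by `value` would reserve."""
    count, size_mb = parse_buffer_pool(value)
    return count * size_mb


if __name__ == "__main__":
    count, size_mb = parse_buffer_pool("4:8")
    print(f"{count} buffers of {size_mb} MB, {total_buffer_mb('4:8')} MB in total")
```

With the documented value `4:8`, the pool amounts to 32 MB per device under this reading.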
@@ -108,11 +121,11 @@ python3 -m vllm.entrypoints.openai.api_server \
 }
 }
 },
-{
+{
 "kv_connector": "MooncakeConnectorStoreV1",
 "kv_role": "kv_producer",
 "mooncake_rpc_port":"0"
-}
+}
 ]
 }
 }' > p.log 2>&1
@@ -133,10 +146,7 @@ export MOONCAKE_CONFIG_PATH="/xxxxx/mooncake.json"
 export VLLM_USE_V1=1
 export ASCEND_RT_VISIBLE_DEVICES=4,5,6,7
 export ACL_OP_INIT_MODE=1
-export ASCEND_TRANSPORT_PRINT=1
-# The upper boundary environment variable for memory swap logging is set to mooncake, where 1 indicates enabled and 0 indicates disabled.
-export ASCEND_AGGREGATE_ENABLE=1
-# The upper-level environment variable is the switch for enabling the mooncake aggregation function, where 1 means on and 0 means off.
+export ASCEND_BUFFER_POOL=4:8
 
 python3 -m vllm.entrypoints.openai.api_server \
 --model /xxxxx/Qwen2.5-7B-Instruct \
@@ -156,11 +166,12 @@ python3 -m vllm.entrypoints.openai.api_server \
 "kv_connector_extra_config": {
 "use_layerwise": false,
 "connectors": [
-{
+{
 "kv_connector": "MooncakeConnectorV1",
 "kv_role": "kv_consumer",
 "kv_port": "20002",
 "kv_connector_extra_config": {
+"use_ascend_direct": true,
 "prefill": {
 "dp_size": 1,
 "tp_size": 1
@@ -170,7 +181,7 @@ python3 -m vllm.entrypoints.openai.api_server \
 "tp_size": 1
 }
 }
-},
+},
 {
 "kv_connector": "MooncakeConnectorStoreV1",
 "kv_role": "kv_consumer",
@@ -234,10 +245,7 @@ export MOONCAKE_CONFIG_PATH="/xxxxxx/mooncake.json"
 export VLLM_USE_V1=1
 export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3
 export ACL_OP_INIT_MODE=1
-export ASCEND_TRANSPORT_PRINT=1
-# The upper boundary environment variable for memory swap logging is set to mooncake, where 1 indicates enabled and 0 indicates disabled.
-export ASCEND_AGGREGATE_ENABLE=1
-# The upper-level environment variable is the switch for enabling the mooncake aggregation function, where 1 means on and 0 means off.
+export ASCEND_BUFFER_POOL=4:8
 
 python3 -m vllm.entrypoints.openai.api_server \
 --model /xxxxx/Qwen2.5-7B-Instruct \
