I notice that uccl engines are created on all available GPUs, even when the memory is allocated on a single GPU.
I think we should create engines only on the GPU where the uccl engine is initialized for P2P.
UCCL_RCMODE=1 NCCL_IB_GID_INDEX=3 python benchmark_nixl.py --role server --device cpu --local-gpu-idx 0 --iters 1 --op-type read --sizes 104857600 --backend uccl_p2p
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            3542      C   python                                  526MiB |
|    1   N/A  N/A            3542      C   python                                  524MiB |
|    2   N/A  N/A            3542      C   python                                  524MiB |
|    3   N/A  N/A            3542      C   python                                  524MiB |
|    4   N/A  N/A            3542      C   python                                  524MiB |
|    5   N/A  N/A            3542      C   python                                  524MiB |
|    6   N/A  N/A            3542      C   python                                  524MiB |
|    7   N/A  N/A            3542      C   python                                  524MiB |
+-----------------------------------------------------------------------------------------+
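To make the observation above easy to re-check after a fix, the process table can be parsed to count the GPUs on which a given PID holds memory. This is a minimal sketch that assumes the `nvidia-smi` process-table layout pasted in this issue; `gpus_used_by_pid` and `SAMPLE` are hypothetical names for illustration and are not part of uccl or NIXL.

```python
import re

# One row of the nvidia-smi "Processes" table as pasted in this issue:
# | <gpu> <gi> <ci> <pid> <type> <process name> <mem>MiB |
# Assumption: other driver versions may lay the table out differently.
ROW = re.compile(
    r"^\|\s*(\d+)\s+\S+\s+\S+\s+(\d+)\s+\S+\s+(.+?)\s+(\d+)MiB\s*\|$"
)

def gpus_used_by_pid(table_text, pid):
    """Return {gpu_index: memory_in_MiB} for rows that belong to `pid`."""
    usage = {}
    for line in table_text.splitlines():
        match = ROW.match(line.strip())
        if match and int(match.group(2)) == pid:
            usage[int(match.group(1))] = int(match.group(4))
    return usage

# Rows copied from the output in this issue.
SAMPLE = """
| 0 N/A N/A 3542 C python 526MiB |
| 1 N/A N/A 3542 C python 524MiB |
| 2 N/A N/A 3542 C python 524MiB |
| 3 N/A N/A 3542 C python 524MiB |
| 4 N/A N/A 3542 C python 524MiB |
| 5 N/A N/A 3542 C python 524MiB |
| 6 N/A N/A 3542 C python 524MiB |
| 7 N/A N/A 3542 C python 524MiB |
"""

usage = gpus_used_by_pid(SAMPLE, 3542)
print(sorted(usage))  # GPU indices on which PID 3542 holds memory
```

With `--local-gpu-idx 0`, the expectation proposed in this issue would be a single entry (GPU 0) rather than all eight.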