[P2P] Engines created in all GPUs #568

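I noticed that uccl engines are created on every available GPU, even when the memory is allocated on a single GPU.
I think we should create an engine only on the GPU where the uccl engine is initialized for P2P.

Running the server with `--local-gpu-idx 0` leaves the same process (PID 3542) holding roughly 524 MiB on each of GPUs 0 through 7: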

```
UCCL_RCMODE=1 NCCL_IB_GID_INDEX=3 python benchmark_nixl.py --role server --device cpu --local-gpu-idx 0 --iters 1 --op-type read --sizes 104857600 --backend uccl_p2p

| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            3542      C   python                                  526MiB |
|    1   N/A  N/A            3542      C   python                                  524MiB |
|    2   N/A  N/A            3542      C   python                                  524MiB |
|    3   N/A  N/A            3542      C   python                                  524MiB |
|    4   N/A  N/A            3542      C   python                                  524MiB |
|    5   N/A  N/A            3542      C   python                                  524MiB |
|    6   N/A  N/A            3542      C   python                                  524MiB |
|    7   N/A  N/A            3542      C   python                                  524MiB |
+-----------------------------------------------------------------------------------------+
```
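To make this easier to check after a fix, here is a rough sketch that parses the process table above and reports which GPUs a given PID holds memory on. It is not part of uccl or nixl: the helper name `gpus_used_by_pid` is made up for illustration, and it assumes the table layout matches what `nvidia-smi` printed in this report.

```python
import re

# One row of the "Processes" table, e.g.
# |    0   N/A  N/A            3542      C   python                 526MiB |
# Columns: GPU index, GI ID, CI ID, PID, type, process name, memory in MiB.
ROW = re.compile(
    r"^\|\s+(\d+)\s+\S+\s+\S+\s+(\d+)\s+\S+\s+(.+?)\s+(\d+)MiB\s+\|$"
)


def gpus_used_by_pid(table_text, pid):
    """Return {gpu_index: memory_mib} for every table row owned by pid."""
    usage = {}
    for line in table_text.splitlines():
        match = ROW.match(line.strip())
        # Header and separator lines do not match the row pattern.
        if match and int(match.group(2)) == pid:
            usage[int(match.group(1))] = int(match.group(4))
    return usage


sample = """\
|    0   N/A  N/A            3542      C   python                                  526MiB |
|    1   N/A  N/A            3542      C   python                                  524MiB |
"""

# Prints {0: 526, 1: 524}
print(gpus_used_by_pid(sample, 3542))
```

With the behavior proposed here, I would expect the result for the server process to contain only the GPU passed via `--local-gpu-idx`.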
