Hi,
I’m testing DeepEP across multiple nodes and found that bandwidth looks normal on 2 nodes, but drops sharply when scaling to 4 nodes.
Could you please suggest possible causes or configurations that might affect this? Any tips for debugging or tuning would be appreciated; I have also included a raw all-to-all baseline sketch below the results in case that helps narrow things down.
Setup:
GPUs: NVIDIA H200
Network: CX7 400 Gb/s InfiniBand
DeepEP version: commit a84a248
Test results (best configurations only):
# 2 nodes
[tuning] Best combine: SMs 24, NVL chunk 2, RDMA chunk 20: 44.37 GB/s (RDMA), 145.16 GB/s (NVL)
[tuning] Best dispatch (BF16): SMs 24, NVL chunk 28, RDMA chunk 16: 40.89 GB/s (RDMA), 133.78 GB/s (NVL)
[tuning] Best dispatch (FP8): SMs 24, NVL chunk 28, RDMA chunk 24: 38.19 GB/s (RDMA), 124.92 GB/s (NVL)
# 4 nodes
[tuning] Best combine: SMs 24, NVL chunk 2, RDMA chunk 8: 12.53 GB/s (RDMA), 25.05 GB/s (NVL)
[tuning] Best dispatch (BF16): SMs 24, NVL chunk 12, RDMA chunk 4: 12.27 GB/s (RDMA), 24.51 GB/s (NVL)
[tuning] Best dispatch (FP8): SMs 24, NVL chunk 4, RDMA chunk 4: 13.15 GB/s (RDMA), 26.27 GB/s (NVL)
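To separate fabric/NIC behaviour from DeepEP's kernels, I can also run a plain NCCL all-to-all micro-benchmark that does not involve DeepEP at all and compare 2 vs. 4 nodes. Below is a minimal sketch of what I have in mind; the script name, launch command, message size, and 8-GPUs-per-node assumption are illustrative, not taken from the DeepEP tests:

```python
# alltoall_bench.py (illustrative name)
# Launch example (assuming 8 GPUs per node):
#   torchrun --nnodes 4 --nproc_per_node 8 \
#       --rdzv_backend c10d --rdzv_endpoint <master>:29500 alltoall_bench.py
import os
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    # 256 MiB of BF16 per rank, split evenly across all peers.
    numel = 128 * 1024 * 1024
    send = torch.randn(numel, dtype=torch.bfloat16, device="cuda")
    recv = torch.empty_like(send)

    # Warm-up iterations so NCCL setup cost is excluded from timing.
    for _ in range(5):
        dist.all_to_all_single(recv, send)
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    iters = 20
    start.record()
    for _ in range(iters):
        dist.all_to_all_single(recv, send)
    end.record()
    torch.cuda.synchronize()

    elapsed_s = start.elapsed_time(end) / 1e3 / iters  # ms -> s, per iteration
    # Outbound bytes per rank per iteration (local shard excluded).
    bytes_out = send.element_size() * numel * (world_size - 1) / world_size
    if rank == 0:
        print(f"all-to-all busbw: {bytes_out / elapsed_s / 1e9:.2f} GB/s per rank")

if __name__ == "__main__":
    main()
```

If this baseline also collapses at 4 nodes, the problem is likely in the IB fabric or NCCL/NIC configuration rather than DeepEP itself; if it stays near line rate, the regression is more likely in DeepEP's internode path or tuning parameters.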
Thanks!