Commit 4cbb26e

Add MSCCL++ Support (#29)
1 parent 85afc90 commit 4cbb26e

2 files changed: +40 -2 lines changed

README.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 ## Introduction

-NPKit (Networking Profiling Kit) is a profiling framework designed for popular collective communication libraries (CCLs), including [Microsoft MSCCL](https://github.com/Azure/msccl/), [NVIDIA NCCL](https://github.com/NVIDIA/nccl) and [AMD RCCL](https://github.com/ROCmSoftwarePlatform/rccl/). It enables users to insert customized profiling events into different CCL components, especially into giant GPU kernels. These events are then automatically placed onto a unified timeline in [Google Trace Event Format](https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/preview), which users can then leverage trace viewer to understand CCLs' workflow and performance.
+NPKit (Networking Profiling Kit) is a profiling framework designed for popular collective communication libraries (CCLs), including [Microsoft MSCCL](https://github.com/Azure/msccl/), [Microsoft MSCCL++](https://github.com/microsoft/mscclpp/), [NVIDIA NCCL](https://github.com/NVIDIA/nccl) and [AMD RCCL](https://github.com/ROCmSoftwarePlatform/rccl/). It enables users to insert customized profiling events into different CCL components, especially into giant GPU kernels. These events are then automatically placed onto a unified timeline in [Google Trace Event Format](https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/preview), which users can then leverage trace viewer to understand CCLs' workflow and performance.

 NPKit is easy to use. It runs with all kinds of workloads where CCLs are leveraged. Users only need to dynamically link their workload binary to CCLs built with NPKit enabled, then the unified timeline with profiling events are automatically generated.

@@ -12,7 +12,7 @@ Below is an example of NPKit timeline result. Green blocks are LL128 data transf

 ## Quick Start

-Please check `msccl_samples` for MSCCL quick start, `nccl_samples` for NCCL quick start and `rccl_samples` for RCCL quick start.
+Please check `msccl_samples` for MSCCL quick start, `mscclpp_samples` for MSCCL++ quick start, `nccl_samples` for NCCL quick start and `rccl_samples` for RCCL quick start.

 ## Trademarks

mscclpp_samples/README.md

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
## Introduction

This file describes the NPKit sample workflow for [MSCCL++](https://github.com/microsoft/mscclpp). The sample workflow first builds MSCCL++ with NPKit enabled, then runs the MSCCL++ executor test to collect NPKit event dump files, and finally generates the NPKit trace file.

## Dependencies

[MSCCL++](https://github.com/microsoft/mscclpp) (with NPKit integrated).

## Usage

1) Build MSCCL++ with NPKit enabled.

```
$ git clone https://github.com/microsoft/mscclpp && cd mscclpp
$ mkdir build && cd build
$ cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_LOCAL_GPU_TARGET_ONLY=ON -DNPKIT_FLAGS="-DENABLE_NPKIT -DENABLE_NPKIT_EVENT_TIME_SYNC_CPU -DENABLE_NPKIT_EVENT_TIME_SYNC_GPU -DENABLE_NPKIT_EVENT_EXECUTOR_OP_BASE_ENTRY -DENABLE_NPKIT_EVENT_EXECUTOR_OP_BASE_EXIT -DENABLE_NPKIT_EVENT_EXECUTOR_INIT_ENTRY -DENABLE_NPKIT_EVENT_EXECUTOR_INIT_EXIT" .. && make -j
```
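
The `NPKIT_FLAGS` value in the cmake command above appears to consist of a general `-DENABLE_NPKIT` macro followed by one `-DENABLE_NPKIT_EVENT_<NAME>` macro per event. A minimal Python sketch of assembling that string from a list of event names is shown below. The helper names `build_npkit_flags` and `build_cmake_command` are hypothetical and not part of MSCCL++ or NPKit, and only the event names that appear in the command above are used.

```python
import shlex

# Event names copied from the cmake command above. Any other name would
# have to match a macro that actually exists in the MSCCL++ sources.
NPKIT_EVENTS = [
    "TIME_SYNC_CPU",
    "TIME_SYNC_GPU",
    "EXECUTOR_OP_BASE_ENTRY",
    "EXECUTOR_OP_BASE_EXIT",
    "EXECUTOR_INIT_ENTRY",
    "EXECUTOR_INIT_EXIT",
]


def build_npkit_flags(events):
    """Return the NPKIT_FLAGS value: -DENABLE_NPKIT plus one macro per event."""
    flags = ["-DENABLE_NPKIT"]
    flags += [f"-DENABLE_NPKIT_EVENT_{name}" for name in events]
    return " ".join(flags)


def build_cmake_command(events):
    """Return the cmake invocation as one shell-quoted string (hypothetical helper)."""
    args = [
        "cmake",
        "-DCMAKE_BUILD_TYPE=Release",
        "-DBUILD_LOCAL_GPU_TARGET_ONLY=ON",
        f"-DNPKIT_FLAGS={build_npkit_flags(events)}",
        "..",
    ]
    # shlex.quote wraps the NPKIT_FLAGS argument in quotes because it contains spaces.
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    print(build_cmake_command(NPKIT_EVENTS))
```

Keeping the event list in one place makes it easier to enable or disable individual events between builds without editing a long command line by hand.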

2) Create directories to store NPKit dump files and trace files.

```
$ mkdir /path/to/npkit_dump
$ mkdir /path/to/npkit_trace
```

3) Run the MSCCL++ executor test with `NPKIT_DUMP_DIR` specified.

```
$ mpirun -tag-output -np 2 -x MSCCLPP_DEBUG=WARN -x MSCCLPP_DEBUG_SUBSYS=ALL -x NPKIT_DUMP_DIR=/path/to/npkit_dump -x LD_PRELOAD=/path/to/mscclpp/build/libmscclpp.so:$LD_PRELOAD /path/to/mscclpp/build/test/executor_test 1024 allreduce_pairs /path/to/mscclpp/test/execution-files/allreduce_packet.json 1024 10 1 LL8
```
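
Before parsing, it may be worth confirming that the run actually wrote something into `NPKIT_DUMP_DIR`. The sketch below only lists the regular files in the directory with their sizes. It makes no assumption about how NPKit names its dump files, and `list_dump_files` is a hypothetical helper, not part of NPKit.

```python
import sys
from pathlib import Path


def list_dump_files(dump_dir):
    """Return (name, size_in_bytes) for each regular file in dump_dir, sorted by name."""
    root = Path(dump_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"dump directory does not exist: {root}")
    return sorted((p.name, p.stat().st_size) for p in root.iterdir() if p.is_file())


if __name__ == "__main__" and len(sys.argv) == 2:
    files = list_dump_files(sys.argv[1])
    if not files:
        print("no dump files found; check that NPKIT_DUMP_DIR was passed to every rank")
    for name, size in files:
        print(f"{size:>12}  {name}")
```

An empty listing most likely means the environment variable did not reach the ranks or the library was built without NPKit enabled.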

4) Run the NPKit trace parsing script to generate the trace file.

```
$ python3 /path/to/mscclpp/tools/npkit/npkit_trace_generator.py --npkit_dump_dir=/path/to/npkit_dump --npkit_event_header_path=/path/to/mscclpp/include/mscclpp/npkit/npkit_event.hpp --output_dir=/path/to/npkit_trace
```

5) The generated trace file `npkit_event_trace.json` is in [Google Trace Event Format](https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/preview) and can be opened in a trace viewer.
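
Besides a graphical viewer, the trace file can also be inspected programmatically. The following is a minimal sketch that totals the duration per event name. It assumes the file follows the Trace Event Format conventions (a `traceEvents` list or a bare list of events, with `ph`, `ts`, `name`, `pid`, `tid`, and `dur` for complete events) and has not been checked against the exact fields that `npkit_trace_generator.py` emits. `summarize_trace` is a hypothetical helper.

```python
import json
import sys
from collections import defaultdict


def summarize_trace(trace):
    """Return {event name: (count, total duration)} for a parsed trace.

    Handles complete events (ph == "X", duration in "dur") and begin/end
    pairs (ph == "B" / "E"), matched per (pid, tid) with a stack.
    Durations are in the unit of "ts" (microseconds in Trace Event Format).
    """
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    totals = defaultdict(lambda: [0, 0.0])
    open_events = defaultdict(list)  # (pid, tid) -> stack of (name, start ts)
    # sorted() is stable, so events sharing a timestamp keep their file order.
    for ev in sorted(events, key=lambda e: e.get("ts", 0)):
        ph = ev.get("ph")
        key = (ev.get("pid"), ev.get("tid"))
        if ph == "X":
            totals[ev["name"]][0] += 1
            totals[ev["name"]][1] += ev.get("dur", 0)
        elif ph == "B":
            open_events[key].append((ev["name"], ev["ts"]))
        elif ph == "E" and open_events[key]:
            name, start = open_events[key].pop()
            totals[name][0] += 1
            totals[name][1] += ev["ts"] - start
    return {name: (count, total) for name, (count, total) in totals.items()}


if __name__ == "__main__" and len(sys.argv) == 2:
    # Example argument: /path/to/npkit_trace/npkit_event_trace.json
    with open(sys.argv[1]) as f:
        summary = summarize_trace(json.load(f))
    for name, (count, total) in sorted(summary.items()):
        print(f"{name}: count={count} total={total}")
```

Events with other phase types (for example metadata events) are ignored, and an unmatched end event is skipped rather than treated as an error.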
