Optimized all-to-all collective implementations for NCCL.
ssh thetagpusn1
qsub -I -n 1 -t 10 -q full-node -A dist_relational_alg --attrs filesystems=home,grand,theta-fs0
cd <nccl-collectives-path>/
module load nccl
There are 4 all-to-all implementations, each in its own folder within the root directory. Each implementation has a Makefile that should compile both on a local machine and on Theta. cd into the implementation that you want to run and type make. The generated binary is named <implementation-name>.out and is placed in the same directory.
The command is the same for all implementations: mpiexec -n <N> ./<implementation-name>.out.
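For example, to run the NCCL Bruck version on 4 processes (the binary name here is inferred from the folder name):
mpiexec -n 4 ./nccl-ata-bruck.out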
For a given number of MPI processes n, each process allocates an array of n elements and fills every element with its own rank (ranks start at 0, so p1 below has rank 0). For n = 2 the input looks like:
p1 = [0 0]
p2 = [1 1]
Each implementation then performs the same all-to-all exchange on these arrays: element i of every process's array is delivered to the process with rank i, so each process ends up with one value from every rank. The expected output for n = 2 is therefore:
p1 = [0 1]
p2 = [0 1]
For n = 4 the input would be:
p1 = [0 0 0 0]
p2 = [1 1 1 1]
p3 = [2 2 2 2]
p4 = [3 3 3 3]
and the expected output would be:
p1 = [0 1 2 3]
p2 = [0 1 2 3]
p3 = [0 1 2 3]
p4 = [0 1 2 3]
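The same exchange can be expressed with a single MPI_Alltoall call. The sketch below is not one of the repo's implementations; it is just a minimal reference for the expected input/output semantics, including the verification step:

// alltoall-check.cpp -- reference for the expected semantics only.
// Assumed build command: mpicxx alltoall-check.cpp -o alltoall-check.out
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, n;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &n);

    // Input: n elements, all equal to this process's rank.
    std::vector<int> sendbuf(n, rank), recvbuf(n, -1);

    // All-to-all: element i of sendbuf goes to rank i, so every
    // process should end up with [0, 1, ..., n-1].
    MPI_Alltoall(sendbuf.data(), 1, MPI_INT,
                 recvbuf.data(), 1, MPI_INT, MPI_COMM_WORLD);

    bool ok = true;
    for (int i = 0; i < n; ++i)
        if (recvbuf[i] != i) ok = false;
    printf("rank %d: %s\n", rank, ok ? "correct" : "WRONG");

    MPI_Finalize();
    return ok ? 0 : 1;
}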
The relevant files to look at for the NCCL Bruck implementation are:
nccl-collectives/common/bruck.cu
nccl-collectives/nccl-ata-bruck/nccl-ata-bruck.cu
The files for the corresponding MPI-only Bruck implementation are:
nccl-collectives/common/bruck.cpp
nccl-collectives/mpi-ata-bruck/mpi-ata-bruck.cpp
The MPI-only Bruck implementation is confirmed to give the correct output and can be used as a reference when debugging the NCCL Bruck implementation.
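For context when reading the NCCL code, the exchange above maps onto NCCL's point-to-point API as paired ncclSend/ncclRecv calls fused into one group. The sketch below is a plain send/recv all-to-all, not the Bruck algorithm, and assumes one GPU per MPI rank; it only illustrates the device-side semantics that the Bruck implementation must reproduce:

// nccl-ata-simple.cu -- hypothetical reference, NOT the repo's Bruck version.
#include <mpi.h>
#include <nccl.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, n;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &n);
    cudaSetDevice(rank % 8);  // assumes at most 8 GPUs per node, one per rank

    // Bootstrap NCCL: rank 0 creates the unique id, MPI broadcasts it.
    ncclUniqueId id;
    if (rank == 0) ncclGetUniqueId(&id);
    MPI_Bcast(&id, sizeof(id), MPI_BYTE, 0, MPI_COMM_WORLD);
    ncclComm_t comm;
    ncclCommInitRank(&comm, n, id, rank);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Same input as above: n ints, all equal to this process's rank.
    std::vector<int> host(n, rank);
    int *dsend, *drecv;
    cudaMalloc(&dsend, n * sizeof(int));
    cudaMalloc(&drecv, n * sizeof(int));
    cudaMemcpy(dsend, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    // One send and one recv per peer, inside a single NCCL group so the
    // pairs are matched without deadlocking.
    ncclGroupStart();
    for (int peer = 0; peer < n; ++peer) {
        ncclSend(dsend + peer, 1, ncclInt, peer, comm, stream);
        ncclRecv(drecv + peer, 1, ncclInt, peer, comm, stream);
    }
    ncclGroupEnd();
    cudaStreamSynchronize(stream);

    cudaMemcpy(host.data(), drecv, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("rank %d: [", rank);
    for (int i = 0; i < n; ++i) printf(" %d", host[i]);  // expect 0..n-1
    printf(" ]\n");

    cudaFree(dsend); cudaFree(drecv);
    cudaStreamDestroy(stream);
    ncclCommDestroy(comm);
    MPI_Finalize();
    return 0;
}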