communication

Running Communication Benchmarks

There are two ways to run the benchmarks:

Run a single communication operation:

For example, to run all_reduce with a single large message size:

deepspeed all_reduce.py

Scan across message sizes:

deepspeed all_reduce.py --scan

Each benchmark script has its own command-line options. For example, all_reduce.py accepts:

usage: ds_bench [-h] [--local_rank LOCAL_RANK] [--trials TRIALS] [--warmup WARMUP] [--maxsize MAXSIZE] [--async-op] [--bw-unit {Gbps,GBps}] [--backend {nccl}] [--dist {deepspeed,torch}] [--scan] [--dtype DTYPE] [--mem-factor MEM_FACTOR] [--debug]

optional arguments:
  -h, --help            show this help message and exit
  --local_rank LOCAL_RANK
  --trials TRIALS       Number of timed iterations
  --warmup WARMUP       Number of warmup (non-timed) iterations
  --maxsize MAXSIZE     Max message size as a power of 2
  --async-op            Enables non-blocking communication
  --bw-unit {Gbps,GBps}
  --backend {nccl}      Communication library to use
  --dist {deepspeed,torch}
                        Distributed DL framework to use
  --scan                Enables scanning all message sizes
  --dtype DTYPE         PyTorch tensor dtype
  --mem-factor MEM_FACTOR
                        Proportion of max available GPU memory to use for single-size evals
  --debug               Enables alltoall debug prints
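The interaction of --scan, --maxsize, --warmup and --trials can be illustrated with a small, framework-free sketch. It is not the code in all_reduce.py: the helper names (message_sizes, time_op, scan) are hypothetical, the assumption that a scan starts at 2**0 is ours, and a local buffer copy stands in for the collective. A real GPU benchmark would also need to synchronize the device before reading the clock.

```python
import time


def message_sizes(maxsize):
    """Return the sizes 2**0 .. 2**maxsize (assumed scan range, in elements)."""
    return [2 ** p for p in range(maxsize + 1)]


def time_op(op, warmup, trials):
    """Call op() `warmup` times untimed, then return mean seconds over `trials` calls."""
    for _ in range(warmup):
        op()
    start = time.perf_counter()
    for _ in range(trials):
        op()
    return (time.perf_counter() - start) / trials


def scan(make_op, maxsize, warmup=2, trials=5):
    """Time the operation built by make_op(numel) for every scanned size."""
    results = []
    for numel in message_sizes(maxsize):
        op = make_op(numel)
        results.append((numel, time_op(op, warmup, trials)))
    return results


if __name__ == "__main__":
    # Stand-in for a collective: copy a local buffer of `numel` bytes.
    def make_copy(numel):
        buf = bytearray(numel)
        return lambda: bytes(buf)

    for numel, secs in scan(make_copy, maxsize=10):
        print(f"{numel:>6} elements  {secs * 1e6:8.2f} us")
```

Keeping the warmup calls outside the timed region prevents one-time setup costs from skewing the reported average.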

Run all available communication benchmarks:

deepspeed run_all.py

Like the individual benchmarks, run_all.py accepts options such as --scan, --maxsize and --bw-unit. Pass the desired arguments to run_all.py and they are propagated to each communication operation.

ds_bench is a pre-packaged wrapper around run_all.py and accepts the same arguments:

/bin/ds_bench --scan --trials=10

Adding Communication Benchmarks

To add new communication benchmarks, follow this general procedure:

1. Copy a similar benchmark file (e.g. to add reduce_scatter, use all_reduce.py as a template).
2. Add a bandwidth formula for the new operation in utils.get_bw.
3. Add a maximum tensor element formula for the new operation in utils.max_numel.
4. Replace the comm op calls in the new file (find-and-replace is usually enough).
5. Find a good default mem_factor for the run_<collective>_single() function.
6. Add the new comm op to run_all.py.
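As a rough guide to what the bandwidth and element-count formulas might look like, here is a sketch under stated assumptions. It does not reproduce utils.get_bw or utils.max_numel: the function names and signatures are hypothetical, the correction factors follow the bus-bandwidth convention used by nccl-tests (size_bytes is the total data size in that convention), and the "two buffers" default in max_elements is an illustrative guess at an input plus an output tensor.

```python
def bus_bandwidth(op, size_bytes, seconds, world_size, unit="Gbps"):
    """Estimate bus bandwidth for a collective (hypothetical helper).

    Algorithm bandwidth is size / time; the per-op factor converts it to
    bus bandwidth following the nccl-tests convention.
    """
    n = world_size
    factors = {
        "all_reduce": 2 * (n - 1) / n,
        "all_gather": (n - 1) / n,
        "reduce_scatter": (n - 1) / n,
        "all_to_all": (n - 1) / n,
    }
    if op not in factors:
        raise ValueError(f"no bandwidth formula for {op!r}")
    bus_bytes_per_s = (size_bytes / seconds) * factors[op]
    if unit == "Gbps":   # gigabits per second
        return bus_bytes_per_s * 8 / 1e9
    if unit == "GBps":   # gigabytes per second
        return bus_bytes_per_s / 1e9
    raise ValueError(f"unknown unit {unit!r}")


def max_elements(free_bytes, mem_factor, dtype_bytes, buffers=2):
    """Largest element count so `buffers` tensors fit in mem_factor of free memory."""
    return int(free_bytes * mem_factor) // (dtype_bytes * buffers)


if __name__ == "__main__":
    print(bus_bandwidth("all_reduce", 1e9, 1.0, 8, unit="GBps"))
    print(max_elements(free_bytes=1024, mem_factor=0.5, dtype_bytes=4))
```

A new collective would need its own entry in the factor table and, if it allocates more or larger buffers per rank, a different divisor in the element-count formula.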
