
Commit 4eaa443

[ConvNets/TF] Adding support for Ampere
1 parent 5ca7062 commit 4eaa443

102 files changed

Lines changed: 1991 additions & 778 deletions

File tree

PyTorch/SpeechSynthesis/FastPitch/README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -713,7 +713,7 @@ The used WaveGlow model is a 256-channel model [published on NGC](https://ngc.nv
713713

714714
Our results were obtained by running the `./scripts/inference_benchmark.sh` script in
715715
the PyTorch 20.03-py3 NGC container. Note that to reproduce the results,
716-
you need to provide pre-trained checkpoints for FastPitch and WaveGlow. Edit the script to provide your checkpoint filenames.
716+
you need to provide pre-trained checkpoins for FastPitch and WaveGlow. Edit the script to provide your checkpoint filenames.
717717

718718
Note that performance numbers are related to the length of input. The numbers reported below were taken with a moderate length of 128 characters. For longer utterances even better numbers are expected, as the generator is fully parallel.
719719

TensorFlow/Classification/ConvNets/Dockerfile

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
ARG FROM_IMAGE_NAME=nvcr.io/nvidia/tensorflow:20.03-tf1-py3
1+
ARG FROM_IMAGE_NAME=nvcr.io/nvidia/tensorflow:20.06-tf1-py3
22
FROM ${FROM_IMAGE_NAME}
33

44
ADD requirements.txt .

TensorFlow/Classification/ConvNets/README.md

Lines changed: 29 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,8 @@ classification
88
* [Models](#models)
99
* [Validation accuracy results](#validation-accuracy-results)
1010
* [Training performance results](#training-performance-results)
11-
* [Training performance: NVIDIA DGX-1 (8x V100 16G)](#training-performance-nvidia-dgx-1-(8x-v100-16G))
11+
* [Training performance: NVIDIA DGX A100 (8x A100 40G)](#training-performance-nvidia-dgx-a100-8x-a100-40g)
12+
* [Training performance: NVIDIA DGX-1 (8x V100 16G)](#training-performance-nvidia-dgx-1-8x-v100-16g)
1213
* [Release notes](#release-notes)
1314
* [Changelog](#changelog)
1415

@@ -25,7 +26,7 @@ The following table provides links to where you can find additional information
2526

2627
## Validation accuracy results
2728

28-
Our results were obtained by running the applicable training scripts in the tensorflow-20.03-tf1-py3 NGC container
29+
Our results were obtained by running the applicable training scripts in the tensorflow-20.06-tf1-py3 NGC container
2930
on NVIDIA DGX-1 with (8x V100 16G) GPUs. The specific training script that was run is documented in the corresponding model's README.
3031

3132
The following table shows the validation accuracy results of the
@@ -40,10 +41,30 @@ three classification models side-by-side.
4041

4142
## Training performance results
4243

44+
### Training performance: NVIDIA DGX A100 (8x A100 40G)
45+
46+
Our results were obtained by running the applicable
47+
training scripts in the tensorflow-20.06-tf1-py3 NGC container
48+
on NVIDIA DGX A100 with (8x A100 40G) GPUs.
49+
Performance numbers (in images per second)
50+
were averaged over an entire training epoch.
51+
The specific training script that was run is documented
52+
in the corresponding model's README.
53+
54+
The following table shows the training accuracy results of the
55+
three classification models side-by-side.
56+
57+
58+
| **arch** | **Mixed Precision XLA** | **TF32 XLA** | **Mixed Precision speedup** |
59+
|:-:|:-:|:-:|:-:|
60+
| resnet50 | 16400 img/s | 6300 img/s | 2.60x |
61+
| resnext101-32x4d | 8000 img/s | 2630 img/s | 3.05x |
62+
| se-resnext101-32x4d | 6930 img/s | 2400 img/s | 2.88x |
63+
4364
### Training performance: NVIDIA DGX-1 (8x V100 16G)
4465

4566
Our results were obtained by running the applicable
46-
training scripts in the tensorflow-20.03-tf1-py3 NGC container
67+
training scripts in the tensorflow-20.06-tf1-py3 NGC container
4768
on NVIDIA DGX-1 with (8x V100 16G) GPUs.
4869
Performance numbers (in images per second)
4970
were averaged over an entire training epoch.
@@ -54,11 +75,11 @@ The following table shows the training accuracy results of the
5475
three classification models side-by-side.
5576

5677

57-
| **arch** | **Mixed Precision** | **Mixed Prcesision XLA** | **FP32** | **Mixed Precision speedup** | **XLA Mixed Precision speedup**|
58-
|:-:|:-:|:-:|:-:|:-:|:-:|
59-
| resnet50 | 8277.91 img/s | 9485.21 img/s | 2785.81 img/s | 2.97x | 1.14x |
60-
| resnext101-32x4d | 3151.81 img/s | 4231.42 img/s | 1055.82 img/s | 2.98x | 1.34x |
61-
| se-resnext101-32x4d | 2168.40 img/s | 3297.39 img/s | 921.38 img/s | 2.35x | 1.52x |
78+
| **arch** | **Mixed Precision XLA** | **FP32 XLA** | **Mixed Precision speedup** |
79+
|:-:|:-:|:-:|:-:|
80+
| resnet50 | 9510 img/s | 3170 img/s | 3.00x |
81+
| resnext101-32x4d | 4160 img/s | 1210 img/s | 3.44x |
82+
| se-resnext101-32x4d | 3360 img/s | 1120 img/s | 3.00x |
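
The speedup column appears to be the ratio of mixed-precision throughput to FP32 throughput. The following is a minimal sketch that recomputes it from the DGX-1 rows above. The `speedup` helper is hypothetical and not part of the repository, and because the published throughputs are rounded, recomputed ratios for other tables may differ in the last digit.

```shell
#!/bin/bash
# Hypothetical helper (not part of the repository): ratio of two throughputs
# in img/s, printed with two decimals to match the table's format.
speedup() {
    awk -v amp="$1" -v fp32="$2" 'BEGIN { printf "%.2fx\n", amp / fp32 }'
}

speedup 9510 3170   # resnet50
speedup 4160 1210   # resnext101-32x4d
speedup 3360 1120   # se-resnext101-32x4d
```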
6283

6384
## Release notes
6485

TensorFlow/Classification/ConvNets/resnet50v1.5/README.md

Lines changed: 196 additions & 73 deletions
Large diffs are not rendered by default.
Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,34 @@
1+
#!/bin/bash
2+
3+
DATA_DIR=${1:-"/data/tfrecords"}
4+
DALI_DIR=${2}
5+
6+
BATCH_SIZE_TO_TEST="1 2 4 8 16 32 64 128 256"
7+
INFERENCE_BENCHMARK=$(mktemp /tmp/inference-benchmark.XXXXXX)
8+
9+
function test_configuration() {
10+
echo "Testing configuration: $1" | tee -a $INFERENCE_BENCHMARK
11+
12+
for BATCH in $BATCH_SIZE_TO_TEST; do
13+
python ./main.py --mode=inference_benchmark --warmup_steps 50 --num_iter 400 --iter_unit batch \
14+
--batch_size $BATCH --data_dir=$DATA_DIR --results_dir=/tmp/results $2 | tail -n2 | head -n1 | sed \
15+
's/^DLL \([0-9]*-\)*[0-9]* \([0-9]*:\)*[0-9]*.[0-9]* - ()/Results for BS='$BATCH'/' | tee -a $INFERENCE_BENCHMARK
16+
17+
if [ ! $? -eq 0 ]; then
18+
echo "Failed test on batch size $BATCH"
19+
exit 1
20+
fi
21+
done
22+
}
23+
24+
test_configuration "FP32 nodali noxla"
25+
test_configuration "FP32 nodali xla" "--use_xla"
26+
test_configuration "FP16 nodali noxla" "--use_tf_amp"
27+
test_configuration "FP16 nodali xla" "--use_tf_amp --use_xla"
28+
29+
if [ ! -z $DALI_DIR ]; then
30+
test_configuration "FP16 dali xla" "--use_tf_amp --use_xla --use_dali --data_idx_dir ${DALI_DIR}"
31+
fi
32+
33+
cat $INFERENCE_BENCHMARK
34+
rm $INFERENCE_BENCHMARK
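
In bash, `$?` after a pipeline reports the status of the last command (here `tee`), so the check inside `test_configuration` may not catch a failing `python` run. The following is a minimal sketch of one way to surface that failure, using `set -o pipefail`. `run_step` and `check_pipeline` are hypothetical stand-ins for the benchmark command and its pipeline, not functions from the repository.

```shell
#!/bin/bash
# With pipefail, a pipeline fails if any command in it fails,
# not only the last one.
set -o pipefail

# Hypothetical stand-in for the benchmark command: prints one line
# and exits with the status passed as its first argument.
run_step() {
    echo "result line"
    return "$1"
}

# Returns 0 only if every stage of the pipeline succeeded.
check_pipeline() {
    run_step "$1" | tail -n1 | tee -a /dev/null > /dev/null
}

check_pipeline 0 && echo "ok"
check_pipeline 3 || echo "failed"
```

Without `set -o pipefail`, both calls would report success, because `tee` exits with status 0 regardless of what the first stage returned.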

TensorFlow/Classification/ConvNets/resnet50v1.5/training/AMP/DGX1_RN50_AMP_250E.sh

Lines changed: 0 additions & 5 deletions
This file was deleted.

TensorFlow/Classification/ConvNets/resnet50v1.5/training/AMP/DGX1_RN50_AMP_50E.sh

Lines changed: 0 additions & 5 deletions
This file was deleted.

TensorFlow/Classification/ConvNets/resnet50v1.5/training/AMP/DGX1_RN50_AMP_90E.sh

Lines changed: 0 additions & 5 deletions
This file was deleted.

TensorFlow/Classification/ConvNets/resnet50v1.5/training/AMP/DGX2_RN50_AMP_250E.sh

Lines changed: 0 additions & 5 deletions
This file was deleted.
