NVIDIA
diff --git a/PyTorch/SpeechSynthesis/FastPitch/README.md b/PyTorch/SpeechSynthesis/FastPitch/README.md
Lines changed: 1 addition & 1 deletion
diff --git a/TensorFlow/Classification/ConvNets/Dockerfile b/TensorFlow/Classification/ConvNets/Dockerfile
Lines changed: 1 addition & 1 deletion
diff --git a/TensorFlow/Classification/ConvNets/README.md b/TensorFlow/Classification/ConvNets/README.md
Lines changed: 29 additions & 8 deletions
diff --git a/TensorFlow/Classification/ConvNets/resnet50v1.5/README.md b/TensorFlow/Classification/ConvNets/resnet50v1.5/README.md
Lines changed: 196 additions & 73 deletions
diff --git a/TensorFlow/Classification/ConvNets/resnet50v1.5/imgs/train_loss.png b/TensorFlow/Classification/ConvNets/resnet50v1.5/imgs/train_loss.png
-27.3 KB
diff --git a/TensorFlow/Classification/ConvNets/resnet50v1.5/inference_benchmark.sh b/TensorFlow/Classification/ConvNets/resnet50v1.5/inference_benchmark.sh
Lines changed: 34 additions & 0 deletions
diff --git a/TensorFlow/Classification/ConvNets/resnet50v1.5/training/AMP/DGX1_RN50_AMP_250E.sh b/TensorFlow/Classification/ConvNets/resnet50v1.5/training/AMP/DGX1_RN50_AMP_250E.sh
Lines changed: 0 additions & 5 deletions
diff --git a/TensorFlow/Classification/ConvNets/resnet50v1.5/training/AMP/DGX1_RN50_AMP_50E.sh b/TensorFlow/Classification/ConvNets/resnet50v1.5/training/AMP/DGX1_RN50_AMP_50E.sh
Lines changed: 0 additions & 5 deletions
diff --git a/TensorFlow/Classification/ConvNets/resnet50v1.5/training/AMP/DGX1_RN50_AMP_90E.sh b/TensorFlow/Classification/ConvNets/resnet50v1.5/training/AMP/DGX1_RN50_AMP_90E.sh
Lines changed: 0 additions & 5 deletions
diff --git a/TensorFlow/Classification/ConvNets/resnet50v1.5/training/AMP/DGX2_RN50_AMP_250E.sh b/TensorFlow/Classification/ConvNets/resnet50v1.5/training/AMP/DGX2_RN50_AMP_250E.sh
Lines changed: 0 additions & 5 deletions
@@ -713,7 +713,7 @@ The used WaveGlow model is a 256-channel model [published on NGC](https://ngc.nv
 
 Our results were obtained by running the `./scripts/inference_benchmark.sh` script in
 the PyTorch 20.03-py3 NGC container. Note that to reproduce the results,
-you need to provide pre-trained checkpoints for FastPitch and WaveGlow. Edit the script to provide your checkpoint filenames.
+you need to provide pre-trained checkpoins for FastPitch and WaveGlow. Edit the script to provide your checkpoint filenames.
 
 Note that performance numbers are related to the length of input. The numbers reported below were taken with a moderate length of 128 characters. For longer utterances even better numbers are expected, as the generator is fully parallel.
 
 
@@ -1,4 +1,4 @@
-ARG FROM_IMAGE_NAME=nvcr.io/nvidia/tensorflow:20.03-tf1-py3
+ARG FROM_IMAGE_NAME=nvcr.io/nvidia/tensorflow:20.06-tf1-py3
 FROM ${FROM_IMAGE_NAME}
 
 ADD requirements.txt .
 
@@ -8,7 +8,8 @@ classification
 * [Models](#models)
 * [Validation accuracy results](#validation-accuracy-results)
 * [Training performance results](#training-performance-results)
-  * [Training performance: NVIDIA DGX-1 (8x V100 16G)](#training-performance-nvidia-dgx-1-(8x-v100-16G))
+  * [Training performance: NVIDIA DGX A100 (8x A100 40G)](#training-performance-nvidia-dgx-a100-8x-a100-40g)
+  * [Training performance: NVIDIA DGX-1 (8x V100 16G)](#training-performance-nvidia-dgx-1-8x-v100-16g)
 * [Release notes](#release-notes)
   * [Changelog](#changelog)
 
@@ -25,7 +26,7 @@ The following table provides links to where you can find additional information
 
 ## Validation accuracy results
 
-Our results were obtained by running the applicable training scripts in the tensorflow-20.03-tf1-py3 NGC container 
+Our results were obtained by running the applicable training scripts in the tensorflow-20.06-tf1-py3 NGC container 
 on NVIDIA DGX-1 with (8x V100 16G) GPUs. The specific training script that was run is documented in the corresponding model's README.
 
 The following table shows the validation accuracy results of the 
@@ -40,10 +41,30 @@ three classification models side-by-side.
 
 ## Training performance results
 
+### Training performance: NVIDIA DGX A100 (8x A100 40G)
+
+Our results were obtained by running the applicable 
+training scripts in the tensorflow-20.06-tf1-py3 NGC container 
+on NVIDIA DGX A100 with (8x A100 40G) GPUs. 
+Performance numbers (in images per second) 
+were averaged over an entire training epoch.
+The specific training script that was run is documented 
+in the corresponding model's README.
+
+The following table shows the training performance results of the 
+three classification models side-by-side.
+
+
+| **arch** | **Mixed Precision XLA** | **TF32 XLA** | **Mixed Precision speedup** |
+|:-:|:-:|:-:|:-:|
+| resnet50            | 16400 img/s | 6300 img/s | 2.60x |
+| resnext101-32x4d    | 8000 img/s | 2630 img/s | 3.05x |
+| se-resnext101-32x4d | 6930 img/s | 2400 img/s | 2.88x |
+
 ### Training performance: NVIDIA DGX-1 (8x V100 16G)
 
 Our results were obtained by running the applicable 
-training scripts in the tensorflow-20.03-tf1-py3 NGC container 
+training scripts in the tensorflow-20.06-tf1-py3 NGC container 
 on NVIDIA DGX-1 with (8x V100 16G) GPUs. 
 Performance numbers (in images per second) 
 were averaged over an entire training epoch.
@@ -54,11 +75,11 @@ The following table shows the training accuracy results of the
 three classification models side-by-side.
 
 
-| **arch** | **Mixed Precision** | **Mixed Prcesision XLA** | **FP32** | **Mixed Precision speedup** | **XLA Mixed Precision speedup**|
-|:-:|:-:|:-:|:-:|:-:|:-:|
-| resnet50            | 8277.91 img/s | 9485.21 img/s | 2785.81 img/s | 2.97x | 1.14x |
-| resnext101-32x4d    | 3151.81 img/s | 4231.42 img/s | 1055.82 img/s | 2.98x | 1.34x |
-| se-resnext101-32x4d | 2168.40 img/s | 3297.39 img/s | 921.38 img/s  | 2.35x | 1.52x |
+| **arch** | **Mixed Precision XLA** | **FP32 XLA** | **Mixed Precision speedup** |
+|:-:|:-:|:-:|:-:|
+| resnet50            | 9510 img/s | 3170 img/s | 3.00x |
+| resnext101-32x4d    | 4160 img/s | 1210 img/s | 3.44x |
+| se-resnext101-32x4d | 3360 img/s | 1120 img/s | 3.00x |
 
 ## Release notes
 
 
@@ -0,0 +1,34 @@
+#!/bin/bash
+
+DATA_DIR=${1:-"/data/tfrecords"}
+DALI_DIR=${2}
+
+BATCH_SIZE_TO_TEST="1 2 4 8 16 32 64 128 256"
+INFERENCE_BENCHMARK=$(mktemp /tmp/inference-benchmark.XXXXXX)
+
+function test_configuration() {
+    echo "Testing configuration: $1" | tee -a $INFERENCE_BENCHMARK
+
+    for BATCH in $BATCH_SIZE_TO_TEST; do
+        python ./main.py --mode=inference_benchmark --warmup_steps 50 --num_iter 400 --iter_unit batch \
+            --batch_size $BATCH --data_dir=$DATA_DIR --results_dir=/tmp/results $2 | tail -n2 | head -n1 | sed \
+            's/^DLL \([0-9]*-\)*[0-9]* \([0-9]*:\)*[0-9]*.[0-9]* - ()/Results for BS='$BATCH'/' | tee -a $INFERENCE_BENCHMARK
+
+        if [ "${PIPESTATUS[0]}" -ne 0 ]; then
+            echo "Failed test on batch size $BATCH"
+            exit 1
+        fi
+    done
+}
+
+test_configuration "FP32 nodali noxla"
+test_configuration "FP32 nodali xla" "--use_xla"
+test_configuration "FP16 nodali noxla" "--use_tf_amp"
+test_configuration "FP16 nodali xla" "--use_tf_amp --use_xla"
+
+if [ -n "$DALI_DIR" ]; then
+    test_configuration "FP16 dali xla" "--use_tf_amp --use_xla --use_dali --data_idx_dir ${DALI_DIR}"
+fi
+
+cat $INFERENCE_BENCHMARK
+rm $INFERENCE_BENCHMARK
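Reviewer note: the benchmark loop pipes `main.py` through `tail`, `head`, `sed`, and `tee`. In bash, `$?` after a pipeline reports only the last stage unless `pipefail` is set, so detecting a failure of the first stage requires `PIPESTATUS`. The following is a minimal sketch of that behavior, with `false` standing in for a failing benchmark command. The `first_stage_status` helper is hypothetical and not part of the repository.

```shell
#!/bin/bash

# Hypothetical helper, not part of the repository.
# Runs the given command as the first stage of a pipeline and returns that
# stage's exit status instead of the status of the last stage.
first_stage_status() {
    "$@" | cat > /dev/null
    # PIPESTATUS must be read right away; the next command overwrites it.
    return "${PIPESTATUS[0]}"
}

# `false` stands in for a benchmark run that exits with a non-zero status.
if first_stage_status false; then
    echo "first stage succeeded"
else
    echo "first stage failed"
fi
```

An alternative with the same effect is `set -o pipefail` near the top of the script, which makes the pipeline's own status non-zero when any stage fails.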