feifeibear
diff --git a/PyTorch/Classification/ConvNets/Dockerfile b/PyTorch/Classification/ConvNets/Dockerfile
Lines changed: 1 addition & 1 deletion
diff --git a/PyTorch/Classification/ConvNets/README.md b/PyTorch/Classification/ConvNets/README.md
Lines changed: 41 additions & 32 deletions
diff --git a/PyTorch/Classification/ConvNets/checkpoint2model.py b/PyTorch/Classification/ConvNets/checkpoint2model.py
Lines changed: 1 addition & 1 deletion
diff --git a/PyTorch/Classification/ConvNets/classify.py b/PyTorch/Classification/ConvNets/classify.py
Lines changed: 6 additions & 3 deletions
@@ -1,4 +1,4 @@
-ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:19.10-py3
+ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:20.06-py3
 FROM ${FROM_IMAGE_NAME}
 
 ADD requirements.txt /workspace/
 
@@ -2,13 +2,16 @@
 
 In this repository you will find implementations of various image classification models.
 
+Detailed information on each model can be found here:
+
 ## Table Of Contents
 
 * [Models](#models)
 * [Validation accuracy results](#validation-accuracy-results)
 * [Training performance results](#training-performance-results)
-  * [Training performance: NVIDIA DGX-1 (8x V100 16G)](#training-performance-nvidia-dgx-1-(8x-v100-16G))
-  * [Training performance: NVIDIA DGX-2 (16x V100 32G)](#training-performance-nvidia-dgx-2-(16x-v100-32G))
+  * [Training performance: NVIDIA DGX A100 (8x A100 40GB)](#training-performance-nvidia-dgx-a100-8x-a100-40gb)
+  * [Training performance: NVIDIA DGX-1 16GB (8x V100 16GB)](#training-performance-nvidia-dgx-1-16gb-8x-v100-16gb)
+  * [Training performance: NVIDIA DGX-2 (16x V100 32GB)](#training-performance-nvidia-dgx-2-16x-v100-32gb)
 * [Model comparison](#model-comparison)
   * [Accuracy vs FLOPS](#accuracy-vs-flops)
   * [Latency vs Throughput on different batch sizes](#latency-vs-throughput-on-different-batch-sizes)
@@ -25,14 +28,14 @@ The following table provides links to where you can find additional information
 
 ## Validation accuracy results
 
-Our results were obtained by running the applicable 
-training scripts in the [framework-container-name] NGC container 
-on NVIDIA DGX-1 with (8x V100 16G) GPUs. 
-The specific training script that was run is documented 
+Our results were obtained by running the applicable
+training scripts in the [framework-container-name] NGC container
+on NVIDIA DGX-1 with (8x V100 16GB) GPUs.
+The specific training script that was run is documented
 in the corresponding model's README.
 
 
-The following table shows the validation accuracy results of the 
+The following table shows the validation accuracy results of the
 three classification models side-by-side.
 
 
@@ -45,48 +48,54 @@ three classification models side-by-side.
 
 ## Training performance results
 
-
-### Training performance: NVIDIA DGX-1 (8x V100 16G)
+### Training performance: NVIDIA DGX A100 (8x A100 40GB)
 
 
-Our results were obtained by running the applicable 
-training scripts in the pytorch-19.10 NGC container 
-on NVIDIA DGX-1 with (8x V100 16G) GPUs. 
-Performance numbers (in images per second) 
+Our results were obtained by running the applicable
+training scripts in the pytorch-20.06 NGC container
+on NVIDIA DGX A100 with (8x A100 40GB) GPUs.
+Performance numbers (in images per second)
 were averaged over an entire training epoch.
-The specific training script that was run is documented 
+The specific training script that was run is documented
 in the corresponding model's README.
 
-The following table shows the training accuracy results of the 
+The following table shows the training accuracy results of the
 three classification models side-by-side.
 
 
-| **arch** | **Mixed Precision** | **FP32** | **Mixed Precision speedup** |
-|:-:|:-:|:-:|:-:|
-| resnet50 | 6888.75 img/s | 2945.37 img/s | 2.34x |
-| resnext101-32x4d | 2384.85 img/s | 1116.58 img/s | 2.14x |
-| se-resnext101-32x4d | 2031.17 img/s | 977.45 img/s | 2.08x |
+|      **arch**       | **Mixed Precision** |   **TF32**    | **Mixed Precision Speedup** |
+|:-------------------:|:-------------------:|:-------------:|:---------------------------:|
+|      resnet50       |    9488.39 img/s    | 5322.10 img/s |            1.78x            |
+|  resnext101-32x4d   |    6758.98 img/s    | 2353.25 img/s |            2.87x            |
+| se-resnext101-32x4d |    4670.72 img/s    | 2011.21 img/s |            2.32x            |
+
+ResNeXt and SE-ResNeXt use [NHWC data layout](https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html) when training using Mixed Precision,
+which improves the model performance. We are currently working on adding it for ResNet.
 
-### Training performance: NVIDIA DGX-2 (16x V100 32G)
 
+### Training performance: NVIDIA DGX-1 16G (8x V100 16GB)
 
-Our results were obtained by running the applicable 
-training scripts in the pytorch-19.10 NGC container 
-on NVIDIA DGX-2 with (16x V100 32G) GPUs. 
-Performance numbers (in images per second) 
+
+Our results were obtained by running the applicable
+training scripts in the pytorch-20.06 NGC container
+on NVIDIA DGX-1 with (8x V100 16GB) GPUs.
+Performance numbers (in images per second)
 were averaged over an entire training epoch.
-The specific training script that was run is documented 
+The specific training script that was run is documented
 in the corresponding model's README.
 
-The following table shows the training accuracy results of the 
+The following table shows the training accuracy results of the
 three classification models side-by-side.
 
 
-| **arch** | **Mixed Precision** | **FP32** | **Mixed Precision speedup** |
-|:-:|:-:|:-:|:-:|
-| resnet50 | 13443.82 img/s | 6263.41 img/s | 2.15x |
-| resnext101-32x4d | 4473.37 img/s | 2261.97 img/s | 1.98x |
-| se-resnext101-32x4d | 3776.03 img/s | 1953.13 img/s | 1.93x |
+|      **arch**       | **Mixed Precision** |   **FP32**    | **Mixed Precision Speedup** |
+|:-------------------:|:-------------------:|:-------------:|:---------------------------:|
+|      resnet50       |    6565.61 img/s    | 2869.19 img/s |            2.29x            |
+|  resnext101-32x4d   |    3922.74 img/s    | 1136.30 img/s |            3.45x            |
+| se-resnext101-32x4d |    2651.13 img/s    | 982.78 img/s  |            2.70x            |
+
+ResNeXt and SE-ResNeXt use [NHWC data layout](https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html) when training using Mixed Precision,
+which improves the model performance. We are currently working on adding it for ResNet.
 
 
 ## Model Comparison
 
@@ -33,7 +33,7 @@ def add_parser_arguments(parser):
     checkpoint = torch.load(args.checkpoint_path)
 
     model_state_dict = {
-        k[len("module.1.") :] if "module.1." in k else k: v
+        k[len("module.") :] if "module." in k else k: v
         for k, v in checkpoint["state_dict"].items()
     }
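
Review note on the key-renaming hunk above: the new condition `"module." in k` matches the substring anywhere in the key, while the slice `k[len("module.") :]` always cuts from the start, so a key that contains `module.` somewhere other than the front would be truncated. The following is a minimal sketch of a prefix-only variant, not the code in this PR. The helper name `strip_prefix` and the sample keys are hypothetical, and a plain dict stands in for `checkpoint["state_dict"]`.

```python
from collections import OrderedDict


def strip_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with `prefix` removed from keys that start with it.

    Hypothetical helper for illustration; keys that merely contain the
    prefix somewhere in the middle are left untouched.
    """
    return OrderedDict(
        (k[len(prefix):] if k.startswith(prefix) else k, v)
        for k, v in state_dict.items()
    )


# Stand-in for checkpoint["state_dict"]; real values would be tensors.
example = {"module.conv1.weight": 1, "fc.bias": 2, "layer.module.x": 3}
cleaned = strip_prefix(example)
```

Using `str.startswith` keeps the check and the slice in agreement, so only keys that actually carry the wrapper prefix are rewritten.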
 
 
@@ -59,21 +59,24 @@ def add_parser_arguments(parser):
 
 def main(args):
     imgnet_classes = np.array(json.load(open("./LOC_synset_mapping.json", "r")))
-    model = models.build_resnet(args.arch, args.model_config, verbose=False)
+    model = models.build_resnet(args.arch, args.model_config, 1000, verbose=False)
 
     if args.weights is not None:
         weights = torch.load(args.weights)
         model.load_state_dict(weights)
 
     model = model.cuda()
 
-    if args.precision == "FP16":
+    if args.precision in ["AMP", "FP16"]:
         model = network_to_half(model)
 
+
     model.eval()
 
     with torch.no_grad():
-        input = load_jpeg_from_file(args.image, cuda=True, fp16=args.precision!='FP32')
+        input = load_jpeg_from_file(
+            args.image, cuda=True, fp16=args.precision != "FP32"
+        )
 
         output = torch.nn.functional.softmax(model(input), dim=1).cpu().view(-1).numpy()
         top5 = np.argsort(output)[-5:][::-1]
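
Review note on the top-5 selection at the end of the hunk above: `np.argsort` sorts in ascending order, so `[-5:]` takes the five largest entries and `[::-1]` reverses them into descending order. The sketch below shows the same indexing on a small made-up score vector. The function name `top_k` and the sample logits are assumptions for illustration, and a NumPy softmax stands in for `torch.nn.functional.softmax`.

```python
import numpy as np


def top_k(probs, k=5):
    """Return indices of the k largest entries, highest first (hypothetical helper)."""
    order = np.argsort(probs)  # ascending order of probability
    return order[-k:][::-1]    # last k entries, reversed to descending


# Made-up logits for seven classes; a real model would emit 1000 for ImageNet.
logits = np.array([0.1, 2.0, -1.0, 0.5, 3.0, 1.5, 0.0])
probs = np.exp(logits - logits.max())  # numerically stable softmax
probs /= probs.sum()

top5 = top_k(probs)
for idx in top5:
    print(f"class {idx}: {probs[idx]:.3f}")
```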