Commit 46ff370

[ConvNets/PyT] Adding support for Ampere and 20.06 container
1 parent 4eaa443 commit 46ff370

49 files changed

Lines changed: 1590 additions & 1240 deletions

PyTorch/Classification/ConvNets/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:19.10-py3
+ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:20.06-py3
 FROM ${FROM_IMAGE_NAME}

 ADD requirements.txt /workspace/

PyTorch/Classification/ConvNets/README.md

Lines changed: 41 additions & 32 deletions
@@ -2,13 +2,16 @@

 In this repository you will find implementations of various image classification models.

+Detailed information on each model can be found here:
+
 ## Table Of Contents

 * [Models](#models)
 * [Validation accuracy results](#validation-accuracy-results)
 * [Training performance results](#training-performance-results)
-* [Training performance: NVIDIA DGX-1 (8x V100 16G)](#training-performance-nvidia-dgx-1-(8x-v100-16G))
-* [Training performance: NVIDIA DGX-2 (16x V100 32G)](#training-performance-nvidia-dgx-2-(16x-v100-32G))
+* [Training performance: NVIDIA DGX A100 (8x A100 40GB)](#training-performance-nvidia-dgx-a100-8x-a100-40gb)
+* [Training performance: NVIDIA DGX-1 16GB (8x V100 16GB)](#training-performance-nvidia-dgx-1-16gb-8x-v100-16gb)
+* [Training performance: NVIDIA DGX-2 (16x V100 32GB)](#training-performance-nvidia-dgx-2-16x-v100-32gb)
 * [Model comparison](#model-comparison)
 * [Accuracy vs FLOPS](#accuracy-vs-flops)
 * [Latency vs Throughput on different batch sizes](#latency-vs-throughput-on-different-batch-sizes)
@@ -25,14 +28,14 @@ The following table provides links to where you can find additional information

 ## Validation accuracy results

-Our results were obtained by running the applicable
-training scripts in the [framework-container-name] NGC container
-on NVIDIA DGX-1 with (8x V100 16G) GPUs.
-The specific training script that was run is documented
+Our results were obtained by running the applicable
+training scripts in the [framework-container-name] NGC container
+on NVIDIA DGX-1 with (8x V100 16GB) GPUs.
+The specific training script that was run is documented
 in the corresponding model's README.


-The following table shows the validation accuracy results of the
+The following table shows the validation accuracy results of the
 three classification models side-by-side.


@@ -45,48 +48,54 @@ three classification models side-by-side.

 ## Training performance results

-
-### Training performance: NVIDIA DGX-1 (8x V100 16G)
+### Training performance: NVIDIA DGX A100 (8x A100 40GB)


-Our results were obtained by running the applicable
-training scripts in the pytorch-19.10 NGC container
-on NVIDIA DGX-1 with (8x V100 16G) GPUs.
-Performance numbers (in images per second)
+Our results were obtained by running the applicable
+training scripts in the pytorch-20.06 NGC container
+on NVIDIA DGX A100 with (8x A100 40GB) GPUs.
+Performance numbers (in images per second)
 were averaged over an entire training epoch.
-The specific training script that was run is documented
+The specific training script that was run is documented
 in the corresponding model's README.

-The following table shows the training accuracy results of the
+The following table shows the training accuracy results of the
 three classification models side-by-side.


-| **arch** | **Mixed Precision** | **FP32** | **Mixed Precision speedup** |
-|:-:|:-:|:-:|:-:|
-| resnet50 | 6888.75 img/s | 2945.37 img/s | 2.34x |
-| resnext101-32x4d | 2384.85 img/s | 1116.58 img/s | 2.14x |
-| se-resnext101-32x4d | 2031.17 img/s | 977.45 img/s | 2.08x |
+| **arch** | **Mixed Precision** | **TF32** | **Mixed Precision Speedup** |
+|:-------------------:|:-------------------:|:-------------:|:---------------------------:|
+| resnet50 | 9488.39 img/s | 5322.10 img/s | 1.78x |
+| resnext101-32x4d | 6758.98 img/s | 2353.25 img/s | 2.87x |
+| se-resnext101-32x4d | 4670.72 img/s | 2011.21 img/s | 2.32x |
+
+ResNeXt and SE-ResNeXt use [NHWC data layout](https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html) when training using Mixed Precision,
+which improves the model performance. We are currently working on adding it for ResNet.

-### Training performance: NVIDIA DGX-2 (16x V100 32G)

+### Training performance: NVIDIA DGX-1 16G (8x V100 16GB)

-Our results were obtained by running the applicable
-training scripts in the pytorch-19.10 NGC container
-on NVIDIA DGX-2 with (16x V100 32G) GPUs.
-Performance numbers (in images per second)
+
+Our results were obtained by running the applicable
+training scripts in the pytorch-20.06 NGC container
+on NVIDIA DGX-1 with (8x V100 16GB) GPUs.
+Performance numbers (in images per second)
 were averaged over an entire training epoch.
-The specific training script that was run is documented
+The specific training script that was run is documented
 in the corresponding model's README.

-The following table shows the training accuracy results of the
+The following table shows the training accuracy results of the
 three classification models side-by-side.


-| **arch** | **Mixed Precision** | **FP32** | **Mixed Precision speedup** |
-|:-:|:-:|:-:|:-:|
-| resnet50 | 13443.82 img/s | 6263.41 img/s | 2.15x |
-| resnext101-32x4d | 4473.37 img/s | 2261.97 img/s | 1.98x |
-| se-resnext101-32x4d | 3776.03 img/s | 1953.13 img/s | 1.93x |
+| **arch** | **Mixed Precision** | **FP32** | **Mixed Precision Speedup** |
+|:-------------------:|:-------------------:|:-------------:|:---------------------------:|
+| resnet50 | 6565.61 img/s | 2869.19 img/s | 2.29x |
+| resnext101-32x4d | 3922.74 img/s | 1136.30 img/s | 3.45x |
+| se-resnext101-32x4d | 2651.13 img/s | 982.78 img/s | 2.70x |
+
+ResNeXt and SE-ResNeXt use [NHWC data layout](https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html) when training using Mixed Precision,
+which improves the model performance. We are currently working on adding it for ResNet.


 ## Model Comparison
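
The "Mixed Precision Speedup" column in the two tables added by this README change appears to be the ratio of mixed-precision throughput to the TF32 or FP32 baseline. The sketch below recomputes it from the img/s figures in the diff. The `speedup` helper and the dictionary names are illustrative and not part of the repository.

```python
def speedup(mixed_img_s: float, baseline_img_s: float) -> float:
    """Ratio of mixed-precision throughput to the baseline (TF32 or FP32)."""
    return mixed_img_s / baseline_img_s


# Throughput figures (img/s) copied from the tables added in this commit:
# (mixed precision, baseline). Baseline is TF32 on A100 and FP32 on V100.
DGX_A100 = {
    "resnet50": (9488.39, 5322.10),
    "resnext101-32x4d": (6758.98, 2353.25),
    "se-resnext101-32x4d": (4670.72, 2011.21),
}
DGX1_16GB = {
    "resnet50": (6565.61, 2869.19),
    "resnext101-32x4d": (3922.74, 1136.30),
    "se-resnext101-32x4d": (2651.13, 982.78),
}

for system, table in (("DGX A100", DGX_A100), ("DGX-1 16GB", DGX1_16GB)):
    for arch, (mixed, baseline) in table.items():
        print(f"{system:<11} {arch:<20} {speedup(mixed, baseline):.2f}x")
```

Rounded to two decimals, the ratios match the speedup values listed in both tables.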

PyTorch/Classification/ConvNets/checkpoint2model.py

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ def add_parser_arguments(parser):
     checkpoint = torch.load(args.checkpoint_path)

     model_state_dict = {
-        k[len("module.1.") :] if "module.1." in k else k: v
+        k[len("module.") :] if "module." in k else k: v
         for k, v in checkpoint["state_dict"].items()
     }
3939

PyTorch/Classification/ConvNets/classify.py

Lines changed: 6 additions & 3 deletions
@@ -59,21 +59,24 @@ def add_parser_arguments(parser):

 def main(args):
     imgnet_classes = np.array(json.load(open("./LOC_synset_mapping.json", "r")))
-    model = models.build_resnet(args.arch, args.model_config, verbose=False)
+    model = models.build_resnet(args.arch, args.model_config, 1000, verbose=False)

     if args.weights is not None:
         weights = torch.load(args.weights)
         model.load_state_dict(weights)

     model = model.cuda()

-    if args.precision == "FP16":
+    if args.precision in ["AMP", "FP16"]:
         model = network_to_half(model)

+
     model.eval()

     with torch.no_grad():
-        input = load_jpeg_from_file(args.image, cuda=True, fp16=args.precision!='FP32')
+        input = load_jpeg_from_file(
+            args.image, cuda=True, fp16=args.precision != "FP32"
+        )

         output = torch.nn.functional.softmax(model(input), dim=1).cpu().view(-1).numpy()
         top5 = np.argsort(output)[-5:][::-1]
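
The last two context lines of this hunk turn the model's logits into probabilities and pick the five most likely classes. Below is a dependency-free sketch of that step on made-up scores. The `softmax` and `top_k` helpers are illustrative names; the script itself relies on `torch.nn.functional.softmax` and `np.argsort(output)[-5:][::-1]`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k=5):
    """Indices of the k largest probabilities, most likely first."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


logits = [0.1, 2.0, -1.0, 3.5, 0.7, 1.2, -0.3]  # made-up scores for 7 classes
probs = softmax(logits)
for index in top_k(probs):
    print(index, f"{probs[index]:.3f}")
```

Subtracting the maximum before exponentiating leaves the result unchanged mathematically and avoids overflow on large logits.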
