Commit 70f3e43

Remove dllogger and fix bugs from GH
Signed-off-by: Pablo Ribalta <pribalta@nvidia.com>
1 parent 5626846 commit 70f3e43

46 files changed

Lines changed: 957 additions & 1345 deletions

TensorFlow/Segmentation/UNet_Industrial/Dockerfile

Lines changed: 7 additions & 3 deletions
@@ -16,11 +16,15 @@
 #
 # ==============================================================================

-FROM nvcr.io/nvidia/tensorflow:19.05-py3
+FROM nvcr.io/nvidia/tensorflow:20.01-tf1-py3

 LABEL version="1.0" maintainer="Jonathan DEKHTIAR <jonathan.dekhtiar@nvidia.com>"

+WORKDIR /opt
+COPY requirements.txt /opt/requirements_unet_tf_industrial.txt
+
+RUN python -m pip --no-cache-dir --no-cache install --upgrade pip && \
+    pip --no-cache-dir --no-cache install -r /opt/requirements_unet_tf_industrial.txt
+
 ADD . /workspace/unet_industrial
 WORKDIR /workspace/unet_industrial
-
-RUN pip install dllogger/

TensorFlow/Segmentation/UNet_Industrial/README.md

Lines changed: 17 additions & 23 deletions
@@ -138,7 +138,7 @@ Aside from these dependencies, ensure you have the following components:

 * [NVIDIA Docker](https://github.com/NVIDIA/nvidia-docker)

-* [TensorFlow 19.03-py3 NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow)
+* [TensorFlow 19.12-tf1-py3 NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow)
 * (optional) NVIDIA Volta GPU (see section below) - for best training performance using mixed precision

 For more information about how to get started with NGC containers, see the
@@ -219,11 +219,6 @@ cd scripts/
 ./UNet_FP32_EVAL.sh <path to result repository> <path to dataset> <DAGM2007 classID (1-10)>
 ```

-If you wish to evaluate external checkpoint, make sure to put the TF ckpt files inside a folder named "checkpoints"
-and provide its parent path as `<path to result repository>` in the example above.
-Be aware that the script will not fail if it does not find the checkpoint.
-It will randomly initialize the weights and run performance tests.
-
 ## Advanced

 The following sections provide greater details of the dataset, running training and inference, and the training results.
@@ -374,7 +369,7 @@ The following sections provide details on the achieved results in training accur
 #### Training accuracy results

 Our results were obtained by running the `./scripts/UNet_{FP32, AMP}_{1, 4, 8}GPU.sh` training
-script in the Tensorflow:19.03-py3 NGC container on NVIDIA DGX-1 with 8x V100 16G GPUs.
+script in the Tensorflow:19.12-tf1-py3 NGC container on NVIDIA DGX-1 with 8x V100 16G GPUs.

 ##### Threshold = 0.75

@@ -481,30 +476,29 @@ script in the Tensorflow:19.03-py3 NGC container on NVIDIA DGX-1 with 8x V100 16
 <!-- Spreedsheet to Markdown: https://thisdavej.com/copy-table-in-excel-and-paste-as-a-markdown-table/ -->

 Our results were obtained by running the scripts
-`./scripts/benchmarking/DGX1v_trainbench_{FP16, FP32, FP32AMP, FP32FM}_{1, 4, 8}GPU.sh` training script in the
-TensorFlow 19.03-py3 NGC container on an NVIDIA DGX-1 with 8 V100 16G GPUs.
-
-
-| # GPUs | Precision | Throughput (Imgs/sec) | Training Time | Speedup |
-|--------|---------------------------------|-----------------------|---------------|---------|
-| 1 | FP32 | 89 | 7m44 | 1.00 |
-| 1 | Automatic Mixed Precision (AMP) | 104 | 6m40 | 1.17 |
-| 4 | FP32 | 261 | 2m48 | 1.00 |
-| 4 | Automatic Mixed Precision (AMP) | 302 | 2m27 | 1.16 |
-| 8 | FP32 | 445 | 1m44 | 1.00 |
-| 8 | Automatic Mixed Precision (AMP) | 491 | 1m36 | 1.10 |
+`./scripts/benchmarking/DGX1v_trainbench_{FP32, AMP}_{1, 4, 8}GPU.sh` training script in the
+TensorFlow `19.12-tf1-py3` NGC container on an NVIDIA DGX-1 with 8 V100 16G GPUs.
+
+| # GPUs | Precision | Throughput (Imgs/sec) | AMP Speedup | Scaling efficiency |
+|--------|---------------------------------|-----------------------|-------------|--------------------|
+| 1 | FP32 | 92 | 1.00 | 1.00 |
+| 1 | Automatic Mixed Precision (AMP) | 167 | 1.82 | 1.00 |
+| 4 | FP32 | 299 | 1.00 | 3.25 |
+| 4 | Automatic Mixed Precision (AMP) | 458 | 1.53 | 2.74 |
+| 8 | FP32 | 507 | 1.00 | 5.51 |
+| 8 | Automatic Mixed Precision (AMP) | 561 | 1.11 | 3.36 |

 To achieve these same results, follow the [Quick Start Guide](#quick-start-guide) outlined above.

 #### Inference performance results

-Our results were obtained by running the aforementioned scripts in the TensorFlow
-19.03-py3 NGC container on an NVIDIA DGX-1 server with 8 V100 16G GPUs.
+Our results were obtained by running the scripts `./scripts/benchmarking/DGX1v_evalbench_{FP32, AMP}.sh`
+evaluation script in the `19.12-tf1-py3` NGC container on an NVIDIA DGX-1 server with 8 V100 16G GPUs.

 | # GPUs | Precision | Throughput (Imgs/sec) | Speedup |
 |--------|---------------------------------|-----------------------|---------|
-| 1 | FP32 | 228 | 1.00 |
-| 1 | Automatic Mixed Precision (AMP) | 301 | 1.32 |
+| 1 | FP32 | 306 | 1.00 |
+| 1 | Automatic Mixed Precision (AMP) | 550 | 1.80 |

 To achieve these same results, follow the [Quick Start Guide](#quick-start-guide) outlined above.

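The derived columns of the new training table can be recomputed from the throughput column alone. The plain-Python sketch below transcribes the new figures (the dictionary and function names are illustrative, not part of the repository) and computes the AMP speedup as AMP over FP32 throughput at a fixed GPU count, and the multi-GPU ratio as N-GPU over 1-GPU throughput at a fixed precision. These ratios reproduce the published "AMP Speedup" and "Scaling efficiency" values, so the latter column appears to report a speedup over one GPU rather than a per-GPU efficiency; `per_gpu_efficiency` shows the alternative reading.

```python
# Throughput in images/sec, transcribed from the new training table.
TRAIN_THROUGHPUT = {
    ("FP32", 1): 92, ("AMP", 1): 167,
    ("FP32", 4): 299, ("AMP", 4): 458,
    ("FP32", 8): 507, ("AMP", 8): 561,
}
# Single-GPU inference throughput from the new inference table.
INFER_THROUGHPUT = {"FP32": 306, "AMP": 550}


def amp_speedup(gpus):
    """AMP throughput over FP32 throughput at the same GPU count."""
    return TRAIN_THROUGHPUT[("AMP", gpus)] / TRAIN_THROUGHPUT[("FP32", gpus)]


def scaling_ratio(precision, gpus):
    """N-GPU throughput over 1-GPU throughput for the same precision."""
    return TRAIN_THROUGHPUT[(precision, gpus)] / TRAIN_THROUGHPUT[(precision, 1)]


def per_gpu_efficiency(precision, gpus):
    """Fraction of ideal linear scaling: scaling_ratio divided by the GPU count."""
    return scaling_ratio(precision, gpus) / gpus


if __name__ == "__main__":
    for gpus in (1, 4, 8):
        print(f"{gpus} GPU(s): AMP speedup {amp_speedup(gpus):.2f}")
        for precision in ("FP32", "AMP"):
            print(f"  {precision}: scaling {scaling_ratio(precision, gpus):.2f}, "
                  f"efficiency {per_gpu_efficiency(precision, gpus):.2f}")
    print(f"inference AMP speedup {INFER_THROUGHPUT['AMP'] / INFER_THROUGHPUT['FP32']:.2f}")
```

Read this way, 8 FP32 GPUs reach about 69% of linear scaling and 8 AMP GPUs about 42%.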
TensorFlow/Segmentation/UNet_Industrial/datasets/core.py

Lines changed: 0 additions & 2 deletions
@@ -19,8 +19,6 @@
 #
 # ==============================================================================

-from __future__ import print_function
-
 import os
 from abc import ABC, abstractmethod

TensorFlow/Segmentation/UNet_Industrial/datasets/dagm2007.py

Lines changed: 43 additions & 31 deletions
@@ -37,7 +37,7 @@

 from utils import hvd_utils

-from dllogger.logger import LOGGER
+from dllogger import Logger

 __all__ = ['DAGM2007_Dataset']

@@ -109,7 +109,21 @@ def dataset_fn(

         shuffle_buffer_size = 10000

-        def decode_csv(line):
+        image_dir, csv_file = self._get_data_dirs(training=training)
+
+        mask_image_dir = os.path.join(image_dir, "Label")
+
+        dataset = tf.data.TextLineDataset(csv_file)
+
+        dataset = dataset.skip(1) # Skip CSV Header
+
+        if only_defective_images:
+            dataset = dataset.filter(lambda line: tf.not_equal(tf.strings.substr(line, -1, 1), "0"))
+
+        if hvd_utils.is_using_hvd() and training:
+            dataset = dataset.shard(hvd.size(), hvd.rank())
+
+        def _load_dagm_data(line):

             input_image_name, image_mask_name, label = tf.decode_csv(
                 line, record_defaults=[[""], [""], [0]], field_delim=','
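The new ordering skips the header, filters, and shards the raw CSV lines before any image is decoded, and shards only when training. A plain-Python analogue of those first steps is sketched below. The sample rows and the helper `select_lines` are hypothetical; list slicing stands in for `tf.data.Dataset.shard(num_shards, index)`, which keeps every element whose position modulo `num_shards` equals `index`, and `num_shards > 1` stands in for `hvd_utils.is_using_hvd()`. The filter mirrors the lambda in the diff: a row is kept when its last character (the label) is not `"0"`.

```python
# Hypothetical CSV rows: image name, mask name, label.
CSV = [
    "Image,Mask,Label",
    "0001.PNG,,0",
    "0002.PNG,0002_label.PNG,1",
    "0003.PNG,0003_label.PNG,1",
    "0004.PNG,,0",
    "0005.PNG,0005_label.PNG,1",
]


def select_lines(csv_lines, only_defective_images, training, num_shards=1, shard_index=0):
    """Hypothetical analogue of skip(1) -> filter -> shard on raw CSV lines."""
    lines = csv_lines[1:]  # skip the CSV header, like dataset.skip(1)
    if only_defective_images:
        # Keep rows whose final character (the label column) is not "0".
        lines = [line for line in lines if line[-1:] != "0"]
    if num_shards > 1 and training:  # stands in for hvd_utils.is_using_hvd() and training
        # Dataset.shard keeps positions shard_index, shard_index + num_shards, ...
        lines = lines[shard_index::num_shards]
    return lines


if __name__ == "__main__":
    for rank in range(2):  # two hypothetical Horovod workers
        print(rank, select_lines(CSV, True, True, num_shards=2, shard_index=rank))
```

One consequence of sharding the file lines before shuffling is that each worker always reads a fixed, disjoint subset; sharding after a per-worker shuffle, as the removed code did, only yields disjoint subsets when every worker uses the same shuffle seed.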
@@ -156,10 +170,33 @@ def decode_image(filepath, resize_shape, normalize_data_method):
                 ),
             )

+            label = tf.cast(label, tf.int32)
+
+            return tf.data.Dataset.from_tensor_slices(([input_image], [mask_image], [label]))
+
+        dataset = dataset.apply(
+            tf.data.experimental.parallel_interleave(
+                _load_dagm_data,
+                cycle_length=batch_size*8,
+                block_length=4,
+                buffer_output_elements=batch_size*8
+            )
+        )
+
+        dataset = dataset.cache()
+
+        if training:
+            dataset = dataset.apply(tf.data.experimental.shuffle_and_repeat(buffer_size=shuffle_buffer_size, seed=seed))
+
+        else:
+            dataset = dataset.repeat()
+
+        def _augment_data(input_image, mask_image, label):
+
             if augment_data:

-                if not hvd_utils.is_using_hvd() or hvd.local_rank() == 0:
-                    LOGGER.log("Using data augmentation ...")
+                if not hvd_utils.is_using_hvd() or hvd.rank() == 0:
+                    print("Using data augmentation ...")

                 #input_image = tf.image.per_image_standardization(input_image)

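In the restructured pipeline, decoding happens in `_load_dagm_data` before `cache()`, while the random augmentation in `_augment_data` runs after `shuffle_and_repeat`. The plain-Python analogue below, with hypothetical `load_fn` and `augment_fn` callbacks, sketches the effect of that ordering: each record is decoded once no matter how many epochs run, and the random augmentation is drawn again on every pass instead of being stored in the cache. A full per-epoch permutation stands in for the shuffle buffer, which behaves that way when the dataset fits in the buffer.

```python
import random


def pipeline(lines, load_fn, augment_fn, epochs, seed=0):
    """Hypothetical analogue of: load -> cache() -> shuffle_and_repeat -> augment."""
    rng = random.Random(seed)
    # Decode every record once, as dataset.cache() keeps the decoded elements.
    cached = [load_fn(line) for line in lines]
    for _ in range(epochs):
        # Full per-epoch permutation, standing in for a shuffle buffer >= dataset size.
        order = list(range(len(cached)))
        rng.shuffle(order)
        for i in order:
            # Augmentation runs after the cache, so a new random draw is made every epoch.
            yield augment_fn(cached[i], rng)


if __name__ == "__main__":
    counts = {"load": 0, "augment": 0}

    def load_fn(line):
        counts["load"] += 1
        return line.upper()

    def augment_fn(sample, rng):
        counts["augment"] += 1
        return sample, rng.randint(0, 3)  # e.g. a random rotation count

    samples = list(pipeline(["a", "b", "c"], load_fn, augment_fn, epochs=3))
    print(len(samples), counts)  # 9 {'load': 3, 'augment': 9}
```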
@@ -173,36 +210,11 @@ def decode_image(filepath, resize_shape, normalize_data_method):
                 input_image = tf.image.rot90(input_image, k=n_rots)
                 mask_image = tf.image.rot90(mask_image, k=n_rots)

-            label = tf.cast(label, tf.int32)
-
             return (input_image, mask_image), label

-        image_dir, csv_file = self._get_data_dirs(training=training)
-
-        mask_image_dir = os.path.join(image_dir, "Label")
-
-        dataset = tf.data.TextLineDataset(csv_file)
-
-        dataset = dataset.skip(1) # Skip CSV Header
-
-        if only_defective_images:
-            dataset = dataset.filter(lambda line: tf.not_equal(tf.strings.substr(line, -1, 1), "0"))
-
-        dataset = dataset.cache()
-
-        if training:
-
-            dataset = dataset.apply(tf.data.experimental.shuffle_and_repeat(buffer_size=shuffle_buffer_size, seed=seed))
-
-            if hvd_utils.is_using_hvd():
-                dataset = dataset.shard(hvd.size(), hvd.rank())
-
-        else:
-            dataset = dataset.repeat()
-
         dataset = dataset.apply(
             tf.data.experimental.map_and_batch(
-                map_func=decode_csv,
+                map_func=_augment_data,
                 num_parallel_calls=num_threads,
                 batch_size=batch_size,
                 drop_remainder=True,
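The context lines at the top of this hunk rotate the image and its mask by the same `n_rots`, which keeps every defect pixel aligned with its label. The sketch below illustrates that pairing on small 2-D lists in plain Python. `rot90_ccw` and `augment_pair` are hypothetical helpers; `rot90_ccw` follows the counter-clockwise direction of `tf.image.rot90`, and drawing `n_rots` uniformly from 0 to 3 is an assumption, since the lines that compute it are not shown in this diff.

```python
import random


def rot90_ccw(grid, k=1):
    """Rotate a 2-D list counter-clockwise by k * 90 degrees (same direction as tf.image.rot90)."""
    for _ in range(k % 4):
        # Transpose, then reverse the row order: one counter-clockwise quarter turn.
        grid = [list(row) for row in zip(*grid)][::-1]
    return grid


def augment_pair(image, mask, rng):
    """Apply one shared random rotation to an image and its mask (n_rots range is assumed)."""
    n_rots = rng.randint(0, 3)  # assumption: uniform over 0..3 quarter turns
    return rot90_ccw(image, n_rots), rot90_ccw(mask, n_rots)


if __name__ == "__main__":
    print(rot90_ccw([[1, 2], [3, 4]]))  # [[2, 4], [1, 3]]
```

Drawing `n_rots` once and reusing it for both tensors is what matters; two independent draws would rotate the mask away from the defect it labels.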
@@ -212,7 +224,7 @@ def decode_image(filepath, resize_shape, normalize_data_method):
         dataset = dataset.prefetch(buffer_size=tf.contrib.data.AUTOTUNE)

         if use_gpu_prefetch:
-            dataset.apply(tf.data.experimental.prefetch_to_device(device="/gpu:0", buffer_size=batch_size * 8))
+            dataset.apply(tf.data.experimental.prefetch_to_device(device="/gpu:0", buffer_size=4))

         return dataset

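`prefetch_to_device` counts its `buffer_size` in dataset elements, and because it comes after `map_and_batch` each element is a whole batch; the change therefore caps the device-side buffer at 4 batches instead of `batch_size * 8` batches. The plain-Python sketch below shows the same bounding idea with a background thread and a bounded `queue.Queue`; `prefetch` is a hypothetical helper, not a TensorFlow API. Note also that, in both versions of the line, the return value of `dataset.apply(...)` is not assigned back to `dataset`; since `tf.data` transformations return a new dataset rather than modifying the existing one, the device prefetch as written does not appear to end up in the returned pipeline.

```python
import queue
import threading


def prefetch(iterable, buffer_size):
    """Hypothetical helper: yield items from `iterable`, produced ahead by a thread.

    At most `buffer_size` (>= 1) items wait in the queue, mirroring how a
    prefetch buffer bounds the memory held by not-yet-consumed elements.
    """
    buf = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the input
    errors = []

    def producer():
        try:
            for item in iterable:
                buf.put(item)  # blocks while `buffer_size` items are already waiting
        except Exception as exc:  # hand producer errors to the consumer
            errors.append(exc)
        finally:
            buf.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is done:
            break
        yield item
    if errors:
        raise errors[0]


if __name__ == "__main__":
    batches = [[i, i + 1] for i in range(0, 8, 2)]  # stand-ins for batched elements
    print(list(prefetch(batches, buffer_size=4)))
```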
TensorFlow/Segmentation/UNet_Industrial/dllogger/README.md

Lines changed: 0 additions & 22 deletions
This file was deleted.

TensorFlow/Segmentation/UNet_Industrial/dllogger/dllogger/__init__.py

Lines changed: 0 additions & 19 deletions
This file was deleted.

TensorFlow/Segmentation/UNet_Industrial/dllogger/dllogger/autologging.py

Lines changed: 0 additions & 60 deletions
This file was deleted.
