This is a PyTorch implementation of the GoogLeNet paper. In addition, several common concerns have been taken care of, such as:
- Data augmentation is kept outside of the main class and can be defined in a semi-declarative way using the albumentations library inside the `transformation.py` file.
- Automatic loading and saving of models from and to checkpoints.
- Integration with TensorBoard. The TensorBoard data is written after a checkpoint save, so that the plots are drawn correctly when training restarts.
  - Both the training loss and validation accuracy are written. The code will be modified to also include training accuracy and validation loss.
  - The model is also stored as a graph for visualization.
- Logging is enabled to both the console and an external file. The external file name can be configured in `properties.py`.
- Multi-GPU training has been enabled using the `torch.nn.DataParallel()` function.
- Mixed precision has been enabled using NVIDIA's apex library, as PyTorch 1.6 has not been released yet. Note: at the moment multi-GPU and mixed precision cannot be used together. This will be fixed once PyTorch 1.6 is released.
- The network layer sizes can be printed to the console for verification.
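As a hedged sketch of the checkpoint save/load cycle mentioned above, the following shows one common way to do it in PyTorch. The function names and checkpoint keys here are illustrative assumptions, not the repository's actual API:

```python
import torch


def save_checkpoint(model, optimizer, epoch, path):
    # Bundle everything needed to resume training into one file.
    torch.save({
        'epoch': epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
    }, path)


def load_checkpoint(model, optimizer, path):
    # Restore model and optimizer state, and return the saved epoch.
    checkpoint = torch.load(path)
    model.load_state_dict(checkpoint['model_state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    return checkpoint['epoch']
```

Writing the TensorBoard scalars only after a successful `save_checkpoint` keeps the plots consistent with the restored state on restart.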
There are a few differences between this implementation and the original GoogLeNet paper; these are highlighted in the relevant sections below.
The GoogLeNet paper used the ImageNet dataset; this implementation instead uses a dataset named Caltech256, which is very similar to ImageNet but consists of only 256 categories and around 30K images. Any decent GPU should be able to train on this dataset in far less time than on ImageNet.
In order to use ImageNet instead of Caltech256, please see the blog post below for more details.
How to prepare imagenet dataset for image classification
Below is the URL of the Caltech256 Dataset.
The pre-processing steps are similar to AlexNet, since the GoogLeNet paper did not recommend any additional improvements.
- Create train/validation datasets (test labels are not given)
- Center crop the images
- Resize the images to 256x256 pixels
- Calculate the RGB mean (on the train set only) and save the global mean to a file named `rgb_val.json`
  - The RGB mean values are used during training to normalize each image in the `ClassificationDataset` class
- Move the processed images to a different directory
- Create a file named `categories.csv` with the list of class labels and their corresponding ids
- Create train/val csv files with the image names (randomly generated) and class ids
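The RGB-mean step above can be sketched as follows, assuming the train images are loaded as H x W x 3 uint8 arrays. The file name `rgb_val.json` comes from the steps above; the helper names are illustrative:

```python
import json

import numpy as np


def compute_rgb_mean(images):
    """Accumulate a per-channel mean over a list of HxWx3 arrays (train set only)."""
    totals = np.zeros(3, dtype=np.float64)
    pixels = 0
    for img in images:
        totals += img.reshape(-1, 3).sum(axis=0)
        pixels += img.shape[0] * img.shape[1]
    return totals / pixels


def save_rgb_mean(mean, path='rgb_val.json'):
    # Persist the global mean so training can reuse it for normalization.
    with open(path, 'w') as f:
        json.dump({'mean': mean.tolist()}, f)
```

Accumulating running totals instead of stacking all images keeps memory usage flat even for large train sets.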
The `common.preprocessing.image_dir_preprocessor.py` script performs the pre-processing tasks.
Note: in the case of ImageNet, parallel processing is recommended. Please refer to the blog post below for more details.
Only a few types of data augmentation were used. The following data augmentations are implemented using the albumentations library in the `GoogLeNet.transformation.py` file. These are applied during training:
- Random crop of 224x224
  - The original paper used a center crop; a random crop is used here.
- Mean RGB normalization (like AlexNet and ZFNet)
- Horizontal Flip
- Random 90 Degree Rotation
During validation:
- Random crop of 224x224 (same as training)
- Mean RGB normalization
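The repository implements these transforms with albumentations; purely as an illustration, the listed augmentations can be sketched in plain NumPy (all function names here are hypothetical, not from the repo):

```python
import numpy as np

rng = np.random.default_rng(0)


def random_crop(img, size=224):
    # Pick a random top-left corner and take a size x size window.
    y = rng.integers(0, img.shape[0] - size + 1)
    x = rng.integers(0, img.shape[1] - size + 1)
    return img[y:y + size, x:x + size]


def random_hflip(img, p=0.5):
    # Mirror the image left-right with probability p.
    return img[:, ::-1] if rng.random() < p else img


def random_rot90(img, p=0.5):
    # Rotate the image by 90 degrees with probability p.
    return np.rot90(img) if rng.random() < p else img


def normalize(img, rgb_mean):
    # Subtract the per-channel train-set mean (e.g. loaded from rgb_val.json).
    return img.astype(np.float32) - np.asarray(rgb_mean, dtype=np.float32)
```

In the actual code these would be chained once in the transformation pipeline, with the flip and rotation steps disabled for validation.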
Here are some of the changes applied in this implementation.
- Use Xavier Normal initialization instead of initializing just from a normal distribution.
- The auxiliary outputs are not implemented.
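A minimal sketch of the Xavier-normal initialization mentioned above (the helper name `init_weights` is illustrative):

```python
import torch.nn as nn


def init_weights(module):
    # Apply Xavier normal init to conv and linear layers; biases start at zero.
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_normal_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)


# Usage: model.apply(init_weights) walks every submodule recursively.
```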
Here are the layers defined by the authors.
Below is the diagram of the inception module. There are a total of 9 inception modules used in the design.
| Layer Type | Output Size | Kernel Size | # of Kernels | Stride | Padding |
|---|---|---|---|---|---|
| Input Image | 224 x 224 x 3 | ||||
| DefaultConvolutionModule | 112 x 112 x 64 | 7 | 64 | 2 | 3 |
| MaxPool2d | 56 x 56 x 64 | 3 | 2 | 1 | |
| DefaultConvolutionModule | 56 x 56 x 64 | 1 | 64 | ||
| DefaultConvolutionModule | 56 x 56 x 192 | 3 | 192 | ||
| MaxPool2d | 28 x 28 x 192 | 3 | 2 | 1 | |
| InceptionModule | 28 x 28 x 256 | ||||
| InceptionModule | 28 x 28 x 480 | ||||
| MaxPool2d | 14 x 14 x 480 | 3 | 2 | 1 | |
| InceptionModule | 14 x 14 x 512 | ||||
| InceptionModule | 14 x 14 x 512 | ||||
| InceptionModule | 14 x 14 x 512 | ||||
| InceptionModule | 14 x 14 x 528 | ||||
| InceptionModule | 14 x 14 x 832 | ||||
| MaxPool2d | 7 x 7 x 832 | 3 | 2 | 1 | |
| InceptionModule | 7 x 7 x 832 | ||||
| InceptionModule | 7 x 7 x 1024 | ||||
| AdaptiveAvgPool2d | 1 x 1 x 1024 | ||||
| Dropout | 1 x 1 x 1024 | ||||
| Flatten | 1 x 1024 | ||||
| Linear | 1 x 256 | ||||
| LogSoftmax | 1 x 256 |
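The `InceptionModule` rows in the table can be sketched as a PyTorch module with the paper's four parallel branches concatenated along the channel dimension. The constructor signature below is an illustrative assumption, not necessarily the repository's exact one; the example branch widths reproduce the first 28 x 28 x 256 row:

```python
import torch
import torch.nn as nn


class InceptionModule(nn.Module):
    """Four parallel branches (1x1, 1x1->3x3, 1x1->5x5, pool->1x1), channel-concatenated."""

    def __init__(self, in_ch, ch1, ch3_red, ch3, ch5_red, ch5, pool_proj):
        super().__init__()
        self.b1 = nn.Sequential(nn.Conv2d(in_ch, ch1, 1), nn.ReLU(inplace=True))
        self.b2 = nn.Sequential(
            nn.Conv2d(in_ch, ch3_red, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch3_red, ch3, 3, padding=1), nn.ReLU(inplace=True))
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, ch5_red, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch5_red, ch5, 5, padding=2), nn.ReLU(inplace=True))
        self.b4 = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, 1), nn.ReLU(inplace=True))

    def forward(self, x):
        # Padding keeps the spatial size, so only the channel count grows.
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)
```

For example, `InceptionModule(192, 64, 96, 128, 16, 32, 32)` on a 28x28 input yields 64 + 128 + 32 + 32 = 256 output channels, matching the first inception row above.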
- Used stochastic gradient descent with Nesterov momentum.
- Also used Adam as an alternative approach, with an initial learning rate of 0.001.
- The initial learning rate for SGD has been set to `0.01` (the authors used 0.001 as the initial lr).
- In the GoogLeNet paper the learning rate was reduced manually; here a learning rate scheduler is used instead.
  - We use ReduceLROnPlateau and reduce the learning rate by a factor of 0.5 if there is no improvement after 3 epochs.
  - ReduceLROnPlateau is driven by the validation set accuracy.
- With Adam, CosineAnnealingLR was used instead of ReduceLROnPlateau.
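The two optimizer/scheduler pairings above can be sketched as follows. The hyperparameter values come from the bullets; the momentum value, `T_max`, `eta_min`, and the dummy parameter list are illustrative assumptions:

```python
import torch

# Dummy parameters standing in for model.parameters().
params = [torch.nn.Parameter(torch.randn(2, 2))]

# Pairing 1: SGD with Nesterov momentum + ReduceLROnPlateau.
sgd = torch.optim.SGD(params, lr=0.01, momentum=0.9, nesterov=True)
plateau = torch.optim.lr_scheduler.ReduceLROnPlateau(
    sgd, mode='max', factor=0.5, patience=3)  # mode='max': tracks validation accuracy

# Pairing 2: Adam + CosineAnnealingLR.
adam = torch.optim.Adam(params, lr=0.001)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(adam, T_max=10, eta_min=1e-5)

# After each epoch:
#   plateau.step(val_accuracy)  # halves lr after 3 epochs without improvement
#   cosine.step()               # anneals lr along a cosine curve toward eta_min
```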
Used stochastic gradient descent with Nesterov momentum and the ReduceLROnPlateau learning rate scheduler.
Here is the plot of the training/validation loss/accuracy after 70 epochs. The model is clearly over-fitting; more data augmentation would probably help.
This is the plot of the learning rate decay.
As shown below, the implemented model was able to achieve 55.17% accuracy when training from scratch.
| Architecture | epochs | Training Loss | Validation Accuracy | Training Accuracy | Learning Rate |
|---|---|---|---|---|---|
| AlexNet | 100 | 0.0777 | 46.51% | 99.42% | 0.01 |
| ZFNet | 100 | 0.0701 | 49.67% | 99.43% | 0.01 |
| VGG13 | 70 | 0.0655 | 53.45% | 99.08% | 0.00125 |
| GoogLeNet_SGD | 70 | 0.2786 | 55.17% | 94.89% | 1.953125e-05 |
- The network was trained using a single NVIDIA 2080 Ti with 32-bit floating point.
- 70 training epochs took 59.7 minutes to complete.
Used the Adam optimizer with the CosineAnnealingLR learning rate scheduler. This approach produces better validation set accuracy than the previous one.
Here is the plot of the training/validation loss/accuracy after 90 epochs. The model is clearly over-fitting; more data augmentation would probably help.
This is the plot of the learning rate decay. For the first 70 epochs the learning rate was set between 1e-03 and 1e-05, and from epochs 70 to 90 the learning rate was between 1e-04 and 1e-07.
As shown below, the implemented model was able to achieve 61.51% accuracy when training from scratch.
| Architecture | epochs | Training Loss | Validation Accuracy | Training Accuracy |
|---|---|---|---|---|
| AlexNet | 100 | 0.0777 | 46.51% | 99.42% |
| ZFNet | 100 | 0.0701 | 49.67% | 99.43% |
| VGG13 | 70 | 0.0655 | 53.45% | 99.08% |
| GoogLeNet_SGD | 70 | 0.2786 | 55.17% | 94.89% |
| GoogLeNet_Adam | 90 | 0.3104 | 61.51% | 93.64% |
- The network was trained using a single NVIDIA 2080 Ti with 32-bit floating point.
- 90 training epochs took 84.7 minutes to complete.
- Run the following file: `common.preprocessing.image_dir_preprocessor.py`
- The properties can be changed at `common.preprocessing.properties.py`. Here is how the configurations are defined.

```python
# Provide the input preprocessing location
INPUT_PATH = '/media/4TB/datasets/caltech/256_ObjectCategories'

# Provide the output location to store the processed images
OUTPUT_PATH = '/media/4TB/datasets/caltech/processed'

# Validation split. Range - [ 0.0 - 1.0 ]
VALIDATION_SPLIT = 0.2

# Output image dimension. ( height,width )
OUTPUT_DIM = (256, 256)

# If RGB mean is needed, set this to True
RGB_MEAN = True

# If this is false, then the images will only be resized without preserving the aspect ratio.
CENTER_CROP = True

# Function to provide the logic to parse the class labels from the directory.
def read_class_labels(path):
    return path.split('/')[-1].split('.')[-1]
```
- Run the following files: `GoogLeNet.train.py`, then `GoogLeNet.test.py`
  - `test.py` will automatically pick up the last checkpoint saved by training.
- The properties can be changed at `GoogLeNet.properties.py`. Here is how the configurations are defined.

```python
config = dict()
config['PROJECT_NAME'] = 'googlenet'
config['INPUT_DIR'] = '/media/4TB/datasets/caltech/processed'
config['TRAIN_DIR'] = f"{config['INPUT_DIR']}/train"
config['VALID_DIR'] = f"{config['INPUT_DIR']}/val"
config['TRAIN_CSV'] = f"{config['INPUT_DIR']}/train.csv"
config['VALID_CSV'] = f"{config['INPUT_DIR']}/val.csv"
config['CHECKPOINT_INTERVAL'] = 10
config['NUM_CLASSES'] = 256
config['EPOCHS'] = 70
config['MULTI_GPU'] = False
config['FP16_MIXED'] = False
config["LOGFILE"] = "output.log"
config["LOGLEVEL"] = "INFO"
```

I am executing the script remotely from PyCharm. Here is a sample output of train.py:
```
sudo+ssh://home@192.168.50.106:22/home/home/.virtualenvs/dl4cv/bin/python3 -u /home/home/Documents/synch/mini_projects/GoogLeNet/train.py
Building model ...
Training starting now ...
100%|██████████| 191/191 [00:53<00:00, 3.58 batches/s, epoch=1, loss=4.9997, val acc=9.67, train acc=8.201, lr=0.001]
100%|██████████| 191/191 [00:52<00:00, 3.61 batches/s, epoch=2, loss=4.499, val acc=13.639, train acc=12.128, lr=0.0009046039886902864]
100%|██████████| 191/191 [00:53<00:00, 3.59 batches/s, epoch=3, loss=4.1048, val acc=16.923, train acc=16.091, lr=0.0006548539886902863]
100%|██████████| 191/191 [00:53<00:00, 3.60 batches/s, epoch=4, loss=3.7222, val acc=21.251, train acc=20.967, lr=0.0003461460113097138]
100%|██████████| 191/191 [00:53<00:00, 3.59 batches/s, epoch=5, loss=3.4461, val acc=25.368, train acc=25.36, lr=9.639601130971379e-05]
100%|██████████| 191/191 [00:53<00:00, 3.58 batches/s, epoch=6, loss=3.3391, val acc=26.527, train acc=27.127, lr=1e-06]
100%|██████████| 191/191 [00:53<00:00, 3.58 batches/s, epoch=8, loss=3.4304, val acc=24.567, train acc=25.356, lr=0.000346146011309714]
100%|██████████| 191/191 [00:53<00:00, 3.59 batches/s, epoch=9, loss=3.4815, val acc=21.66, train acc=24.452, lr=0.0006548539886902867]
100%|██████████| 191/191 [00:53<00:00, 3.56 batches/s, epoch=10, loss=3.4207, val acc=25.025, train acc=25.094, lr=0.000904603988690287]
```
[1] Going deeper with convolutions
[2] ImageNet Classification with Deep Convolutional Neural Networks
[3] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
[4] Understanding the difficulty of training deep feedforward neural networks