ECE-GY 9143 - High Performance Machine Learning Project (NYU)
The goal of this project is to optimize the High-resolution Convolutional Transformer (HCT) [1] model by leveraging High-Performance Computing (HPC) techniques, including distributed data parallelism and other optimization strategies. By improving the efficiency and scalability of HCT, the project aims to enable faster processing of high-resolution images in applications such as medical imaging and satellite image analysis.
| Milestone | Status |
|---|---|
| Write a data module to handle the dataset (load, pre-process, make it PyTorch compatible) | Completed |
| Write a basic trainer as per model paper to measure time and accuracy of model on single GPU. | Completed |
| Integrate Weights & Biases and PyTorch Profiler | Completed |
| Optimize training time by using PyTorch distributed data parallel | Completed |
| Make the model TorchScript compatible so it can be deployed in non-Python environments | Completed |
| Use additional quantization techniques to reduce inference time further | Completed |
├── ...
├── cpp
│   ├── native_hct.cpp (to load the TorchScript model using C++)
├── layer
│   ├── ac_layers.py (Attention-Convolution block for Transformer)
│   ├── performer_attention.py (linear self-attention)
│   ├── resnet_layers.py (convolution layers for early stages)
├── dataset
│   ├── image_labels.csv (for the PyTorch Dataset class)
│   ├── image_labels_valid.csv (for the PyTorch Dataset class)
├── hct_base.py (model starting point)
├── datascript.py (to process data and make it PyTorch compatible)
├── data.py (Dataset class for the PyTorch DataLoader)
├── trainer.py (single-device trainer)
├── DistributedTrainer.py (distributed data parallel trainer)
├── inference.py (inference engine)
├── train.py (run this file to train the model)
├── inf.py (use this file for inference)
├── params.py (command-line arguments)
├── submit.sh (to submit a job on HPC)
├── submit2.sh (to submit a job on HPC)
├── quantized
│   ├── main.py (static quantization)
├── quantized_dynamic
│   ├── main.py (dynamic quantization)
├── quantized_partial
│   ├── main.py (attempted partial static quantization; resulted in worse runtime)
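The data pipeline is driven by the csv index files listed above. As a rough illustration, a csv-indexed PyTorch Dataset along the lines of `data.py` could look like the sketch below; the column names `image_path`/`label` and the dummy tensor in place of real image decoding are assumptions, not the repo's actual implementation.

```python
import csv
import torch
from torch.utils.data import Dataset

class ImageLabelDataset(Dataset):
    """Hypothetical sketch: serves (image, label) pairs from a csv index file."""

    def __init__(self, csv_path, transform=None):
        # Assumed csv columns: image_path, label
        with open(csv_path, newline="") as f:
            self.rows = [(r["image_path"], int(r["label"])) for r in csv.DictReader(f)]
        self.transform = transform

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        path, label = self.rows[idx]
        # Real code would decode the image at `path` here (e.g. with OpenCV);
        # this sketch returns a dummy tensor in its place.
        image = torch.zeros(3, 224, 224)
        if self.transform is not None:
            image = self.transform(image)
        return image, label
```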
- PyTorch
- Weights & Biases (wandb)
- LibTorch (PyTorch C++ API, with CUDA support)
- OpenCV (C++)
- torch-tb-profiler
- einops (Python library for tensor manipulation)
sbatch submit.sh
python train.py
usage: train.py [-h] [--batch-size N] [--num-workers NUM_WORKERS] [--epochs N] [--lr LR] [--device DEVICE] [--optimizer OPTIMIZER] [--momentum N]
[--weight-decay N] [--dataset-path DATASET_PATH] [--wandb WANDB] [--project PROJECT] [--training-mode TRAINING_MODE] [--num-devices NUM_DEVICES]
[--log-path LOG_PATH]
High Resolution Convolutional Transformer
options:
-h, --help show this help message and exit
--batch-size N input batch size for training (default: 16)
--num-workers NUM_WORKERS
Number of I/O processes (default: 2)
--epochs N Number of epochs to train (default: 80)
--lr LR learning rate (default: 2e-5)
--device DEVICE Device to be used for training (cpu/cuda)
--optimizer OPTIMIZER
Optimizer to be used (default: Adam)
--momentum N Momentum Value (default: 0.9)
--weight-decay N Weight Decay value (default: 5e-4)
--dataset-path DATASET_PATH
Path of LIU4K Dataset (default: ./dataset)
--wandb WANDB Use Weights & Biases to track training progress (0/1)
--project PROJECT Project Name for W&B
--training-mode TRAINING_MODE
Training Mode (Parallel/Distributed/Single)
--num-devices NUM_DEVICES
Number of Learners/GPUs
--log-path LOG_PATH Path where logs will be captured
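For example, a distributed run on 4 GPUs with W&B logging enabled could be launched like this (the argument values are illustrative, not required defaults):

```shell
python train.py --training-mode Distributed --num-devices 4 \
    --batch-size 16 --epochs 80 --lr 2e-5 --wandb 1 --project hct-ddp
```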
sbatch submit2.sh
python inf.py
cd cpp
mkdir build
cd build
cmake ..
cmake --build .
./native_hct <model_path> <image_path> <device>
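The C++ binary expects a serialized TorchScript module as `<model_path>`. On the Python side, the export boils down to something like the sketch below; `TinyModel` is a stand-in for illustration (the repo scripts the actual HCT model):

```python
import torch

class TinyModel(torch.nn.Module):
    """Hypothetical stand-in for the HCT model."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2.0

# torch.jit.script compiles the module to TorchScript so that it can be
# deserialized from C++ (torch::jit::load) without a Python runtime.
scripted = torch.jit.script(TinyModel())
scripted.save("model.pt")
```

`native_hct` then loads the saved file with `torch::jit::load` and runs the forward pass on the preprocessed image tensor.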
Static Quantization:
cd quantized
sbatch submit.sh
Dynamic Quantization:
cd quantized_dynamic
sbatch submit.sh
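Dynamic quantization swaps selected layer types (typically `nn.Linear`) for int8 kernels, quantizing weights once and activations on the fly. A minimal sketch on a toy model, assuming the standard `torch.ao.quantization.quantize_dynamic` API (the repo's `quantized_dynamic/main.py` applies the same idea to HCT):

```python
import torch
import torch.nn as nn

# Toy float model standing in for HCT's linear-heavy transformer blocks.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))

# Replace every nn.Linear with a dynamically quantized int8 version:
# weights are quantized once, activations per forward pass.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

out = qmodel(torch.randn(4, 16))  # inference runs on the int8 kernels
```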
| Per Epoch | Single Device | Distributed (4 GPUs) | Speedup |
|---|---|---|---|
| Data Loading | 1000 secs | 271 secs | 3.7 |
| Training | 160 secs | 90 secs | 1.78 |
| Running | 1170 secs | 360 secs | 3.25 |
Training => Data movement from CPU to GPU + forward pass + backward pass + metrics calculation
Running => Data loading + Training
Model evaluation is not included in the time measurements.
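The distributed numbers above come from PyTorch DistributedDataParallel. Stripped to its core, the setup in the trainer looks roughly like this single-process sketch; the `gloo` backend, single rank, and toy model are assumptions for illustration (the actual runs use 4 CUDA devices, one process per GPU):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process bootstrap; a real launcher (torchrun/srun) sets these per rank.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

model = torch.nn.Linear(8, 2)   # toy stand-in for HCT
ddp_model = DDP(model)          # gradients are all-reduced across ranks

out = ddp_model(torch.randn(4, 8))
out.sum().backward()            # backward triggers the gradient all-reduce

dist.destroy_process_group()
```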
References:
[1] Taha, A., Truong Vu, Y.N., Mombourquette, B., Matthews, T.P., Su, J. and Singh, S., 2022, September. Deep is a Luxury We Don't Have. MICCAI 2022.
