Adding Imagenet Example #680

Merged
mrwyattii merged 11 commits into `master` from `dev/pagolnar/ex_imagenet`
Nov 8, 2023

@PareesaMS

Contributor

This example enables DeepSpeed in the training implementation for a set of popular model architectures on the ImageNet dataset. The models include ResNet, AlexNet, and VGG, and the
baseline implementation can be found in the PyTorch examples GitHub repository. Enabling DeepSpeed makes it easy to
run the code in a distributed manner and to apply fp16 precision together with ZeRO stage 1 memory reduction.
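
For readers unfamiliar with the two settings named here, a minimal sketch of a DeepSpeed configuration that turns on fp16 and ZeRO stage 1 might look like the following. The `fp16` and `zero_optimization` keys are standard DeepSpeed config fields; the batch size, the helper `write_config`, and the file name `ds_config.json` are illustrative assumptions and are not taken from this PR.

```python
import json

# Hypothetical values for illustration; the config shipped in this PR may differ.
ds_config = {
    "train_batch_size": 256,            # assumed global batch size
    "fp16": {"enabled": True},          # train in fp16 precision
    "zero_optimization": {"stage": 1},  # ZeRO stage 1: partition optimizer states
}


def write_config(path="ds_config.json"):
    """Write the config as JSON so it can be handed to DeepSpeed as a file."""
    with open(path, "w") as f:
        json.dump(ds_config, f, indent=2)
    return path
```

Setting `"enabled": False` or `"stage": 0` in such a file would correspond to the runs without fp16 or without ZeRO.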

Comment thread training/imagenet/README.md Outdated
## DeepSpeed Optimizations

By applying fp16 quantization and ZeRO stage 1 memory optimization, we were able to reduce the required memory. The table below summarizes the results of running ResNet-50 on one
node with 16 V100 GPUs:
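
As a rough intuition for why ZeRO stage 1 lowers memory, the sketch below estimates per-GPU model-state memory. It assumes fp16 weights and gradients (2 bytes each per parameter) and an Adam-style optimizer holding 12 bytes of fp32 state per parameter, which ZeRO stage 1 splits across the data-parallel GPUs. The function name and the 12-byte default are assumptions for illustration; an SGD-based run would carry less optimizer state, and activations are ignored, so this is not expected to reproduce the measured numbers.

```python
def model_state_bytes(num_params, num_gpus, zero_stage=0, optimizer_bytes_per_param=12):
    """Estimate per-GPU bytes for fp16 weights, fp16 gradients, and optimizer states."""
    weights = 2 * num_params  # fp16 parameters
    grads = 2 * num_params    # fp16 gradients
    optim = optimizer_bytes_per_param * num_params  # assumed fp32 optimizer state
    if zero_stage >= 1:
        # ZeRO stage 1 partitions only the optimizer states across GPUs.
        optim = optim / num_gpus
    return weights + grads + optim


# Example: compare no ZeRO against ZeRO stage 1 on 16 GPUs for a model of
# roughly ResNet-50 size (parameter count is an approximate, assumed value).
baseline = model_state_bytes(25_600_000, 16, zero_stage=0)
stage1 = model_state_bytes(25_600_000, 16, zero_stage=1)
```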

Contributor

on a DGX-1 node (with 16 V100 GPUs)

Contributor Author

Fixed it

Comment thread training/imagenet/README.md Outdated
------------------|-------------------

Furthermore, the memory optimization had no adverse impact on accuracy, a point illustrated by the graph below.
![resnet-plot](C:\Users\pagolnar\OneDrive - Microsoft\Reports-presentations\Resnet-plot)

Contributor

the image link is wrong.

Contributor Author

Fixed it

Comment thread training/imagenet/README.md Outdated
Baseline| ? | -
Baseline with DS activated | 1.66 | -
DS + fp16 | 1.04 | ?
Ds + fp16 + Zero 1 | 0.81 | ?

Contributor

Besides memory, how about the training speed?

Contributor Author

Fixed the table. Did not measure the training speed. Should I repeat the experiments?

Comment thread training/imagenet/README.md Outdated
The ImageNet dataset is large and time-consuming to download. To get started quickly, run `main.py` with dummy data by passing `--dummy`. This is also useful for benchmarking training speed. Note that the loss and accuracy are meaningless in this case.

```bash
python main.py -a resnet18 --dummy
```

Contributor

where is deepspeed?

Contributor Author

Fixed it

@@ -0,0 +1,2 @@
torch

Contributor

deepspeed is also a requirement?

Contributor Author

Definitely. Fixed the issue

Comment thread training/imagenet/README.md Outdated
Baseline| ? | -
Baseline with DS activated | 1.66 | -
DS + fp16 | 1.04 | ?
Ds + fp16 + Zero 1 | 0.81 | ?

Contributor

The table format is not correct. Take a look at the rendered website.

Contributor Author

Fixed it

@mrwyattii mrwyattii merged commit ccb2a34 into master Nov 8, 2023
hwchen2017 pushed a commit that referenced this pull request Jun 8, 2025
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>