Adding Imagenet Example #680
Merged
yaozhewei
reviewed
Aug 16, 2023
## DeepSpeed Optimizations

Applying fp16 quantization and Zero stage 1 memory optimization we were able to reduce the required memory. The table bellow summarizes the results of running resnet 50 on one
node 16 V100 GPUs:
Contributor
on a DGX-1 node (with 16 V100 GPUs)
------------------|-------------------

Furthermore, the memory optimization had no adverse impact on accuracy, a point illustrated by the graph below.

Contributor
the image link is wrong.
Baseline| ? | -
Baseline with DS activated | 1.66 | -
DS + fp16 | 1.04 | ?
Ds + fp16 + Zero 1 | 0.81 | ?
Contributor
Besides memory, how about the training speed?
Contributor
Author
Fixed the table. Did not measure the training speed. Should I repeat the experiments?
ImageNet dataset is large and time-consuming to download. To get started quickly, run `main.py` using dummy data by "--dummy". It's also useful for training speed benchmark. Note that the loss or accuracy is useless in this case.

```bash
python main.py -a resnet18 --dummy
```
@@ -0,0 +1,2 @@
torch
Contributor
deepspeed is also a requirement?
Contributor
Author
Definitely. Fixed the issue
yaozhewei
reviewed
Aug 16, 2023
Baseline| ? | -
Baseline with DS activated | 1.66 | -
DS + fp16 | 1.04 | ?
Ds + fp16 + Zero 1 | 0.81 | ?
Contributor
table format is not correct. take a look at rendered website
mrwyattii
approved these changes
Nov 8, 2023
hwchen2017
pushed a commit
that referenced
this pull request
Jun 8, 2025
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
This example enables DeepSpeed in the training script for a set of popular model architectures on the ImageNet dataset. The models include ResNet, AlexNet, and VGG, and the
baseline implementation can be found in the PyTorch examples GitHub repository. With DeepSpeed enabled, the code is easier to
run in a distributed manner, and fp16 (half-precision) training and ZeRO stage 1 memory reduction are straightforward to apply.
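
For readers unfamiliar with how fp16 and ZeRO stage 1 are switched on, the following is a minimal sketch of a DeepSpeed configuration, not the configuration used in this pull request. The batch size and the helper name `config_as_json` are assumptions made for illustration; the `fp16` and `zero_optimization` keys are standard DeepSpeed config sections.

```python
import json

# Illustrative DeepSpeed configuration (a sketch, not the PR's actual file).
# The batch size below is a hypothetical value chosen for the example.
ds_config = {
    "train_batch_size": 256,
    # Train with half-precision (fp16) parameters and gradients.
    "fp16": {"enabled": True},
    # ZeRO stage 1 partitions optimizer states across data-parallel ranks.
    "zero_optimization": {"stage": 1},
}


def config_as_json(config=ds_config):
    """Serialize the config so it can be saved as a JSON file (hypothetical helper)."""
    return json.dumps(config, indent=2)


# In a training script, the dict (or a path to the saved JSON) would typically
# be handed to DeepSpeed together with the model and an optimizer, e.g.:
#   model_engine, optimizer, _, _ = deepspeed.initialize(
#       model=model, optimizer=optimizer, config=ds_config)
```

Keeping the settings in a single dictionary makes it easy to compare runs such as "DS + fp16" and "DS + fp16 + ZeRO 1" by toggling one entry at a time.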