Commit 0bc061d

Merge: [ResNet50/PaddlePaddle] Changed default ckpt name
2 parents 180b7c0 + 0376731 commit 0bc061d

6 files changed

Lines changed: 20 additions & 20 deletions

PaddlePaddle/Classification/RN50v1.5/README.md

Lines changed: 13 additions & 13 deletions
@@ -530,10 +530,10 @@ The model will be stored in the directory specified with `--output-dir`, includi
 - `.pdopts`: The optimizer information contains all the Tensors used by the optimizer. For Adam optimizer, it contains beta1, beta2, momentum, and so on. All the information will be saved to a file with suffix “.pdopt”. (If the optimizer has no Tensor need to save (like SGD), the file will not be generated).
 - `.pdmodel`: The network description is the description of the program. It’s only used for deployment. The description will save to a file with the suffix “.pdmodel”.

-The default prefix of model files is `paddle_example`. Model of each epoch would be stored in directory `./output/ResNet/epoch_id/` with three files by default, including `paddle_example.pdparams`, `paddle_example.pdopts`, `paddle_example.pdmodel`. Note that `epoch_id` is 0-based, which means `epoch_id` is from 0 to 89 for a total of 90 epochs. For example, the model of the 89th epoch would be stored in `./output/ResNet/89/paddle_example`
+The default prefix of model files is `resnet_50_paddle`. Model of each epoch would be stored in directory `./output/ResNet/epoch_id/` with three files by default, including `resnet_50_paddle.pdparams`, `resnet_50_paddle.pdopts`, `resnet_50_paddle.pdmodel`. Note that `epoch_id` is 0-based, which means `epoch_id` is from 0 to 89 for a total of 90 epochs. For example, the model of the 89th epoch would be stored in `./output/ResNet/89/resnet_50_paddle`

 Assume you want to train the ResNet for 90 epochs, but the training process aborts during the 50th epoch due to infrastructure faults. To resume training from the checkpoint, specify `--from-checkpoint` and `--last-epoch-of-checkpoint` with following these steps:
-- Set `./output/ResNet/49/paddle_example` to `--from-checkpoint`.
+- Set `./output/ResNet/49/resnet_50_paddle` to `--from-checkpoint`.
 - Set `--last-epoch-of-checkpoint` to `49`.
 Then rerun the training to resume training from the 50th epoch to the 89th epoch.

@@ -546,28 +546,28 @@ python -m paddle.distributed.launch --gpus=0,1,2,3,4,5,6,7 train.py \
 --scale-loss 128.0 \
 --use-dynamic-loss-scaling \
 --data-layout NHWC \
---from-checkpoint ./output/ResNet/49/paddle_example
+--from-checkpoint ./output/ResNet/49/resnet_50_paddle
 --last-epoch-of-checkpoint 49
 ```

-To start training from pretrained weights, set `--from-pretrained-params` to `./output/ResNet/<epoch_id>/paddle_example`.
+To start training from pretrained weights, set `--from-pretrained-params` to `./output/ResNet/<epoch_id>/resnet_50_paddle`.

 Example:
 ```bash
-# Train AMP with model initialization by <./your_own_path_to/paddle_example>
+# Train AMP with model initialization by <./your_own_path_to/resnet_50_paddle>
 python -m paddle.distributed.launch --gpus=0,1,2,3,4,5,6,7 train.py \
 --epochs 90 \
 --amp \
 --scale-loss 128.0 \
 --use-dynamic-loss-scaling \
 --data-layout NHWC \
---from-pretrained-params ./your_own_path_to/paddle_example
+--from-pretrained-params ./your_own_path_to/resnet_50_paddle
 ```

 Make sure:
-- Resume from checkpoints: Both `paddle_example.pdopts` and `paddle_example.pdparams` must be in the given path.
-- Start from pretrained weights: `paddle_example.pdparams` must be in the given path.
-- The prefix `paddle_example` must be added to the end of the given path. For example: set path as `./output/ResNet/89/paddle_example` instead of `./output/ResNet/89/`
+- Resume from checkpoints: Both `resnet_50_paddle.pdopts` and `resnet_50_paddle.pdparams` must be in the given path.
+- Start from pretrained weights: `resnet_50_paddle.pdparams` must be in the given path.
+- The prefix `resnet_50_paddle` must be added to the end of the given path. For example: set path as `./output/ResNet/89/resnet_50_paddle` instead of `./output/ResNet/89/`
 - Don't set `--from-checkpoint` and `--from-pretrained-params` at the same time.

 The difference between those two is that `--from-pretrained-params` contain only model weights, and `--from-checkpoint`, apart from model weights, contain the optimizer state, and LR scheduler state.
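The "Make sure" checklist above can be expressed as a pre-flight check to run before launching `train.py`. This is a sketch under stated assumptions, not code from the repository: `validate_init_args` and `_check_prefix` are hypothetical names, and the optimizer-state suffix follows the README's file description (".pdopt") even though its checklist spells it `.pdopts`.

```python
import os
from typing import Optional, Tuple

DEFAULT_PREFIX = "resnet_50_paddle"  # new default prefix after this commit
PARAMS_SUFFIX = ".pdparams"
# Assumed: the README's file description says ".pdopt", its checklist says
# ".pdopts". Adjust to whatever your run actually wrote.
OPT_SUFFIX = ".pdopt"


def _check_prefix(path: str, suffixes: Tuple[str, ...]) -> None:
    # The path must end with the file prefix, not just the epoch directory.
    if os.path.isdir(path):
        hint = os.path.join(path, DEFAULT_PREFIX)
        raise ValueError(f"{path!r} is a directory; append the prefix, e.g. {hint!r}")
    missing = [path + s for s in suffixes if not os.path.isfile(path + s)]
    if missing:
        raise FileNotFoundError("missing: " + ", ".join(missing))


def validate_init_args(from_checkpoint: Optional[str],
                       from_pretrained_params: Optional[str]) -> None:
    """Hypothetical check mirroring the README's 'Make sure' list."""
    if from_checkpoint and from_pretrained_params:
        raise ValueError("Set only one of --from-checkpoint and --from-pretrained-params.")
    if from_checkpoint:
        # Resuming needs both weights and optimizer state.
        _check_prefix(from_checkpoint, (PARAMS_SUFFIX, OPT_SUFFIX))
    elif from_pretrained_params:
        # Pretrained initialization needs only the weights.
        _check_prefix(from_pretrained_params, (PARAMS_SUFFIX,))
```

Calling it with `./output/ResNet/89/` (the directory) fails, while `./output/ResNet/89/resnet_50_paddle` passes once the expected files exist, matching the example in the checklist.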
@@ -596,18 +596,18 @@ Note that automatic sparsity (ASP) requires a pretrained model to initialize par

 You can apply `scripts/training/train_resnet50_AMP_ASP_90E_DGXA100.sh` we provided to launch ASP + AMP training.
 ```bash
-# Default path to pretrained parameters is ./output/ResNet50/89/paddle_example
+# Default path to pretrained parameters is ./output/ResNet50/89/resnet_50_paddle
 bash scripts/training/train_resnet50_AMP_ASP_90E_DGXA100.sh <pretrained_parameters>
 ```

 Or following steps below to manually launch ASP + AMP training.

-First, set `--from-pretrained-params` to a pretrained model file. For example, if you have trained the ResNet for 90 epochs following [Training process](#training-process), the final pretrained weights would be stored in `./output/ResNet50/89/paddle_example.pdparams` by default, and set `--from-pretrained-params` to `./output/ResNet/89/paddle_example`.
+First, set `--from-pretrained-params` to a pretrained model file. For example, if you have trained the ResNet for 90 epochs following [Training process](#training-process), the final pretrained weights would be stored in `./output/ResNet50/89/resnet_50_paddle.pdparams` by default, and set `--from-pretrained-params` to `./output/ResNet/89/resnet_50_paddle`.

 Then run following command to run AMP + ASP:
 ```bash
 python -m paddle.distributed.launch --gpus=0,1,2,3,4,5,6,7 train.py \
---from-pretrained-params ./output/ResNet50/89/paddle_example \
+--from-pretrained-params ./output/ResNet50/89/resnet_50_paddle \
 --epochs 90 \
 --amp \
 --scale-loss 128.0 \
@@ -646,7 +646,7 @@ To run inference with TensorRT for the best performance, you can apply the scrip

 For example,
 1. Run `bash scripts/inference/export_resnet50_AMP.sh <your_checkpoint>` to export an inference model.
-- The default path of checkpoint is `./output/ResNet/89/paddle_example`.
+- The default path of checkpoint is `./output/ResNet/89/resnet_50_paddle`.
 2. Run `bash scripts/inference/infer_resnet50_AMP.sh` to infer with TensorRT.

 Or you could manually run `export_model.py` and `inference.py` with specific arguments, refer to [Command-line options](#command-line-options).
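The README changes above pin down the on-disk layout (one directory per 0-based `epoch_id`, files sharing the `resnet_50_paddle` prefix) and the resume arithmetic. A minimal sketch of both, assuming only what the README states; `checkpoint_prefix` and `remaining_epochs` are hypothetical helpers, not functions from the repository, and the restart-at-`last + 1` behaviour is inferred from the README's "resume training from the 50th epoch" example.

```python
import os

# New default prefix after this commit (previously "paddle_example").
DEFAULT_PREFIX = "resnet_50_paddle"


def checkpoint_prefix(output_dir: str, epoch_id: int, prefix: str = DEFAULT_PREFIX) -> str:
    """Hypothetical helper: the value to pass to --from-checkpoint or
    --from-pretrained-params for a given 0-based epoch_id."""
    return os.path.join(output_dir, str(epoch_id), prefix)


def remaining_epochs(total_epochs: int, last_epoch_of_checkpoint: int) -> range:
    """0-based epoch ids still to run, assuming training restarts at
    last_epoch_of_checkpoint + 1."""
    return range(last_epoch_of_checkpoint + 1, total_epochs)


if __name__ == "__main__":
    print(checkpoint_prefix("./output/ResNet", 49))  # ./output/ResNet/49/resnet_50_paddle on POSIX
    resume = remaining_epochs(90, 49)
    print(resume.start, resume[-1], len(resume))  # 50 89 40
```

With `--last-epoch-of-checkpoint 49` and 90 epochs in total, 40 epochs (ids 50 through 89) remain, which is the span the README describes.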

PaddlePaddle/Classification/RN50v1.5/scripts/inference/export_resnet50_AMP.sh

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-CKPT=${1:-"./output/ResNet50/89/paddle_example"}
+CKPT=${1:-"./output/ResNet50/89/resnet_50_paddle"}

 python -m paddle.distributed.launch --gpus=0 export_model.py \
 --amp \
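The scripts pick their checkpoint with the bash expansion `${1:-...}`: the first positional argument when it is set and non-empty, otherwise the new default path. The same rule as a small Python sketch for illustration; `resolve_ckpt` is a hypothetical name and is not part of the repository.

```python
import sys
from typing import List

# Default written into scripts/inference/export_resnet50_AMP.sh by this commit.
DEFAULT_CKPT = "./output/ResNet50/89/resnet_50_paddle"


def resolve_ckpt(argv: List[str], default: str = DEFAULT_CKPT) -> str:
    """Mimic bash ${1:-default}: use argv[1] if present and non-empty,
    otherwise fall back to the default."""
    if len(argv) > 1 and argv[1] != "":
        return argv[1]
    return default


if __name__ == "__main__":
    print(resolve_ckpt(sys.argv))
```

So running the export script with no argument now sets `CKPT` to the `resnet_50_paddle` path, and passing an explicit path (for example one that still ends in `paddle_example`) overrides it.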

PaddlePaddle/Classification/RN50v1.5/scripts/inference/export_resnet50_TF32.sh

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-CKPT=${1:-"./output/ResNet50/89/paddle_example"}
+CKPT=${1:-"./output/ResNet50/89/resnet_50_paddle"}

 python -m paddle.distributed.launch --gpus=0 export_model.py \
 --trt-inference-dir ./inference_tf32 \

PaddlePaddle/Classification/RN50v1.5/scripts/training/train_resnet50_AMP_ASP_90E_DGXA100.sh

Lines changed: 2 additions & 2 deletions
@@ -12,10 +12,10 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-CKPT=${1:-"./output/ResNet50/89/paddle_example"}
+CKPT=${1:-"./output/ResNet50/89/resnet_50_paddle"}

 python -m paddle.distributed.launch --gpus=0,1,2,3,4,5,6,7 train.py \
---from-pretrained-params ./output/ResNet50/89/paddle_example \
+--from-pretrained-params ./output/ResNet50/89/resnet_50_paddle \
 --epochs 90 \
 --amp \
 --scale-loss 128.0 \

PaddlePaddle/Classification/RN50v1.5/utils/config.py

Lines changed: 2 additions & 2 deletions
@@ -23,7 +23,7 @@
 def print_args(args):
 args_for_log = copy.deepcopy(args)

-# Due to dllogger cannot serializable Enum into JSON.
+# Due to dllogger cannot serialize Enum into JSON.
 args_for_log.run_scope = args_for_log.run_scope.value

 dllogger.log(step='PARAMETER', data=vars(args_for_log))
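The comment fixed in `print_args` is about Python's `json` module refusing to encode `Enum` members, which is why a deep copy of the args has `run_scope` replaced by its `.value` before logging. A standalone sketch of that workaround, using a stand-in `RunScope` whose member values are assumed from the `train_only`/`eval_only` strings in the assert message, and plain `json.dumps` in place of dllogger; `args_for_logging` is a hypothetical name.

```python
import copy
import json
from argparse import Namespace
from enum import Enum


class RunScope(Enum):
    # Stand-in for the repository's RunScope; member values are assumed.
    TRAIN_ONLY = "train_only"
    EVAL_ONLY = "eval_only"


def args_for_logging(args: Namespace) -> dict:
    """Return a JSON-serializable dict of args without mutating the original."""
    args_for_log = copy.deepcopy(args)
    # json.dumps raises TypeError on Enum members, so log the plain value instead.
    args_for_log.run_scope = args_for_log.run_scope.value
    return vars(args_for_log)


if __name__ == "__main__":
    args = Namespace(run_scope=RunScope.TRAIN_ONLY, epochs=90)
    print(json.dumps(args_for_logging(args)))  # {"run_scope": "train_only", "epochs": 90}
```

Working on a deep copy keeps the real `args.run_scope` an Enum, so later comparisons such as `args.run_scope == RunScope.EVAL_ONLY` still work.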
@@ -60,7 +60,7 @@ def check_and_process_args(args):
 RunScope.TRAIN_ONLY, RunScope.EVAL_ONLY
 ], "If benchmark enabled, run_scope must be `train_only` or `eval_only`"

-# Only run one epoch when benchmark on eval_only.
+# Only run one epoch when benchmark or eval_only.
 if args.benchmark or \
 (args.run_scope == RunScope.EVAL_ONLY):
 args.epochs = args.start_epoch + 1
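The corrected comment now matches the condition beneath it: a single epoch runs when benchmarking is enabled *or* the run scope is eval-only. A minimal sketch of that rule, with a stand-in `RunScope` (values assumed) and a hypothetical `clamp_epochs` wrapper; it assumes the training loop iterates over epoch ids from `start_epoch` up to, but not including, `epochs`.

```python
from argparse import Namespace
from enum import Enum


class RunScope(Enum):
    # Stand-in for the repository's RunScope; member values are assumed.
    TRAIN_ONLY = "train_only"
    EVAL_ONLY = "eval_only"


def clamp_epochs(args: Namespace) -> Namespace:
    """Limit the run to one epoch when benchmarking or when only evaluating."""
    if args.benchmark or args.run_scope == RunScope.EVAL_ONLY:
        # With an exclusive upper bound, [start_epoch, start_epoch + 1) is one epoch.
        args.epochs = args.start_epoch + 1
    return args


if __name__ == "__main__":
    a = clamp_epochs(Namespace(benchmark=False, run_scope=RunScope.EVAL_ONLY,
                               start_epoch=0, epochs=90))
    print(a.epochs)  # 1
```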

PaddlePaddle/Classification/RN50v1.5/utils/save_load.py

Lines changed: 1 addition & 1 deletion
@@ -146,7 +146,7 @@ def init_program(args, program, exe):
 init_pretrained(args.from_pretrained_params, program)


-def save_model(program, model_path, epoch_id, prefix='paddle_example'):
+def save_model(program, model_path, epoch_id, prefix='resnet_50_paddle'):
 """
 Save a model to given path.
 Args:
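Because only the default of `prefix` changed, every caller that omits it now writes `resnet_50_paddle.*` files, while checkpoints saved before this commit keep their `paddle_example.*` names. A sketch of the files to expect under each prefix, assuming `save_model` writes to `<model_path>/<epoch_id>/<prefix><suffix>` as the README's layout describes (the body of `save_model` is not part of this diff); `expected_files` is a hypothetical helper.

```python
import os
from typing import List

NEW_PREFIX = "resnet_50_paddle"  # save_model default after this commit
OLD_PREFIX = "paddle_example"    # save_model default before this commit
# Suffixes of the three files described in the README; the optimizer-state
# file is only written when the optimizer has tensors to save (not e.g. SGD).
SUFFIXES = (".pdparams", ".pdopt", ".pdmodel")


def expected_files(model_path: str, epoch_id: int, prefix: str = NEW_PREFIX) -> List[str]:
    """Hypothetical helper: files an epoch's checkpoint is expected to contain."""
    base = os.path.join(model_path, str(epoch_id), prefix)
    return [base + suffix for suffix in SUFFIXES]


if __name__ == "__main__":
    # Checkpoints written after this commit...
    print(expected_files("./output/ResNet50", 89))
    # ...and ones written before it, which keep their old file names.
    print(expected_files("./output/ResNet50", 89, prefix=OLD_PREFIX))
```

For a checkpoint saved before the rename, the path given to `--from-checkpoint` or `--from-pretrained-params` therefore still has to end in `paddle_example`.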
