@@ -231,7 +231,7 @@ and encapsulates some dependencies. Aside from these dependencies, ensure you
 have the following components:

 * [NVIDIA Docker](https://github.com/NVIDIA/nvidia-docker)
-* [PyTorch 19.06-py3+ NGC container](https://ngc.nvidia.com/registry/nvidia-pytorch)
+* [PyTorch 20.01-py3+ NGC container](https://ngc.nvidia.com/registry/nvidia-pytorch)
 or newer
 * [NVIDIA Volta](https://www.nvidia.com/en-us/data-center/volta-gpu-architecture/) or [Turing](https://www.nvidia.com/en-us/geforce/turing/) based GPU

@@ -370,7 +370,7 @@ WaveGlow models.

 * `--epochs` - number of epochs (Tacotron 2: 1501, WaveGlow: 1001)
 * `--learning-rate` - learning rate (Tacotron 2: 1e-3, WaveGlow: 1e-4)
-* `--batch-size` - batch size (Tacotron 2 FP16/FP32: 128/64, WaveGlow FP16/FP32: 10/4)
+* `--batch-size` - batch size (Tacotron 2 FP16/FP32: 104/48, WaveGlow FP16/FP32: 10/4)
 * `--amp-run` - use mixed precision training

 #### Shared audio/STFT parameters
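
Review note: the shared flags touched by this hunk can be pictured with a small `argparse` sketch. This is not the repository's actual parser. The `build_parser` helper is hypothetical, and marking the flags as required is an assumption made only for illustration. The flag names and the short forms `-lr` and `-bs` are the ones that appear in the README text and commands in this diff.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the shared flags listed in the README."""
    parser = argparse.ArgumentParser(description="Shared training flags (sketch)")
    # required=True is an assumption for this sketch, not taken from train.py.
    parser.add_argument("--epochs", type=int, required=True,
                        help="number of epochs (Tacotron 2: 1501, WaveGlow: 1001)")
    parser.add_argument("-lr", "--learning-rate", type=float, required=True,
                        help="learning rate (Tacotron 2: 1e-3, WaveGlow: 1e-4)")
    parser.add_argument("-bs", "--batch-size", type=int, required=True,
                        help="batch size (Tacotron 2 FP16/FP32: 104/48)")
    parser.add_argument("--amp-run", action="store_true",
                        help="use mixed precision training")
    return parser


# Parse the updated Tacotron 2 FP16 settings from this commit.
args = build_parser().parse_args(
    ["--epochs", "1501", "-lr", "1e-3", "-bs", "104", "--amp-run"]
)
print(args.epochs, args.learning_rate, args.batch_size, args.amp_run)
```

With `--amp-run` omitted, the flag defaults to `False`, which corresponds to the FP32 commands shown later in the diff.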
@@ -496,21 +496,21 @@ To benchmark the training performance on a specific batch size, run:
 * For 1 GPU
     * FP16
         ```bash
-        python train.py -m Tacotron2 -o <output_dir> -lr 1e-3 --epochs 10 -bs <batch_size> --weight-decay 1e-6 --grad-clip-thresh 1.0 --cudnn-enabled --log-file nvlog.json --training-files filelists/ljs_audio_text_train_subset_2500_filelist.txt --dataset-path <dataset-path> --amp-run
+        python train.py -m Tacotron2 -o <output_dir> -lr 1e-3 --epochs 10 -bs <batch_size> --weight-decay 1e-6 --grad-clip-thresh 1.0 --cudnn-enabled --log-file nvlog.json --load-mel-from-disk --training-files=filelists/ljs_mel_text_train_subset_2500_filelist.txt --validation-files=filelists/ljs_mel_text_val_filelist.txt --dataset-path <dataset-path> --amp-run
         ```
     * FP32
         ```bash
-        python train.py -m Tacotron2 -o <output_dir> -lr 1e-3 --epochs 10 -bs <batch_size> --weight-decay 1e-6 --grad-clip-thresh 1.0 --cudnn-enabled --log-file nvlog.json --training-files filelists/ljs_audio_text_train_subset_2500_filelist.txt --dataset-path <dataset-path>
+        python train.py -m Tacotron2 -o <output_dir> -lr 1e-3 --epochs 10 -bs <batch_size> --weight-decay 1e-6 --grad-clip-thresh 1.0 --cudnn-enabled --log-file nvlog.json --load-mel-from-disk --training-files=filelists/ljs_mel_text_train_subset_2500_filelist.txt --validation-files=filelists/ljs_mel_text_val_filelist.txt --dataset-path <dataset-path>
         ```

 * For multiple GPUs
     * FP16
         ```bash
-        python -m multiproc train.py -m Tacotron2 -o <output_dir> -lr 1e-3 --epochs 10 -bs <batch_size> --weight-decay 1e-6 --grad-clip-thresh 1.0 --cudnn-enabled --log-file nvlog.json --training-files filelists/ljs_audio_text_train_subset_2500_filelist.txt --dataset-path <dataset-path> --amp-run
+        python -m multiproc train.py -m Tacotron2 -o <output_dir> -lr 1e-3 --epochs 10 -bs <batch_size> --weight-decay 1e-6 --grad-clip-thresh 1.0 --cudnn-enabled --log-file nvlog.json --load-mel-from-disk --training-files=filelists/ljs_mel_text_train_subset_2500_filelist.txt --validation-files=filelists/ljs_mel_text_val_filelist.txt --dataset-path <dataset-path> --amp-run
         ```
     * FP32
         ```bash
-        python -m multiproc train.py -m Tacotron2 -o <output_dir> -lr 1e-3 --epochs 10 -bs <batch_size> --weight-decay 1e-6 --grad-clip-thresh 1.0 --cudnn-enabled --log-file nvlog.json --training-files filelists/ljs_audio_text_train_subset_2500_filelist.txt --dataset-path <dataset-path>
+        python -m multiproc train.py -m Tacotron2 -o <output_dir> -lr 1e-3 --epochs 10 -bs <batch_size> --weight-decay 1e-6 --grad-clip-thresh 1.0 --cudnn-enabled --log-file nvlog.json --load-mel-from-disk --training-files=filelists/ljs_mel_text_train_subset_2500_filelist.txt --validation-files=filelists/ljs_mel_text_val_filelist.txt --dataset-path <dataset-path>
         ```

 **WaveGlow**
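
Review note: the four updated Tacotron 2 benchmark commands in the hunk above differ only in the launcher (`python` versus `python -m multiproc`) and in the trailing `--amp-run`. A minimal sketch of that pattern follows. The `build_benchmark_cmd` helper and its parameter names are hypothetical and not part of the repository; every flag and file list is copied from the added (+) lines. `shlex.join` needs Python 3.8 or newer.

```python
import shlex


def build_benchmark_cmd(output_dir, batch_size, dataset_path,
                        amp=False, multi_gpu=False):
    """Assemble the Tacotron 2 benchmark command as an argument list (sketch)."""
    # Multi-GPU runs go through the multiproc launcher, as in the README.
    launcher = ["python", "-m", "multiproc", "train.py"] if multi_gpu \
        else ["python", "train.py"]
    cmd = launcher + [
        "-m", "Tacotron2",
        "-o", output_dir,
        "-lr", "1e-3",
        "--epochs", "10",
        "-bs", str(batch_size),
        "--weight-decay", "1e-6",
        "--grad-clip-thresh", "1.0",
        "--cudnn-enabled",
        "--log-file", "nvlog.json",
        "--load-mel-from-disk",
        "--training-files=filelists/ljs_mel_text_train_subset_2500_filelist.txt",
        "--validation-files=filelists/ljs_mel_text_val_filelist.txt",
        "--dataset-path", dataset_path,
    ]
    if amp:
        # Mixed precision is the only difference between the FP16 and FP32 commands.
        cmd.append("--amp-run")
    return cmd


# Example values for output_dir and dataset_path are placeholders.
print(shlex.join(build_benchmark_cmd("./output", 104, "./", amp=True)))
```

Returning a list rather than a string keeps the command safe to pass to `subprocess.run` without shell quoting concerns.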
@@ -579,10 +579,10 @@ All of the results were produced using the `train.py` script as described in the
 | WaveGlow FP16 | -2.2054 | -5.7602 | -5.901 | -5.9706 | -6.0258 |
 | WaveGlow FP32 | -3.0327 | -5.858 | -6.0056 | -6.0613 | -6.1087 |

-Tacotron 2 FP16 loss - batch size 128 (mean and std over 16 runs)
+Tacotron 2 FP16 loss - batch size 104 (mean and std over 16 runs)
 ![](./img/tacotron2_amp_loss.png "Tacotron 2 FP16 loss")

-Tacotron 2 FP32 loss - batch size 64 (mean and std over 16 runs)
+Tacotron 2 FP32 loss - batch size 48 (mean and std over 16 runs)
 ![](./img/tacotron2_fp32_loss.png "Tacotron 2 FP16 loss")

 WaveGlow FP16 loss - batch size 10 (mean and std over 16 runs)
@@ -597,7 +597,7 @@ WaveGlow FP32 loss - batch size 4 (mean and std over 16 runs)
 ##### Training performance: NVIDIA DGX-1 (8x V100 16G)

 Our results were obtained by running the `./platform/train_{tacotron2,waveglow}_{AMP,FP32}_DGX1_16GB_8GPU.sh`
-training script in the PyTorch-19.06-py3 NGC container on NVIDIA DGX-1 with
+training script in the PyTorch-19.12-py3 NGC container on NVIDIA DGX-1 with
 8x V100 16G GPUs. Performance numbers (in output mel-spectrograms per second for
 Tacotron 2 and output samples per second for WaveGlow) were averaged over
 an entire training epoch.
@@ -606,9 +606,9 @@ This table shows the results for Tacotron 2:

 | Number of GPUs| Batch size per GPU| Number of mels used with mixed precision| Number of mels used with FP32| Speed-up with mixed precision| Multi-GPU weak scaling with mixed precision| Multi-GPU weak scaling with FP32|
 | ---:| ---:| ---:| ---:| ---:| ---:| ---:|
-| 1| 128@FP16, 64@FP32 | 20,992 | 12,933 | 1.62 | 1.00 | 1.00 |
-| 4| 128@FP16, 64@FP32 | 74,989 | 46,115 | 1.63 | 3.57 | 3.57 |
-| 8| 128@FP16, 64@FP32 | 140,060 | 88,719 | 1.58 | 6.67 | 6.86 |
+| 1| 104@FP16, 48@FP32 | 15,313 | 9,674 | 1.58 | 1.00 | 1.00 |
+| 4| 104@FP16, 48@FP32 | 53,661 | 32,778 | 1.64 | 3.50 | 3.39 |
+| 8| 104@FP16, 48@FP32 | 100,422 | 59,549 | 1.69 | 6.56 | 6.16 |

 The following table shows the results for WaveGlow:

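
Review note: the derived columns in the updated Tacotron 2 table can be cross-checked from the raw throughput figures. The sketch below assumes that speed-up is mixed-precision throughput divided by FP32 throughput at the same GPU count, and that weak scaling is throughput at N GPUs divided by the 1-GPU throughput at the same precision. Those definitions are inferred from the numbers and are not stated in the README. The `derived_metrics` helper is hypothetical.

```python
# Throughput in mels/sec, copied from the added (+) rows of the Tacotron 2 table.
THROUGHPUT = {
    1: {"amp": 15313, "fp32": 9674},
    4: {"amp": 53661, "fp32": 32778},
    8: {"amp": 100422, "fp32": 59549},
}


def derived_metrics(throughput):
    """Recompute speed-up and weak scaling per GPU count (assumed definitions)."""
    base = throughput[1]  # single-GPU run is the scaling baseline
    rows = {}
    for gpus, t in sorted(throughput.items()):
        rows[gpus] = {
            "speedup": round(t["amp"] / t["fp32"], 2),
            "scaling_amp": round(t["amp"] / base["amp"], 2),
            "scaling_fp32": round(t["fp32"] / base["fp32"], 2),
        }
    return rows


for gpus, row in derived_metrics(THROUGHPUT).items():
    print(gpus, row)
```

Under these assumptions the recomputed values agree with the speed-up and scaling columns of the added rows to two decimal places.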
@@ -626,9 +626,9 @@ The following table shows the expected training time for convergence for Tacotro

 | Number of GPUs| Batch size per GPU| Time to train with mixed precision (Hrs)| Time to train with FP32 (Hrs)| Speed-up with mixed precision|
 | ---:| ---:| ---:| ---:| ---:|
-| 1| 128@FP16, 64@FP32 | 153 | 234 | 1.53 |
-| 4| 128@FP16, 64@FP32 | 42 | 64 | 1.54 |
-| 8| 128@FP16, 64@FP32 | 22 | 33 | 1.52 |
+| 1| 104@FP16, 48@FP32 | 193 | 312 | 1.62 |
+| 4| 104@FP16, 48@FP32 | 53 | 85 | 1.58 |
+| 8| 104@FP16, 48@FP32 | 31 | 45 | 1.47 |

 The following table shows the expected training time for convergence for WaveGlow (1001 epochs):

@@ -704,8 +704,11 @@ November 2019
 * Implemented training resume from checkpoint
 * Added notebook for running Tacotron 2 and WaveGlow in TRTIS.

-December 2019
-* Added `trt` subfolder for running Tacotron 2 and WaveGlow in TensorRT.
+December 2019
+* Added export and inference scripts for TensorRT. See [Tacotron2 TensorRT README](trt/README.md).
+
+January 2020
+* Updated batch sizes and performance results for Tacotron 2.

 ### Known issues
