-Here, `Tacotron2_checkpoint` and `WaveGlow_checkpoint` are pre-trained
-checkpoints for the respective models, and `phrases/phrase.txt` contains input phrases. The number of text lines determines the inference batch size. Audio will be saved in the output folder.
+Here, `Tacotron2_checkpoint` and `WaveGlow_checkpoint` are pre-trained
+checkpoints for the respective models, and `phrases/phrase.txt` contains input
+phrases. The number of text lines determines the inference batch size. Audio
+will be saved in the output folder. The audio files [audio_fp16](./audio/audio_fp16.wav)
+and [audio_fp32](./audio/audio_fp32.wav) were generated using checkpoints from
+mixed precision and FP32 training, respectively.
 
 You can find all the available options by calling `python inference.py --help`.
 
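The batching rule in this hunk (one text line per batch item) can be illustrated with a minimal sketch. The helper `load_phrases` is hypothetical and is not taken from `inference.py`; counting every line, including blank ones, is an assumption about how the real script behaves.

```python
import tempfile
from pathlib import Path


def load_phrases(path):
    """Return one phrase per text line of the file (hypothetical helper)."""
    return Path(path).read_text(encoding="utf-8").splitlines()


# A temporary file stands in for phrases/phrase.txt.
with tempfile.TemporaryDirectory() as tmp:
    phrase_file = Path(tmp) / "phrase.txt"
    phrase_file.write_text(
        "Hello world.\nSecond phrase.\nThird phrase.\n", encoding="utf-8"
    )
    phrases = load_phrases(phrase_file)
    batch_size = len(phrases)  # number of text lines = inference batch size
    print(batch_size)
```

Under this assumption, adding or removing a line in the phrase file changes the batch size without any other option being passed.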
@@ -548,9 +553,9 @@ To benchmark the inference performance on a batch size=1, run:
 ```
 
 The output log files will contain performance numbers for Tacotron 2 model
-(number of output mel-spectrograms per second, reported as `tacotron2_items_per_sec`)
-and for WaveGlow (number of output samples per second, reported as `waveglow_items_per_sec`).
-The `inference.py` script will run a few warmup iterations before running the benchmark.
+(number of output mel-spectrograms per second, reported as `tacotron2_items_per_sec`)
+and for WaveGlow (number of output samples per second, reported as `waveglow_items_per_sec`).
+The `inference.py` script will run a few warmup iterations before running the benchmark.
 
 ### Results
 
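The warmup-then-measure pattern mentioned in this hunk can be sketched as follows. This is not the code in `inference.py`: the function `benchmark`, its default iteration counts, and the stand-in workload `fake_inference` are all hypothetical, chosen only to show how an items-per-second figure can exclude warmup runs.

```python
import time


def benchmark(fn, warmup=3, iterations=10):
    """Return items per second for `fn`, which returns the number of items it produced."""
    for _ in range(warmup):
        fn()  # warmup runs are executed but excluded from timing
    items = 0
    start = time.perf_counter()
    for _ in range(iterations):
        items += fn()
    elapsed = time.perf_counter() - start
    return items / elapsed


def fake_inference():
    """Stand-in workload (assumption); pretends to emit 22050 output samples."""
    sum(range(10_000))
    return 22050


items_per_sec = benchmark(fake_inference)
print(f"{items_per_sec:.0f} items/sec")
```

When timing real GPU work with PyTorch, the device would typically need to be synchronized (for example with `torch.cuda.synchronize()`) before reading the clock, since CUDA kernels run asynchronously.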
@@ -635,31 +640,36 @@ The following table shows the expected training time for convergence for WaveGlo
 
 #### Inference performance results
 
-##### NVIDIA DGX-1 (8x V100 16G)
-
-Our results were obtained by running the `./inference.py` inference script in
-the PyTorch-19.06-py3 NGC container on NVIDIA DGX-1 with 8x V100 16G GPUs.
-Performance numbers (in output mel-spectrograms per second for Tacotron 2 and
-output samples per second for WaveGlow) were averaged over 16 runs.
-
-The following table shows the inference performance results for Tacotron 2 model.
-Results are measured in the number of output mel-spectrograms per second.
-
-|Number of GPUs|Number of mels used with mixed precision|Number of mels used with FP32|Speed-up with mixed precision|
-|---:|---:|---:|---:|
-|**1**|625|613|1.02|
-
-The following table shows the inference performance results for WaveGlow model.
-Results are measured in the number of output samples per second<sup>1</sup>.
-
-|Number of GPUs|Number of samples used with mixed precision|Number of samples used with FP32|Speed-up with mixed precision|
-|---:|---:|---:|---:|
-|**1**|180474|162282|1.11|
-
-<sup>1</sup>With sampling rate equal to 22050, one second of audio is generated from 22050 samples.
-
-To achieve these same results, follow the steps in the [Quick Start Guide](#quick-start-guide).
-
+The following tables show inference statistics for the Tacotron2 and WaveGlow
+text-to-speech system, gathered from 1000 inference runs, on 1 V100 and 1 T4,
+respectively. Latency is measured from the start of Tacotron 2 inference to
+the end of WaveGlow inference. The tables include average latency, latency standard
+deviation, and latency confidence intervals. Throughput is measured
+as the number of generated audio samples per second. RTF is the real-time factor
+which tells how many seconds of speech are generated in 1 second of compute.
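
The statistics named in the added paragraph can be related to each other with a small sketch. It is not the measurement code used for the README tables: the function `summarize` is hypothetical, the 22050 Hz sampling rate is taken from the footnote removed in this hunk, and the confidence interval uses a normal approximation on the mean, which is an assumption about how the intervals are defined.

```python
import math
import statistics

SAMPLING_RATE = 22050  # samples per second of audio (from the removed footnote)


def summarize(latencies, samples, confidence=0.95):
    """Summarize end-to-end latencies (seconds) and generated sample counts per run."""
    n = len(latencies)
    avg = statistics.mean(latencies)
    std = statistics.stdev(latencies)
    # Assumption: two-sided normal-approximation interval on the mean latency.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * std / math.sqrt(n)
    throughput = sum(samples) / sum(latencies)  # audio samples per second
    rtf = throughput / SAMPLING_RATE  # seconds of speech per second of compute
    return {
        "avg_latency": avg,
        "std_latency": std,
        "ci_low": avg - half_width,
        "ci_high": avg + half_width,
        "throughput": throughput,
        "rtf": rtf,
    }


# Made-up numbers: four runs, each producing two seconds of audio.
stats = summarize([1.0, 1.2, 0.8, 1.0], [44100, 44100, 44100, 44100])
print(stats)
```

With these made-up inputs, two seconds of audio are produced per second of compute on average, so the RTF is about 2; a value above 1 means the system generates speech faster than real time.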