Commit 4e00153

GrzegorzKarchNV authored and nvpstr committed
added TRTIS demo to Tacotron2 (NVIDIA#281)
* added TRTIS demo
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
1 parent 4760c03 commit 4e00153

21 files changed

Lines changed: 1470 additions & 213 deletions

PyTorch/SpeechSynthesis/Tacotron2/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-FROM nvcr.io/nvidia/pytorch:19.08-py3
+FROM nvcr.io/nvidia/pytorch:19.10-py3
 
 ADD . /workspace/tacotron2
 WORKDIR /workspace/tacotron2
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+# Copyright (c) 2019 NVIDIA CORPORATION. All rights reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+FROM nvcr.io/nvidia/tensorrtserver:19.10-py3-clientsdk AS trt
+FROM continuumio/miniconda3
+RUN apt-get update && apt-get install -y pbzip2 pv bzip2 cabextract mc iputils-ping wget
+
+WORKDIR /workspace/speech_ai_demo__TTS/
+
+# Copy the perf_client over
+COPY --from=trt /workspace/install/ /workspace/install/
+ENV LD_LIBRARY_PATH /workspace/install/lib:${LD_LIBRARY_PATH}
+
+# set up env variables
+ENV PATH="$PATH:/opt/conda/bin"
+RUN cd /workspace/speech_ai_demo__TTS/
+
+# jupyter lab extensions
+RUN conda install -c conda-forge jupyterlab=1.0 ipywidgets=7.5 nodejs python-sounddevice librosa unidecode inflect
+RUN jupyter labextension install @jupyter-widgets/jupyterlab-manager
+RUN pip install /workspace/install/python/tensorrtserver*.whl
+
+# Copy the python wheel and install with pip
+COPY --from=trt /workspace/install/python/tensorrtserver*.whl /tmp/
+RUN pip install /tmp/tensorrtserver*.whl && rm /tmp/tensorrtserver*.whl
+
+RUN cd /workspace/speech_ai_demo__TTS/
+COPY ./notebooks/trtis/ .
+RUN mkdir /workspace/speech_ai_demo__TTS/tacotron2/
+COPY ./tacotron2/text /workspace/speech_ai_demo__TTS/tacotron2/text
+RUN chmod a+x /workspace/speech_ai_demo__TTS/run_this.sh

PyTorch/SpeechSynthesis/Tacotron2/README.md

Lines changed: 31 additions & 24 deletions
@@ -1,4 +1,4 @@
-# Tacotron 2 And WaveGlow v1.7 For PyTorch
+# Tacotron 2 And WaveGlow v1.10 For PyTorch
 
 This repository provides a script and recipe to train Tacotron 2 and WaveGlow
 v1.6 models to achieve state of the art accuracy, and is tested and maintained by NVIDIA.
@@ -33,13 +33,13 @@ v1.6 models to achieve state of the art accuracy, and is tested and maintained b
 * [Inference performance benchmark](#inference-performance-benchmark)
 * [Results](#results)
 * [Training accuracy results](#training-accuracy-results)
-* [NVIDIA DGX-1 (8x V100 16G)](#nvidia-dgx-1-8x-v100-16g)
+* [Training accuracy: NVIDIA DGX-1 (8x V100 16G)](#training-accuracy-nvidia-dgx-1-8x-v100-16g)
 * [Training performance results](#training-performance-results)
-* [NVIDIA DGX-1 (8x V100 16G)](#nvidia-dgx-1-8x-v100-16g)
+* [Training performance: NVIDIA DGX-1 (8x V100 16G)](#training-performance-nvidia-dgx-1-8x-v100-16g)
 * [Expected training time](#expected-training-time)
 * [Inference performance results](#inference-performance-results)
-* [NVIDIA V100 16G](#nvidia-v100-16g)
-* [NVIDIA T4](#nvidia-t4)
+* [Inference performance: NVIDIA V100 16G](#inference-performance-nvidia-v100-16g)
+* [Inference performance: NVIDIA T4](#inference-performance-nvidia-t4)
 * [Release notes](#release-notes)
 * [Changelog](#changelog)
 * [Known issues](#known-issues)
@@ -471,7 +471,7 @@ To run inference, issue:
 ```bash
 python inference.py --tacotron2 <Tacotron2_checkpoint> --waveglow <WaveGlow_checkpoint> -o output/ --include-warmup -i phrases/phrase.txt --amp-run
 ```
-Here, `Tacotron2_checkpoint` and `WaveGlow_checkpoint` are pre-trained
+Here, `Tacotron2_checkpoint` and `WaveGlow_checkpoint` are pre-trained
 checkpoints for the respective models, and `phrases/phrase.txt` contains input
 phrases. The number of text lines determines the inference batch size. Audio
 will be saved in the output folder. The audio files [audio_fp16](./audio/audio_fp16.wav)
@@ -564,7 +564,7 @@ and accuracy in training and inference.
 
 #### Training accuracy results
 
-##### NVIDIA DGX-1 (8x V100 16G)
+##### Training accuracy: NVIDIA DGX-1 (8x V100 16G)
 
 Our results were obtained by running the `./platform/train_{tacotron2,waveglow}_{AMP,FP32}_DGX1_16GB_8GPU.sh` training script in the PyTorch-19.06-py3
 NGC container on NVIDIA DGX-1 with 8x V100 16G GPUs.
@@ -594,7 +594,7 @@ WaveGlow FP32 loss - batch size 4 (mean and std over 16 runs)
 
 #### Training performance results
 
-##### NVIDIA DGX-1 (8x V100 16G)
+##### Training performance: NVIDIA DGX-1 (8x V100 16G)
 
 Our results were obtained by running the `./platform/train_{tacotron2,waveglow}_{AMP,FP32}_DGX1_16GB_8GPU.sh`
 training script in the PyTorch-19.06-py3 NGC container on NVIDIA DGX-1 with
@@ -648,26 +648,27 @@ deviation, and latency confidence intervals. Throughput is measured
 as the number of generated audio samples per second. RTF is the real-time factor
 which tells how many seconds of speech are generated in 1 second of compute.
 
-##### NVIDIA V100 16G
+##### Inference performance: NVIDIA DGX-1 (1x V100 16G)
 
-|Batch size|Input length|Precision|Avg latency (s)|Latency std (s)|Latency confidence interval 50% (s)|Latency confidence interval 100% (s)|Throughput (samples/sec)|Speed-up with mixed precision|Avg mels generated (81 mels=1 sec of speech)|Avg audio length (s)|Avg RTF|
-|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
-|1| 128| FP16| 1.73| 0.07| 1.72| 2.11| 89,162| 1.09| 601| 6.98| 4.04|
-|4| 128| FP16| 4.21| 0.17| 4.19| 4.84| 145,800| 1.16| 600| 6.97| 1.65|
-|1| 128| FP32| 1.85| 0.06| 1.84| 2.19| 81,868| 1.00| 590| 6.85| 3.71|
-|4| 128| FP32| 4.80| 0.15| 4.79| 5.43| 125,930| 1.00| 590| 6.85| 1.43|
+|Batch size|Input length|Precision|Avg latency (s)|Latency std (s)|Latency confidence interval 90% (s)|Latency confidence interval 95% (s)|Latency confidence interval 99% (s)|Throughput (samples/sec)|Speed-up with mixed precision|Avg mels generated (81 mels=1 sec of speech)|Avg audio length (s)|Avg RTF|
+|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
+|1| 128| FP16| 1.27| 0.06| 1.34| 1.38| 1.41| 121,190| 1.37| 603| 7.00| 5.51|
+|4| 128| FP16| 2.32| 0.09| 2.42| 2.45| 2.59| 277,711| 2.03| 628| 7.23| 3.12|
+|1| 128| FP32| 1.70| 0.05| 1.77| 1.79| 1.84| 88,650| 1.00| 590| 6.85| 4.03|
+|4| 128| FP32| 4.56| 0.12| 4.72| 4.77| 4.87| 136,518| 1.00| 608| 7.06| 1.55|
 
-##### NVIDIA T4
+##### Inference performance: NVIDIA T4
+
+|Batch size|Input length|Precision|Avg latency (s)|Latency std (s)|Latency confidence interval 90% (s)|Latency confidence interval 95% (s)|Latency confidence interval 99% (s)|Throughput (samples/sec)|Speed-up with mixed precision|Avg mels generated (81 mels=1 sec of speech)|Avg audio length (s)|Avg RTF|
+|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
+|1| 128| FP16| 3.13| 0.13| 3.28| 3.36| 3.46| 49,276| 1.26| 602| 6.99| 2.24|
+|4| 128| FP16| 11.98| 0.42| 12.44| 12.70| 13.29| 53,676| 1.23| 628| 7.29| 0.61|
+|1| 128| FP32| 3.88| 0.12| 4.04| 4.09| 4.19| 38,964| 1.00| 591| 6.86| 1.77|
+|4| 128| FP32| 14.34| 0.42| 14.89| 15.08| 15.55| 43,489| 1.00| 609| 7.07| 0.49|
 
-|Batch size|Input length|Precision|Avg latency (s)|Latency std (s)|Latency confidence interval 50% (s)|Latency confidence interval 100% (s)|Throughput (samples/sec)|Speed-up with mixed precision|Avg mels generated (81 mels=1 sec of speech)|Avg audio length (s)|Avg RTF|
-|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
-|1| 128| FP16| 3.16| 0.13| 3.16| 3.81| 48,792| 1.23| 603| 7.00| 2.21|
-|4| 128| FP16| 11.45| 0.49| 11.39| 14.38| 53,771| 1.22| 601| 6.98| 0.61|
-|1| 128| FP32| 3.82| 0.11| 3.81| 4.24| 39,603| 1.00| 591| 6.86| 1.80|
-|4| 128| FP32| 13.80| 0.45| 13.74| 16.09| 43,915| 1.00| 592| 6.87| 0.50|
 
 Our results were obtained by running the `./run_latency_tests.sh` script in
-the PyTorch-19.06-py3 NGC container. Please note that to reproduce the results,
+the PyTorch-19.09-py3 NGC container. Please note that to reproduce the results,
 you need to provide pretrained checkpoints for Tacotron 2 and WaveGlow. Please
 edit the script to provide your checkpoint filenames.
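
As a rough cross-check of the RTF definition in this hunk, the RTF and speed-up columns can be approximately recomputed from the other columns of the updated V100 table. The sketch below uses the two batch-size-1 rows; the helper names are illustrative and not part of the repository. Treating the speed-up as the ratio of FP16 to FP32 throughput is an assumption that happens to reproduce the published figures; the README does not state how the column is computed. The table reports an average RTF over runs, so a ratio of averages should only be expected to agree approximately.

```python
# Values copied from the batch-size-1 rows of the updated V100 table.
rows = {
    "FP16": {"latency_s": 1.27, "audio_s": 7.00, "throughput": 121_190},
    "FP32": {"latency_s": 1.70, "audio_s": 6.85, "throughput": 88_650},
}


def real_time_factor(audio_s: float, latency_s: float) -> float:
    """Seconds of generated speech per second of compute."""
    return audio_s / latency_s


def speed_up(throughput: float, baseline_throughput: float) -> float:
    """Assumed definition: throughput relative to the FP32 baseline."""
    return throughput / baseline_throughput


rtf_fp16 = real_time_factor(rows["FP16"]["audio_s"], rows["FP16"]["latency_s"])
rtf_fp32 = real_time_factor(rows["FP32"]["audio_s"], rows["FP32"]["latency_s"])
amp_speed_up = speed_up(rows["FP16"]["throughput"], rows["FP32"]["throughput"])

print(f"{rtf_fp16:.2f} {rtf_fp32:.2f} {amp_speed_up:.2f}")
# prints: 5.51 4.03 1.37
```

An RTF above 1 means speech is generated faster than real time; the batch-size-4 rows on T4 (0.61 and 0.49) fall below that threshold.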

@@ -696,7 +697,13 @@ August 2019
 September 2019
 * Introduced inference statistics
 
+October 2019
+* Tacotron 2 inference with torch.jit.script
+
+November 2019
+* Implemented training resume from checkpoint
+* Added notebook for running Tacotron 2 and WaveGlow in TRTIS.
+
 ### Known issues
 
 There are no known issues in this release.
-
PyTorch/SpeechSynthesis/Tacotron2/common/utils.py

Lines changed: 2 additions & 1 deletion
@@ -33,8 +33,9 @@
 
 def get_mask_from_lengths(lengths):
     max_len = torch.max(lengths).item()
-    ids = torch.arange(0, max_len, out=torch.cuda.LongTensor(max_len))
+    ids = torch.arange(0, max_len, device=lengths.device, dtype=lengths.dtype)
     mask = (ids < lengths.unsqueeze(1)).byte()
+    mask = torch.le(mask, 0)
     return mask
 
 
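
The added `torch.le(mask, 0)` line changes what the returned mask means: positions that hold real data become `False` and padded positions become `True`. The following is a dependency-free sketch of the same two steps using plain Python lists instead of tensors; `padding_mask` is an illustrative name, not a function from the repository.

```python
from typing import List


def padding_mask(lengths: List[int]) -> List[List[bool]]:
    """Plain-list mirror of the patched get_mask_from_lengths logic."""
    max_len = max(lengths)
    ids = range(max_len)
    # Step 1: 1 where the position is inside the sequence, 0 where it is padding
    # (mirrors `(ids < lengths.unsqueeze(1)).byte()`).
    valid = [[int(i < n) for i in ids] for n in lengths]
    # Step 2: True where the value is <= 0, i.e. at the padded positions
    # (mirrors the added `torch.le(mask, 0)`).
    return [[v <= 0 for v in row] for row in valid]


mask = padding_mask([3, 1, 2])
for row in mask:
    print(row)
# prints:
# [False, False, False]
# [False, True, True]
# [False, False, True]
```

Any caller that previously inverted the mask itself would presumably need to stop doing so after this change; those call sites are not shown in this hunk.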

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
+# *****************************************************************************
+#  Copyright (c) 2019, NVIDIA CORPORATION.  All rights reserved.
+#
+#  Redistribution and use in source and binary forms, with or without
+#  modification, are permitted provided that the following conditions are met:
+#      * Redistributions of source code must retain the above copyright
+#        notice, this list of conditions and the following disclaimer.
+#      * Redistributions in binary form must reproduce the above copyright
+#        notice, this list of conditions and the following disclaimer in the
+#        documentation and/or other materials provided with the distribution.
+#      * Neither the name of the NVIDIA CORPORATION nor the
+#        names of its contributors may be used to endorse or promote products
+#        derived from this software without specific prior written permission.
+#
+#  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+#  ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+#  WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+#  DISCLAIMED. IN NO EVENT SHALL NVIDIA CORPORATION BE LIABLE FOR ANY
+#  DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+#  (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+#  LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+#  ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+#  SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+#
+# *****************************************************************************
+
+import torch
+import argparse
+from inference import checkpoint_from_distributed, unwrap_distributed, load_and_setup_model
+from dllogger.autologging import log_hardware, log_args
+
+def parse_args(parser):
+    """
+    Parse commandline arguments.
+    """
+    parser.add_argument('--tacotron2', type=str, required=True,
+                        help='full path to the Tacotron2 model checkpoint file')
+
+    parser.add_argument('-o', '--output', type=str, default="trtis_repo/tacotron/1/model.pt",
+                        help='filename for the Tacotron 2 TorchScript model')
+    parser.add_argument('--amp-run', action='store_true',
+                        help='inference with AMP')
+
+    return parser
+
+
+def main():
+
+    parser = argparse.ArgumentParser(
+        description='PyTorch Tacotron 2 Inference')
+    parser = parse_args(parser)
+    args = parser.parse_args()
+
+    log_args(args)
+    tacotron2 = load_and_setup_model('Tacotron2', parser, args.tacotron2,
+                                     args.amp_run, rename=True)
+
+    jitted_tacotron2 = torch.jit.script(tacotron2)
+
+    torch.jit.save(jitted_tacotron2, args.output)
+
+
+if __name__ == '__main__':
+    main()
+
+
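
One detail worth checking when running this exporter with its defaults: `--output` defaults to `trtis_repo/tacotron/1/model.pt`, while the config exporter added in the same commit defaults `--trtis_model_name` to `tacotron2` and writes `config.pbtxt` under `./trtis_repo/<model_name>/`. The sketch below only compares those two default strings; `model_dir_of` is an illustrative helper and assumes the `<repo>/<model>/<version>/<file>` layout implied by the two scripts.

```python
from pathlib import PurePosixPath


def model_dir_of(model_file: str) -> str:
    """Return the model directory name for <repo>/<model>/<version>/<file>."""
    return PurePosixPath(model_file).parent.parent.name


# Default of -o/--output in the TorchScript exporter above.
ts_default_output = "trtis_repo/tacotron/1/model.pt"
# Default of --trtis_model_name in the config exporter from the same commit.
config_default_name = "tacotron2"

ts_model_dir = model_dir_of(ts_default_output)
print(ts_model_dir, config_default_name, ts_model_dir == config_default_name)
# prints: tacotron tacotron2 False
```

With both defaults, the scripted model and its `config.pbtxt` would land in different model directories, so one of the two flags likely needs to be set explicitly. Whether the demo does this is not visible in this part of the diff.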
Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
+# *****************************************************************************
+#  Copyright (c) 2019, NVIDIA CORPORATION.  All rights reserved.
+#
+#  Redistribution and use in source and binary forms, with or without
+#  modification, are permitted provided that the following conditions are met:
+#      * Redistributions of source code must retain the above copyright
+#        notice, this list of conditions and the following disclaimer.
+#      * Redistributions in binary form must reproduce the above copyright
+#        notice, this list of conditions and the following disclaimer in the
+#        documentation and/or other materials provided with the distribution.
+#      * Neither the name of the NVIDIA CORPORATION nor the
+#        names of its contributors may be used to endorse or promote products
+#        derived from this software without specific prior written permission.
+#
+#  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+#  ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+#  WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+#  DISCLAIMED. IN NO EVENT SHALL NVIDIA CORPORATION BE LIABLE FOR ANY
+#  DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+#  (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+#  LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+#  ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+#  SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+#
+# *****************************************************************************
+
+
+import os
+import argparse
+from dllogger.autologging import log_hardware, log_args
+
+
+def parse_args(parser):
+    """
+    Parse commandline arguments.
+    """
+    parser.add_argument("--trtis_model_name",
+                        type=str,
+                        default='tacotron2',
+                        help="exports to appropriate directory for TRTIS")
+    parser.add_argument("--trtis_model_version",
+                        type=int,
+                        default=1,
+                        help="exports to appropriate directory for TRTIS")
+    parser.add_argument("--trtis_max_batch_size",
+                        type=int,
+                        default=8,
+                        help="Specifies the 'max_batch_size' in the TRTIS model config.\
+                              See the TRTIS documentation for more info.")
+    parser.add_argument('--amp-run', action='store_true',
+                        help='inference with AMP')
+    return parser
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description='PyTorch Tacotron 2 TRTIS config exporter')
+    parser = parse_args(parser)
+    args = parser.parse_args()
+
+    log_args(args)
+
+    # prepare repository
+    model_folder = os.path.join('./trtis_repo', args.trtis_model_name)
+    version_folder = os.path.join(model_folder, str(args.trtis_model_version))
+    if not os.path.exists(version_folder):
+        os.makedirs(version_folder)
+
+    # build the config for TRTIS
+    config_filename = os.path.join(model_folder, "config.pbtxt")
+    config_template = r"""
+name: "{model_name}"
+platform: "pytorch_libtorch"
+max_batch_size: {max_batch_size}
+input [
+  {{
+    name: "sequence__0"
+    data_type: TYPE_INT64
+    dims: [-1]
+  }},
+  {{
+    name: "input_lengths__1"
+    data_type: TYPE_INT64
+    dims: [1]
+    reshape: {{ shape: [ ] }}
+  }}
+]
+output [
+  {{
+    name: "mel_outputs_postnet__0"
+    data_type: {fp_type}
+    dims: [80,-1]
+  }},
+  {{
+    name: "mel_lengths__1"
+    data_type: TYPE_INT32
+    dims: [1]
+    reshape: {{ shape: [ ] }}
+  }}
+]
+"""
+
+    config_values = {
+        "model_name": args.trtis_model_name,
+        "max_batch_size": args.trtis_max_batch_size,
+        "fp_type": "TYPE_FP16" if args.amp_run else "TYPE_FP32"
+    }
+
+    with open(model_folder + "/config.pbtxt", "w") as file:
+        final_config_str = config_template.format_map(config_values)
+        file.write(final_config_str)
+
+
+if __name__ == '__main__':
+    main()
+
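
The config exporter relies on `str.format_map` with doubled braces: `{{` and `}}` come out as literal braces in `config.pbtxt`, while single-brace names such as `{fp_type}` are substituted. Below is a minimal sketch of that mechanism with a shortened stand-in template; the `render` helper and the trimmed template are illustrative, not the script's actual code.

```python
# Shortened stand-in for config_template: doubled braces become literal
# braces in the output, single-brace names are filled in from the mapping.
template = r"""
name: "{model_name}"
max_batch_size: {max_batch_size}
output [
  {{
    name: "mel_outputs_postnet__0"
    data_type: {fp_type}
    dims: [80,-1]
  }}
]
"""


def render(model_name: str, max_batch_size: int, amp_run: bool) -> str:
    """Fill the template the same way the exporter does, via format_map."""
    values = {
        "model_name": model_name,
        "max_batch_size": max_batch_size,
        # Same rule as the exporter: FP16 outputs when AMP is requested.
        "fp_type": "TYPE_FP16" if amp_run else "TYPE_FP32",
    }
    return template.format_map(values)


config = render("tacotron2", 8, amp_run=True)
print(config)
```

Because the output precision is baked into the config at export time, the `--amp-run` flag passed here should presumably match the one used when exporting the TorchScript model.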
