Release Notes - SpeechBrain v0.5.12
We worked very hard and we are very happy to announce the new version of SpeechBrain!
SpeechBrain 0.5.12 significantly expands the toolkit without introducing any major interface changes. I would like to warmly thank the many contributors who made this possible.
The main changes are the following:
A) Text-to-Speech: We developed SpeechBrain's first text-to-speech (TTS) system. You can find it here. The system relies on Tacotron2 with HiFiGAN as the vocoder. The models, coupled with an easy-inference interface, are available on HuggingFace.
B) Grapheme-to-Phoneme (G2P): We developed an advanced grapheme-to-phoneme model. You can find the code here. The current version significantly outperforms our previous model.
C) Speech Separation:
- We developed a novel version of the SepFormer called the Resource-Efficient SepFormer (RE-SepFormer). The code is available here and the pre-trained model (with an easy-inference interface) here.
- We released a recipe for Binaural speech separation with WSJMix. See the code here.
- We released a new recipe with the AIShell mix dataset. You can see the code here.
D) Speech Enhancement:
- We released the SepFormer model for speech enhancement. The code is here, while the pre-trained model (with an easy-inference interface) is here.
- We implemented a WideResNet for speech enhancement and used it for mimic-loss-based speech enhancement. The code is here and the pretrained model (with an easy-inference interface) is here.
E) Feature Front-ends:
- We now support LEAF filter banks. The code is here. You can find an example of a recipe using it here.
- We now support multichannel SincConv (see the code here).
F) Recipe Refactors:
- We refactored the VoxCeleb recipe and fixed the normalization issues. See the new code here. We also made the EER computation less memory-demanding (see here).
- We refactored the IEMOCAP recipe for emotion recognition. See the new code here.
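For reference, the EER used in the VoxCeleb recipe is the operating point where the false-acceptance and false-rejection rates coincide. A plain-NumPy sketch (function and names are illustrative, not the recipe's actual implementation):

```python
import numpy as np

def compute_eer(positive_scores, negative_scores):
    """Equal Error Rate: sweep candidate thresholds and pick the one
    where false-acceptance and false-rejection rates are closest."""
    thresholds = np.sort(np.concatenate([positive_scores, negative_scores]))
    # FRR: fraction of genuine pairs scored below the threshold
    frr = np.array([(positive_scores < t).mean() for t in thresholds])
    # FAR: fraction of impostor pairs scored at or above the threshold
    far = np.array([(negative_scores >= t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2

# Toy example: one impostor pair (0.75) overlaps the genuine scores
pos = np.array([0.9, 0.8, 0.85, 0.7])
neg = np.array([0.1, 0.2, 0.3, 0.75])
eer = compute_eer(pos, neg)  # -> 0.25
```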
G) Models for African Languages:
We now have recipes for the DVoice dataset. We currently support Darija, Swahili, Wolof, Fongbe, and Amharic. The code is available here. The pretrained models (coupled with an easy-inference interface) can be found on the SpeechBrain HuggingFace organization.
H) Profiler:
We implemented a model profiler that helps users while developing new models with SpeechBrain. The profiler reports potentially useful information, such as the real-time factor, among other details.
A tutorial is available here.
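The real-time factor the profiler reports can be illustrated with a toy measurement: processing time divided by audio duration, with values below 1 meaning faster than real time. This is a conceptual sketch, not the profiler's actual API:

```python
import time

def real_time_factor(model_fn, audio, sample_rate):
    """RTF = processing time / audio duration; RTF < 1 means the
    model runs faster than real time."""
    duration = len(audio) / sample_rate
    start = time.perf_counter()
    model_fn(audio)
    elapsed = time.perf_counter() - start
    return elapsed / duration

# Toy "model": sums the samples of 1 second of 16 kHz audio
audio = [0.0] * 16000
rtf = real_time_factor(sum, audio, 16000)
```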
I) Tests:
We significantly improved the tests. In particular, we introduced the following tests: HF_repo tests, docstring checks, yaml-script consistency checks, recipe tests, and URL checks. This will help us scale up the project.
J) Other improvements:
- We now support the torchaudio RNNT loss.
- We improved the relative attention mechanism of the Conformer.
- We updated the Transformer for LibriSpeech, improving the performance from WER = 2.46% to 2.26% on test-clean. See the code here.
- The environmental corruption module now supports different sampling rates.
- Minor fixes.