
SpeechBrain v0.5.12


@mravanelli released this 26 Jun 20:19 · fc15db4

Release Notes - SpeechBrain v0.5.12

We worked very hard and we are very happy to announce the new version of SpeechBrain!

SpeechBrain 0.5.12 significantly expands the toolkit without introducing any major interface changes. I would like to warmly thank the many contributors who made this possible.

The main changes are the following:

A) Text-to-Speech: We developed the first TTS system of SpeechBrain. You can find it here. The system relies on Tacotron2 with HiFiGAN as the vocoder. The models, coupled with an easy-inference interface, are available on HuggingFace.

B) Grapheme-to-Phoneme (G2P): We developed an advanced Grapheme-to-Phoneme converter. You can find the code here. The current version significantly outperforms our previous model.

C) Speech Separation:

  1. We developed a novel version of the SepFormer called the Resource-Efficient SepFormer (RE-SepFormer). The code is available here, and the pre-trained model (with an easy-inference interface) is here.
  2. We released a recipe for binaural speech separation with WSJMix. See the code here.
  3. We released a new recipe with the AIShell mix dataset. You can see the code here.

D) Speech Enhancement:

  1. We released the SepFormer model for speech enhancement. The code is here, while the pre-trained model (with an easy-inference interface) is here.
  2. We implemented the WideResNet for speech enhancement and used it for mimic-loss-based speech enhancement. The code is here and the pretrained model (with an easy-inference interface) is here.
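
To give a feel for the mimic-loss idea mentioned in item 2, here is a minimal, purely illustrative sketch: besides a signal-level loss, the enhanced output is encouraged to produce the same internal features as the clean signal when passed through a frozen perceptual model. All names and the toy feature extractor below are hypothetical, not the actual SpeechBrain implementation.

```python
def perceptual_features(signal):
    # Stand-in for a frozen perceptual model (e.g., an acoustic classifier);
    # here just a toy feature: frame-wise energies over windows of 4 samples.
    return [sum(x * x for x in signal[i:i + 4]) for i in range(0, len(signal), 4)]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def mimic_loss(enhanced, clean, alpha=0.5):
    # Weighted sum of a signal-level loss and a feature-level (mimic) loss.
    signal_loss = mse(enhanced, clean)
    feature_loss = mse(perceptual_features(enhanced), perceptual_features(clean))
    return alpha * signal_loss + (1 - alpha) * feature_loss

clean = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
enhanced = [0.1, 0.4, 0.9, 0.6, 0.0, -0.4, -0.9, -0.6]
loss = mimic_loss(enhanced, clean)
```

The feature-level term pushes the enhancement model toward outputs that a downstream recognizer perceives as clean, not just outputs that are close sample-by-sample.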

E) Feature Front-ends:

  1. We now support LEAF filter banks. The code is here. You can find an example of a recipe using it here.
  2. We now support SincConv multichannel (see code here).
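
For readers unfamiliar with SincConv, its kernels are not free parameters: each one is a band-pass filter built as the difference of two low-pass sinc filters, and only the two cutoff frequencies are learned. A minimal sketch of that parameterization (cutoffs normalized by the sampling rate; the function name is ours, not SpeechBrain's):

```python
import math

def sinc(x):
    # Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1.
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def sinc_bandpass(f1, f2, kernel_size=65):
    # Band-pass FIR kernel as the difference of two low-pass sinc filters
    # with cutoffs f1 < f2; symmetric around its center tap (n = 0).
    half = kernel_size // 2
    return [2 * f2 * sinc(2 * f2 * n) - 2 * f1 * sinc(2 * f1 * n)
            for n in range(-half, half + 1)]

kernel = sinc_bandpass(0.05, 0.15)
```

Because only `f1` and `f2` are trainable per filter, the front-end has far fewer parameters than an ordinary 1-D convolution of the same kernel size.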

F) Recipe Refactors:

  1. We refactored the VoxCeleb recipe and fixed the normalization issues. See the new code here. We also made the EER computation less memory-demanding (see here).
  2. We refactored the IEMOCAP recipe for emotion recognition. See the new code here.
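
For context on the EER metric mentioned in the VoxCeleb refactor above: the equal error rate is the operating point where the false-acceptance rate equals the false-rejection rate. The following is a simple illustrative sweep over candidate thresholds, not the (chunked, memory-optimized) SpeechBrain implementation:

```python
def eer(positive_scores, negative_scores):
    # Sweep every observed score as a threshold and return the error rate
    # at the point where FAR and FRR are closest (the equal error rate).
    thresholds = sorted(set(positive_scores) | set(negative_scores))
    best = None
    for t in thresholds:
        far = sum(s >= t for s in negative_scores) / len(negative_scores)
        frr = sum(s < t for s in positive_scores) / len(positive_scores)
        if best is None or abs(far - frr) < best[0]:
            best = (abs(far - frr), (far + frr) / 2)
    return best[1]

pos = [0.9, 0.8, 0.7, 0.6]   # same-speaker trial scores
neg = [0.5, 0.4, 0.65, 0.2]  # different-speaker trial scores
value = eer(pos, neg)
```

A naive implementation materializes the full trial-score matrix at once; computing FAR/FRR in smaller batches of thresholds or scores is what makes the metric tractable on large verification sets like VoxCeleb.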

G) Models for African Languages:
We now have recipes for the DVoice dataset. We currently support Darija, Swahili, Wolof, Fongbe, and Amharic. The code is available here. The pretrained models (coupled with an easy-inference interface) can be found on the SpeechBrain HuggingFace page.

H) Profiler:
We implemented a model profiler that helps users while developing new models with SpeechBrain. The profiler reports potentially useful information, such as the real-time factor, along with many other details.
A tutorial is available here.
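
As a quick refresher on the real-time factor the profiler reports: RTF is wall-clock processing time divided by the duration of the audio processed, so RTF < 1 means faster than real time. A minimal sketch (the helper below is illustrative, not the profiler's API):

```python
import time

def real_time_factor(process, audio_seconds):
    # RTF = wall-clock processing time / audio duration.
    # `process` is any callable standing in for model inference.
    start = time.perf_counter()
    process()
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds

# Toy workload standing in for inference on 10 s of audio.
rtf = real_time_factor(lambda: sum(range(100000)), audio_seconds=10.0)
```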

I) Tests:
We significantly improved the tests. In particular, we introduced the following tests: HuggingFace repo tests, docstring checks, YAML-script consistency checks, recipe tests, and URL checks. This will help us scale up the project.

J) Other improvements:

  1. We now support the torchaudio RNNT loss.
  2. We improved the relative attention mechanism of the Conformer.
  3. We updated the Transformer for LibriSpeech. This improves the performance from WER=2.46% to 2.26% on test-clean. See the code here.
  4. The environmental corruption module now supports different sampling rates.
  5. Minor fixes.
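
The WER figures quoted in item 3 are computed in the standard way: the word-level Levenshtein (edit) distance between hypothesis and reference, divided by the number of reference words. A minimal self-contained sketch (not the SpeechBrain scoring code):

```python
def wer(reference, hypothesis):
    # Word error rate: word-level Levenshtein distance / reference length.
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # cost of deleting all reference words up to i
    for j in range(len(hyp) + 1):
        d[0][j] = j  # cost of inserting all hypothesis words up to j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[-1][-1] / len(ref)

score = wer("the cat sat on the mat", "the cat sat on a mat")
```

Here one substitution over six reference words gives a WER of about 16.7%.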