README.md (10 additions, 9 deletions)
SpeechBrain provides different models for speaker recognition, identification, and more.
- Libraries to extract speaker embeddings with a pre-trained model on your data.
### Text-to-Speech (TTS) and Vocoders
- Recipes for training TTS systems such as [Tacotron2](https://github.com/speechbrain/speechbrain/tree/develop/recipes/LJSpeech/) and [FastSpeech2](https://github.com/speechbrain/speechbrain/tree/develop/recipes/LJSpeech/) with LJSpeech.
- Recipes for training Vocoders such as [HiFiGAN](https://github.com/speechbrain/speechbrain/tree/develop/recipes/LJSpeech).
### Grapheme-to-Phoneme (G2P)
Combining multiple microphones is a powerful approach to achieving robustness in adverse acoustic environments.
- Speaker localization.
### Emotion Recognition
- Recipes for emotion recognition using SSL and ECAPA-TDNN models on the [IEMOCAP](https://sail.usc.edu/iemocap/iemocap_release.htm) dataset.
- Recipe for emotion diarization using SSL models on the [ZaionEmotionDataset](https://zaion.ai/en/resources/zaion-lab-blog/zaion-emotion-dataset/).
### Interpretability
- Recipes for various interpretability techniques on the ESC50 dataset.
### Spoken Language Understanding
- Recipes for training wav2vec 2.0 models on the [SLURP](https://zenodo.org/record/4274930#.YEFCYHVKg5k), [MEDIA](https://catalogue.elra.info/en-us/repository/browse/ELRA-E0024/), and [timers-and-such](https://zenodo.org/record/4623772#.YGeMMHVKg5k) datasets.
### Performance
The recipes released with SpeechBrain implement speech processing systems with competitive or state-of-the-art performance. In the following, we report the best performance achieved on some popular benchmarks:
Since SpeechBrain is a research-oriented toolkit in beta release, it aims to support the latest major version (at the x.y level, e.g., 0.5 until 0.6 is released) with security updates, but unfortunately cannot promise long-term security updates for old versions.
## Reporting a Vulnerability
Vulnerabilities may be reported confidentially to speechbrainproject@gmail.com.
Compilation of your models in SpeechBrain can potentially improve their speed and reduce memory demand. SpeechBrain inherits the compilation methods supported by PyTorch, including the just-in-time compiler (JIT) and the `torch.compile` method introduced in PyTorch version >=2.0.
## Compile with `torch.compile`
The `torch.compile` feature was introduced with PyTorch version >=2.0 to gradually replace JIT. Although this feature is valuable, it is still in the beta phase, and improvements are ongoing. Please have a look at the [PyTorch documentation](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) for more information.
### How to use `torch.compile`
Compiling all modules in SpeechBrain is straightforward. You can enable compilation by using the `--compile` flag in the command line when running a training recipe. For example:
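A typical launch might look like this (the recipe path and hyperparameter file below are illustrative; substitute those of the recipe you are running):

```shell
cd recipes/LibriSpeech/ASR/CTC
python train.py hparams/train.yaml --compile
```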
This will automatically compile all the modules declared in the YAML file under the `modules` section.
Note that you might need to configure additional compilation flags correctly (e.g., `--compile_mode`, `--compile_using_fullgraph`, `--compile_using_dynamic_shape_tracing`) to ensure successful model compilation or achieve the best performance. For a deeper understanding of their roles, refer to the [PyTorch documentation](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html).
In some cases, you may want to compile only specific modules. To achieve this, add a list of the module keys you want to compile in the YAML file using `compile_module_keys`. For instance:
```yaml
compile_module_keys: [encoder, decoder]
```
This will compile only the encoder and decoder modules, provided they are declared in the YAML file under those keys.
Remember to call the training script with the `--compile` flag.
**Note of caution**: Compiling a model can be a complex process and may take some time. Additionally, it may fail in certain cases. The speed-up achieved through compilation is highly dependent on the system and GPU being used. For example, higher-end GPUs like the A100 tend to yield better speed-ups, while you may not observe significant improvements with V100 GPUs. We support this feature with the hope that `torch.compile` will continue to improve over time.
## Compile with JIT
JIT was the first compilation method supported by PyTorch. It is important to note that JIT is expected to be replaced soon by `torch.compile`. Please have a look at the [PyTorch documentation](https://pytorch.org/docs/stable/jit.html) for more information.
### How to use JIT
To compile all modules in SpeechBrain using JIT, use the `--jit` flag in the command line when running a training recipe:
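(Illustrative invocation; substitute your recipe's script and hyperparameter file.)

```shell
python train.py hparams/train.yaml --jit
```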
This will automatically compile all the modules declared in the YAML file under the `modules` section.
If you only want to compile specific modules, add a list of the module keys you want to compile in the YAML file using `jit_module_keys`. For example:
```yaml
jit_module_keys: [encoder, decoder]
```
This will compile only the encoder and decoder models, provided they are declared in the YAML file using the specified keys.
Remember to call the training script with the `--jit` flag.
**Note of caution**: JIT has specific requirements for supported syntax, and many common Python constructs are not supported. Therefore, when designing a model with JIT in mind, ensure that it meets the necessary syntax requirements for successful compilation. Additionally, the speed-up achieved through JIT compilation varies depending on the model type. We found it most beneficial for custom RNNs, such as the Li-GRU used in SpeechBrain's TIMIT/ASR/CTC. Custom RNNs often require for loops, which are slow in Python. Compilation with JIT provides a significant speed-up in such cases.
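As a sketch of why JIT helps loop-heavy models (the toy cell below is our own illustration, not a SpeechBrain module), a per-timestep Python loop can be scripted like this:

```python
import torch
import torch.nn as nn

class TinyRNN(nn.Module):
    """Toy recurrent cell with an explicit per-timestep Python loop,
    the kind of pattern JIT compilation can speed up."""

    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, time, dim)
        h = torch.zeros(x.size(0), x.size(2))
        outputs = []
        for t in range(x.size(1)):  # this loop is slow in eager mode
            h = torch.tanh(self.linear(torch.cat([x[:, t], h], dim=1)))
            outputs.append(h)
        return torch.stack(outputs, dim=1)

model = TinyRNN(8)
scripted = torch.jit.script(model)  # compile the module (loop included) to TorchScript
out = scripted(torch.randn(2, 5, 8))
print(out.shape)  # → torch.Size([2, 5, 8])
```

The type annotations on `forward` matter: TorchScript relies on them to compile the loop body ahead of time instead of re-dispatching each step from Python.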
docs/experiment.md (10 additions, 2 deletions)
The YAML syntax offers an elegant way to specify the hyperparameters of a recipe.
In SpeechBrain, the YAML file is not a plain list of parameters, but for each parameter, we specify the function (or class) that is using it.
This not only makes the specification of the parameters more transparent but also allows us to properly initialize all the entries by simply calling `load_extended_yaml` (in `speechbrain.utils.data_utils`).
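As an illustration (the class path and parameter values below are placeholders, not taken from a specific recipe), an extended-YAML entry ties the hyperparameters directly to the class that consumes them:

```yaml
# The !new: tag instantiates the named class with the parameters below it.
model: !new:speechbrain.nnet.RNN.LSTM
    input_size: 40
    hidden_size: 256
```

When `load_extended_yaml` parses this file, the `model` key is already a constructed object rather than a bare dictionary of values.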
…every user either by editing the YAML, or with an override (passed to `load_extended_yaml`).
For more details on YAML and our extensions, please see our dedicated [tutorial](https://colab.research.google.com/drive/1Pg9by4b6-8QD2iC0U7Ic3Vxq4GEwEdDz).
## Running arguments
SpeechBrain defines a set of running arguments that can be set from the command line args (or within the YAML file).
- `distributed_backend`: default `"nccl"`, options: `["nccl", "gloo", "mpi"]`; this backend is used as the DDP communication protocol. See the PyTorch documentation for more details.
- Additional runtime arguments are documented in the Brain class.
Please note that we provide a dedicated [tutorial](https://colab.research.google.com/drive/13pBUacPiotw1IvyffvGZ-HrtBr9T6l15) to document the different multi-gpu training strategies:
You can also override parameters in YAML in this way:
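(A representative invocation; the script path and hyperparameter names are illustrative.)

```shell
python train.py hparams/train.yaml --learning_rate 0.001 --batch_size 8
```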
| train_with_wav2vec.yaml | No | 5.06 | 4.52 | 1xRTX 8000 Ti 48GB |
You can check out our results (models, training logs, etc.) [here](https://www.dropbox.com/sh/e4bth1bylk7c6h8/AADFq3cWzBBKxuDv09qjvUMta?dl=0).