
Commit bb7d888

fpaissan, asumagic, poonehmousavi, mravanelli, shucongzhang authored
Fix Checks (speechbrain#8)
* Skip lazy imports when the caller is inspect.py

  This avoids having certain inspect functions import our lazy modules when we don't want them to. `getframeinfo` in particular appears to do it, and this gets called by PyTorch at some point. IPython might also be doing it but autocomplete still seems to work.

  This does not appear to break anything. Added test for hyperpyyaml to ensure we're not breaking that.

* SSL_Semantic_Token _ new PR (speechbrain#2509)

  * remove unnecessary files and move to dasb
  * remove extra recipe from test
  * update ljspeech quantization recipe
  * add discrete_ssl and remove extra files
  * fix precommit
  * update kmeans and add tokenizer for postprocessing
  * fix precommit
  * Update discrete_ssl.py
  * fix clone warning

  Co-authored-by: Mirco Ravanelli <mirco.ravanelli@gmail.com>

* _ensure_module Raises docstring

* Expose `ensure_module` so that docs get generated for it

  This is already an internal class anyway, and this is safe to call.

* Update actions/setup-python

* Use `uv` in test CI + merge some dep installs

  The consequence is faster dependency installation. Merging some of the dependency installs helps avoid some packages being reinstalled from one line to the next. Additionally, CPU versions are specified when relevant, to avoid downloading CUDA stuff the CI can't use anyway.

* Use `uv` in doc CI + merge some dep installs

  Similar rationale as for the test CI

* Parallelize doc generation with Sphinx

  This does not affect the entire doc generation process but should allow some minor multithreading even with the 2-core CI workers.

* Enable `uv` caching on the test CI

* Enable `uv` caching on the docs CI

* CTC-only training recipes for LibriSpeech (code from Samsung AI Cambridge) (speechbrain#2290)

  CTC-only pre-training of conformer and branchformer.

  Co-authored-by: Shucong Zhang/Embedded AI /SRUK/Engineer/Samsung Electronics <s1.zhang@sruk-ccn4.eu.corp.samsungelectronics.net>
  Co-authored-by: Adel Moumen <adelmoumen.pro@gmail.com>
  Co-authored-by: Adel Moumen <88119391+Adel-Moumen@users.noreply.github.com>
  Co-authored-by: Parcollet Titouan <titouan.parcollet@univ-avignon.fr>

* Update CommonVoice transformer recipes (code from Samsung AI Center Cambridge) (speechbrain#2465)

  * Update CV transformer recipes to match latest results with conformer.

  Co-authored-by: Titouan Parcollet/Embedded AI /SRUK/Engineer/Samsung Electronics <t.parcollet@sruk-ccn4.eu.corp.samsungelectronics.net>
  Co-authored-by: Mirco Ravanelli <mirco.ravanelli@gmail.com>
  Co-authored-by: Adel Moumen <adelmoumen.pro@gmail.com>

* Whisper improvements: flash attention, KV caching, lang_id, translation, training... (speechbrain#2450)

  Whisper improvements:
  - flash attention
  - kv caching
  - lang identification
  - translation
  - finetuning improvements
  ... and more ...

  * Update README.md
  * precommit

* update zed download link (speechbrain#2514)

* `RelPosEncXL` refactor and precision fixes (speechbrain#2498)

  * Add `RelPosEncXL.make_pe`, rework precision handling
  * Rework RelPosEncXL output dtype selection

* Fix in-place input normalization when using `sentence`/`speaker` norm (speechbrain#2504)

* fix LOCAL_RANK to be RANK in if_main_process (speechbrain#2506)

* Fix Separation and Enhancement recipes behavior when NaN encountered (speechbrain#2524)

  * Fix Separation and Enhancement recipes behavior when NaN encountered
  * Formatting using precommit hooks

* Lock torch version in requirements.txt (speechbrain#2528)

* Fix compatibility for torchaudio versions without `.io` (speechbrain#2532)

  This avoids having the Python interpreter attempt to resolve the type annotation directly.

* fix docstrings
* consistency tests - classification
* consistency tests - classification
* consistency tests - interpret
* default to no wham
* fix after tests pass
* fix after tests pass
* tests after that
* fix consistency

Co-authored-by: asu <sdelang@sdelang.fr>
Co-authored-by: Pooneh Mousavi <moosavi.pooneh@gmail.com>
Co-authored-by: Mirco Ravanelli <mirco.ravanelli@gmail.com>
Co-authored-by: shucongzhang <104781888+shucongzhang@users.noreply.github.com>
Co-authored-by: Shucong Zhang/Embedded AI /SRUK/Engineer/Samsung Electronics <s1.zhang@sruk-ccn4.eu.corp.samsungelectronics.net>
Co-authored-by: Adel Moumen <adelmoumen.pro@gmail.com>
Co-authored-by: Adel Moumen <88119391+Adel-Moumen@users.noreply.github.com>
Co-authored-by: Parcollet Titouan <titouan.parcollet@univ-avignon.fr>
Co-authored-by: Parcollet Titouan <parcollet.titouan@gmail.com>
Co-authored-by: Titouan Parcollet/Embedded AI /SRUK/Engineer/Samsung Electronics <t.parcollet@sruk-ccn4.eu.corp.samsungelectronics.net>
Co-authored-by: Yingzhi WANG <41187612+BenoitWang@users.noreply.github.com>
Co-authored-by: Peter Plantinga <plantinga.peter@protonmail.com>
Co-authored-by: Séverin <123748182+SevKod@users.noreply.github.com>
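The first item of the commit message describes skipping lazy imports when the caller is `inspect.py`. A minimal sketch of that idea follows. It is not SpeechBrain's implementation: the `LazyModule` class, its attributes, and the use of `json` as the deferred module are all assumptions made for illustration.

```python
import importlib
import inspect
import os
import sys
import types


class LazyModule(types.ModuleType):
    """Hypothetical lazy proxy: imports `target` on first attribute access."""

    def __init__(self, name, target):
        super().__init__(name)
        self._target = target  # name of the real module to import later
        self._loaded = None    # filled in once the real import has happened

    def __getattr__(self, attr):
        # Only called when normal attribute lookup on the proxy fails.
        caller_file = sys._getframe(1).f_code.co_filename
        if os.path.basename(caller_file) == "inspect.py":
            # Report the attribute as missing instead of importing.
            raise AttributeError(attr)
        if self._loaded is None:
            self._loaded = importlib.import_module(self._target)
        return getattr(self._loaded, attr)


lazy_json = LazyModule("lazy_json", "json")
inspect.unwrap(lazy_json)          # probes `__wrapped__` from inside inspect.py
loaded_after_inspect = lazy_json._loaded is not None
encoded = lazy_json.dumps([1, 2])  # ordinary access triggers the real import
```

In this sketch the check looks only at the immediate caller's file name, so an `inspect` helper that reaches the proxy through another module's code would still trigger the import.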
1 parent a7cc35d commit bb7d888
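The commit message also lists a change from `LOCAL_RANK` to `RANK` in `if_main_process`. The sketch below illustrates why the two differ on a multi-node job, assuming the usual launcher convention that `RANK` is unique across the whole job while `LOCAL_RANK` restarts at 0 on every node. The helper `is_main_process` and the worker table are hypothetical and are not the library's code.

```python
import os


def is_main_process(environ=None):
    """Return True only for the process with global rank 0.

    Hypothetical helper: when RANK is unset (no distributed launcher),
    the single process is treated as the main one.
    """
    environ = os.environ if environ is None else environ
    return int(environ.get("RANK", "0")) == 0


# Two nodes with two processes each (made-up example values).
workers = [
    {"RANK": "0", "LOCAL_RANK": "0"},
    {"RANK": "1", "LOCAL_RANK": "1"},
    {"RANK": "2", "LOCAL_RANK": "0"},
    {"RANK": "3", "LOCAL_RANK": "1"},
]

# Selecting by RANK yields one main process for the whole job.
main_by_rank = [w["RANK"] for w in workers if is_main_process(w)]
# Selecting by LOCAL_RANK yields one process per node.
main_by_local_rank = [w["RANK"] for w in workers if w["LOCAL_RANK"] == "0"]
```

With a `LOCAL_RANK` check, every node would elect its own "main" process, so work meant to run once per job could run once per node.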

103 files changed

Lines changed: 3730 additions & 4355 deletions


.github/workflows/pythonapp.yml

Lines changed: 12 additions & 14 deletions
@@ -17,10 +17,21 @@ jobs:
         python-version: [3.8, 3.12]
     steps:
       - uses: actions/checkout@v2
+      - uses: actions/cache@v4
+        id: cache-uv
+        with:
+          path: ~/.cache/uv
+          key: ${{ runner.os }}-python-${{ matrix.python-version }}-uv
       - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v1
+        uses: actions/setup-python@v5
         with:
           python-version: ${{ matrix.python-version }}
+      - name: Full dependencies
+        run: |
+          pip install uv
+          uv pip install --system ctc-segmentation # ctc-segmentation is funky with uv due to their oldest-supported-numpy dependency
+          uv pip install --system -r requirements.txt torch==2.2.1+cpu torchaudio==2.2.1+cpu --extra-index-url https://download.pytorch.org/whl/cpu k2==1.24.4.dev20240223+cpu.torch2.2.1 --find-links https://k2-fsa.github.io/k2/cpu.html kaldilm==1.15.1 spacy==3.7.4 flair==0.13.1
+          uv pip install --system --editable . --no-deps # already installed pinned deps from requirements.txt, we're good
       - name: Install sox
         run: |
           sudo apt-get update
@@ -33,19 +44,6 @@ jobs:
           # sudo apt-get install -y ffmpeg
       - name: Display Python version
         run: python -c "import sys; print(sys.version)"
-      - name: Full dependencies
-        run: |
-          sudo apt-get update
-          # up to k2 compatible torch version
-          pip install torch==2.2.1 torchaudio==2.2.1
-          pip install -r requirements.txt
-          pip install --editable .
-          pip install ctc-segmentation
-          pip install k2==1.24.4.dev20240223+cpu.torch2.2.1 -f https://k2-fsa.github.io/k2/cpu.html
-          pip install protobuf
-          pip install kaldilm==1.15.1
-          pip install spacy==3.7.4
-          pip install flair==0.13.1
       - name: Consistency tests with pytest
         run: |
           pytest tests/consistency

.github/workflows/verify-docs-gen.yml

Lines changed: 11 additions & 8 deletions
@@ -11,19 +11,22 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v2
+      - uses: actions/cache@v4
+        id: cache-uv
+        with:
+          path: ~/.cache/uv
+          key: ${{ runner.os }}-python-docs-uv
       - name: Setup Python 3.8
-        uses: actions/setup-python@v2
+        uses: actions/setup-python@v5
         with:
           python-version: '3.8'
       - name: Full dependencies
         run: |
-          # up to k2 compatible torch version
-          pip install torch==2.1.2 torchaudio==2.1.2
-          pip install -r requirements.txt
-          pip install --editable .
-          pip install -r docs/docs-requirements.txt
-          pip install k2==1.24.4.dev20231220+cpu.torch2.1.2 -f https://k2-fsa.github.io/k2/cpu.html
+          pip install uv
+          uv pip install --system ctc-segmentation # ctc-segmentation is funky with uv due to their oldest-supported-numpy dependency
+          uv pip install --system -r requirements.txt -r docs/docs-requirements.txt torch==2.2.1+cpu torchaudio==2.2.1+cpu --extra-index-url https://download.pytorch.org/whl/cpu k2==1.24.4.dev20240223+cpu.torch2.2.1 --find-links https://k2-fsa.github.io/k2/cpu.html kaldilm==1.15.1 spacy==3.7.4 flair==0.13.1
+          uv pip install --system --editable . --no-deps # already installed pinned deps from requirements.txt, we're good
       - name: Generate docs
         run: |
           cd docs
-          make html
+          SPHINXOPTS="-j=auto" make html

conftest.py

Lines changed: 1 addition & 7 deletions
@@ -40,13 +40,7 @@ def pytest_generate_tests(metafunc):
 except ModuleNotFoundError:
     collect_ignore.append("speechbrain/utils/kmeans.py")
     collect_ignore.append(
-        "speechbrain/lobes/models/huggingface_transformers/discrete_hubert.py"
-    )
-    collect_ignore.append(
-        "speechbrain/lobes/models/huggingface_transformers/discrete_wav2vec2.py"
-    )
-    collect_ignore.append(
-        "speechbrain/lobes/models/huggingface_transformers/discrete_wavlm.py"
+        "speechbrain/lobes/models/huggingface_transformers/discrete_ssl.py"
     )
 try:
     import peft  # noqa: F401
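The hunk above follows a common `conftest.py` pattern: when an optional dependency cannot be imported, the files that need it are added to pytest's module-level `collect_ignore` list so they are not collected. A minimal sketch of the same idea, written as a lookup table rather than `try`/`except` blocks, is shown here. The helper names and the placeholder package name `some_optional_package` are assumptions, since the diff does not show which import guards these files.

```python
import importlib.util


def _importable(name):
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


def paths_to_ignore(requirements, is_available=_importable):
    """Return the paths whose optional dependency is not importable."""
    ignored = []
    for package, paths in requirements.items():
        if not is_available(package):
            ignored.extend(paths)
    return ignored


# Hypothetical requirement table; the package name is a placeholder.
REQUIREMENTS = {
    "some_optional_package": [
        "speechbrain/utils/kmeans.py",
        "speechbrain/lobes/models/huggingface_transformers/discrete_ssl.py",
    ],
}

# pytest reads a module-level `collect_ignore` list from conftest.py.
collect_ignore = paths_to_ignore(REQUIREMENTS)
```

A table keeps each dependency and its files in one place, at the cost of only checking that the package can be found, not that it imports cleanly.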

recipes/Aishell1Mix/separation/train.py

Lines changed: 2 additions & 2 deletions
@@ -165,7 +165,7 @@ def fit_batch(self, batch):
                         self.nonfinite_count
                     )
                 )
-                loss.data = torch.tensor(0).to(self.device)
+                loss.data = torch.tensor(0.0).to(self.device)
         else:
             predictions, targets = self.compute_forward(
                 mixture, targets, sb.Stage.TRAIN, noise
@@ -197,7 +197,7 @@ def fit_batch(self, batch):
                         self.nonfinite_count
                     )
                 )
-                loss.data = torch.tensor(0).to(self.device)
+                loss.data = torch.tensor(0.0).to(self.device)
         self.optimizer.zero_grad()

         return loss.detach().cpu()

recipes/BinauralWSJ0Mix/separation/train.py

Lines changed: 2 additions & 2 deletions
@@ -253,7 +253,7 @@ def fit_batch(self, batch):
                         self.nonfinite_count
                     )
                 )
-                loss.data = torch.tensor(0).to(self.device)
+                loss.data = torch.tensor(0.0).to(self.device)
         else:
             predictions, targets = self.compute_forward(
                 mixture, targets, sb.Stage.TRAIN, noise
@@ -285,7 +285,7 @@ def fit_batch(self, batch):
                         self.nonfinite_count
                     )
                 )
-                loss.data = torch.tensor(0).to(self.device)
+                loss.data = torch.tensor(0.0).to(self.device)
         self.optimizer.zero_grad()

         return loss.detach().cpu()

recipes/CVSS/S2ST/hparams/train_fr-en.yaml

Lines changed: 1 addition & 1 deletion
@@ -203,7 +203,7 @@ epoch_counter: !new:speechbrain.utils.epoch_loop.EpochCounter
 train_logger: !new:speechbrain.utils.train_logger.FileTrainLogger
     save_file: !ref <train_log>

-valid_search: !new:speechbrain.decoders.seq2seq.S2STransformerGreedySearch
+valid_search: !new:speechbrain.decoders.seq2seq.S2STransformerGreedySearcher
     modules: [!ref <transformer>, !ref <seq_lin>, null]
     bos_index: !ref <bos_index>
     eos_index: !ref <eos_index>

recipes/CommonVoice/ASR/transformer/README.md

Lines changed: 28 additions & 24 deletions
@@ -21,9 +21,8 @@ It is important to note that CommonVoice initially offers mp3 audio files at 42H
 # Languages
 Here is a list of the different languages that we tested within the CommonVoice dataset
 with our transformers:
-- French
 - Italian
-- German
+- French

 For Whisper-large-v2 and medium finetuning, here is list of the different language that we tested within the CommonVoice.14_0 dataset:
 - Hindi
@@ -36,30 +35,35 @@ For Whisper-large-v2 and medium finetuning, here is list of the different langua


 # Results
-
-| Language | Release | hyperparams file | LM | Val. CER | Val. WER | Test CER | Test WER | Hugging Face link | Model link | GPUs |
+## Transformer
+| Language | CV version | hyperparams file | LM | Val. CER | Val. WER | Test CER | Test WER | Hugging Face link | Model link | GPUs |
 | ------------- |:-------------:|:---------------------------:| -----:| -----:| -----:| -----:| -----:|:-----------:| :-----------:| :-----------:|
-| French | 2023-08-15 | train_fr.yaml | No | 5.41 | 16.00 | 5.41 | 17.61 | - | [model](https://www.dropbox.com/sh/zvu9h9pctksnuvp/AAD1kyS3-N0YtmcoMgjM-_Tba?dl=0) | 1xV100 32GB |
-| Italian | 2023-08-15 | train_it.yaml | No | 3.72 | 16.31 | 4.01 | 16.80 | - | [model](https://www.dropbox.com/sh/yy8du12jgbkm3qe/AACBHhTCM-cU-oGvAKJ9kTtaa?dl=0) | 1xV100 32GB |
-| German | 2023-08-15 | train_de.yaml | No | 3.60 | 15.33 | 4.22 | 16.76 |- | [model](https://www.dropbox.com/sh/umfq986o3d9o1px/AAARNF2BFYELOWx3xhIOEoZka?dl=0) | 1xV100 32GB |
+| Italian | 14.0 | conformer_large.yaml | No | 2.91 | 9.79 | 2.68 | 9.27 | - | [model](https://www.dropbox.com/scl/fo/tf44itp8f4icf2z5qlxpm/AIOYS_CMov5ss5Q9AonFEno?rlkey=xek5ikbhqoovcao31iniqimrr&dl=0) | 2xV100 32GB |
+| French | 14.0 | conformer_large.yaml | No | 2.64 | 7.62 | 3.55 | 9.48 | - | [model](https://www.dropbox.com/scl/fo/y862nl95zoe4sj3347095/ACxmT3_uw1ScLoYs0DSbGRM?rlkey=q66dk13w5nu1lkphtdinnnigm&dl=0) | 2xV100 32GB |
+

-## Whisper Finetuning Result:
-Following table contains whisper-finetuning results for 1 epoch using whisper_medium model, freezing encoder and finetuning decoder.
-| Language | Release | Model | hyperparams file | LM | Val. CER | Val. WER | Test CER | Test WER | HuggingFace link | Model link | GPUs |
-| ------------- |:-------------:| -----:|:---------------------------:| -----:| -----:| -----:| -----:| -----:| :-----------: |:-----------:| :-----------:|
-| Arabic | 2023-08-15 | large-v2 | train_ar_hf_whisper.yaml | No | 4.02 | 12.47 | 5.20 | 16.96 | [model](https://huggingface.co/speechbrain/asr-whisper-large-v2-commonvoice-ar) | [model](https://www.dropbox.com/sh/45o3xkxdheksdfi/AAAs1zxCw76mcAbudYEonzg0a?dl=0) | 1xV100 16GB |
-| Persian | 2023-08-15 | large-v2 | train_fa_hf_whisper.yaml | No | 6.91 | 25.30 | 9.38 | 31.75 | [model](https://huggingface.co/speechbrain/asr-whisper-large-v2-commonvoice-fa) | [model](https://www.dropbox.com/sh/a2vd6nn0icybdcz/AAC7z41jcheW1R9aNNK4-lHha?dl=0) | 1xV100 16GB |
-| Mongolian | 2023-08-15 | large-v2 | train_mn_hf_whisper.yaml | No | 24.05 | 62.37 | 25.73 | 64.92 | [model](https://huggingface.co/speechbrain/asr-whisper-large-v2-commonvoice-mn) | [model](https://www.dropbox.com/sh/2t0srpb2nt2wst5/AACRJQCwooRaLxPoLkmTvKq8a?dl=0) | 1xV100 16GB |
-| Hindi | 2023-08-15 | large-v2 | train_hi_hf_whisper.yaml | No | 4.54 | 10.46 | 7.00 | 15.27 | [model](https://huggingface.co/speechbrain/asr-whisper-large-v2-commonvoice-hi) | [model](https://www.dropbox.com/sh/qkcm86bzzb1y4sj/AABjA_ckw_hPwJCBzUiXLWrBa?dl=0) | 1xV100 16GB |
-| Serbian | 2023-08-15 | large-v2 | train_sr_hf_whisper.yaml | No | 8.92 | 27.12 | 7.60 | 23.63 | [model](https://huggingface.co/speechbrain/asr-whisper-large-v2-commonvoice-sr) | [model](https://www.dropbox.com/sh/a798gw3k2ezerp5/AADz7UxvQRQDOH4DnCJ4J4dja?dl=0) | 1xV100 16GB |
-| French | 2023-08-15 | large-v2 | train_fr_hf_whisper.yaml | No | 3.00 | 8.95 | 3.83 | 10.62 | [model](https://huggingface.co/speechbrain/asr-whisper-large-v2-commonvoice-fr) | [model](https://www.dropbox.com/sh/8c2lpa7m5amasjz/AAD5AZlD6OslhFc0W81D3nosa?dl=0) | 1xV100 16GB |
-| Arabic | 2023-08-15 | Medium | train_ar_hf_whisper.yaml | No | 4.95 | 14.82 | 6.51 | 20.24 | [model](https://huggingface.co/speechbrain/asr-whisper-medium-commonvoice-ar) | [model](https://www.dropbox.com/sh/0e4vtvbg6hf2e13/AAD-tfzCZGUrh85aeAeJj8I9a?dl=0) | 1xV100 16GB |
-| Persian | 2023-08-15 | Medium | train_fa_hf_whisper.yaml | No | 8.58 | 35.48 | 11.27 | 35.48 |[model](https://huggingface.co/speechbrain/asr-whisper-medium-commonvoice-fa) | [model](https://www.dropbox.com/sh/w1urihacmtoulmi/AADMtK3qeAF5mLYk5LMHyiOra?dl=0) | 1xV100 16GB |
-| Mongolian | 2023-08-15 | Medium | train_mn_hf_whisper.yaml | No | 27.08 | 67.41 | 27.69 | 67.84 | [model](https://huggingface.co/speechbrain/asr-whisper-medium-commonvoice-mn) | [model](https://www.dropbox.com/sh/6fbhmey7q1udykf/AAAiGObWTTe2cdXHt2Uv2VQXa?dl=0) | 1xV100 16GB |
-| Hindi | 2023-08-15 | Medium | train_hi_hf_whisper.yaml | No | 5.82 | 12.51 | 8.16 | 17.04 | [model](https://huggingface.co/speechbrain/asr-whisper-medium-commonvoice-hi) | [model](https://www.dropbox.com/sh/z9vriyy3i6xqvif/AAB7ql-40yWTjKEQJiuhYUr5a?dl=0) | 1xV100 16GB |
-| Serbian | 2023-08-15 | Medium | train_sr_hf_whisper.yaml | No | 8.63 | 25.10 | 7.25 | 22.29 | [model](https://huggingface.co/speechbrain/asr-whisper-medium-commonvoice-sr) | [model](https://www.dropbox.com/sh/5lhk230q45sd97z/AAD-U9b_Ws_vFPs-cazsbOY0a?dl=0) | 1xV100 16GB |
-| French | 2023-08-15 | Medium | train_fr_hf_whisper.yaml | No | 3.26 | 9.65 | 4.30 | 11.79 | [model](https://huggingface.co/speechbrain/asr-whisper-medium-commonvoice-fr) | [model](https://www.dropbox.com/sh/7zlk07yxnslk4yy/AAANcI3EaG0ZFy6UrKk1Mm2Ga?dl=0) | 1xV100 16GB |
-| Italian | 2023-08-15 | Medium | train_it_hf_whisper.yaml | No | 2.42 | 8.26 | 3.03 | 9.63 | [model](https://huggingface.co/speechbrain/asr-whisper-medium-commonvoice-it) | [model](https://www.dropbox.com/sh/u5tex3nvzzs5pex/AAD-J7cOBE_fNfBono8waTKCa?dl=0) | 1xV100 16GB |
+## Whisper Finetuning
+Following table contains whisper-finetuning results for 1 epoch using Whisper model, freezing encoder and finetuning decoder.
+| Language | Release | Model | commit hash | hyperparams file | LM | Val. CER | Val. WER | Test CER | Test WER | HuggingFace link | Model link | GPUs |
+| ------------- |:-------------:| -----:|-----:|:---------------------------:| -----:| -----:| -----:| -----:| -----:| :-----------: |:-----------:| :-----------:|
+| French | 2024-03-28 | large-v3 | [e4e2e13](https://github.com/speechbrain/speechbrain/pull/2450/commits/e4e2e135e9edafc6a26fc9aa4df9a94eaf86de41) | train_hf_whisper.yaml | No | 2.31% | 7.38% | 3.11% | 9.09% | x | [DropBox](https://www.dropbox.com/scl/fo/erwh83bg2jbzf3bf8v6ur/AHmQ5i8uWRaieXCOe5DSRUk?rlkey=kjivz2hx3o1pi7wbzadjznpid&dl=0) | 2xV100 32GB |
+| Italian | 2024-03-28 | large-v3 | [e4e2e13](https://github.com/speechbrain/speechbrain/pull/2450/commits/e4e2e135e9edafc6a26fc9aa4df9a94eaf86de41) | train_hf_whisper.yaml | No | 1.27% | 4.85% | 1.62% | 5.47% | x | [DropBox](https://www.dropbox.com/scl/fo/gtfo3qoz1ceg4xg0dfq1d/AIabz2J9NxkNAEbGF7rHCHU?rlkey=eokq2a2z07ke48scazqnn5v73&dl=0) | 2xV100 32GB |
+| French | 2024-03-28 | medium | [e4e2e13](https://github.com/speechbrain/speechbrain/pull/2450/commits/e4e2e135e9edafc6a26fc9aa4df9a94eaf86de41) | train_hf_whisper.yaml | No | 2.92% | 8.90% | 4.02% | 11.07% | x | [DropBox](https://www.dropbox.com/scl/fo/72aiaflc9w6168rk9jv6u/AGIVW5ml74wZYED7HUFjX-U?rlkey=nz7eo6i6gbze7rwv8la6sxobx&dl=0) | 2xV100 32GB |
+| Italian | 2024-03-28 | medium | [e4e2e13](https://github.com/speechbrain/speechbrain/pull/2450/commits/e4e2e135e9edafc6a26fc9aa4df9a94eaf86de41) | train_hf_whisper.yaml | No | 2.05% | 7.17% | 2.31% | 7.79% | x | [DropBox](https://www.dropbox.com/scl/fo/sso9k4n6hma9cub44oi2p/AKINkGK0XMCYND-JrMQh4LQ?rlkey=gywsgxle4k473z9c7tf4l1m7n&dl=0) | 2xV100 32GB |
+| French | 2024-03-28 | small | [e4e2e13](https://github.com/speechbrain/speechbrain/pull/2450/commits/e4e2e135e9edafc6a26fc9aa4df9a94eaf86de41) | train_hf_whisper.yaml | No | 4.34% | 12.57% | 5.89% | 15.46% | x | [DropBox](https://www.dropbox.com/scl/fo/h8idsgzp8xz5vsupqv0q8/ACS13H9awYU2G7DeTcyxiV0?rlkey=bbqpx0lbf5aify6ib029g2gn0&dl=0) | 2xV100 32GB |
+| Italian | 2024-03-28 | small | [e4e2e13](https://github.com/speechbrain/speechbrain/pull/2450/commits/e4e2e135e9edafc6a26fc9aa4df9a94eaf86de41) | train_hf_whisper.yaml | No | 3.20% | 11.40% | 3.71% | 12.25% | x | [DropBox](https://www.dropbox.com/scl/fo/o4objjm5c65c5hzy1vvk4/ABXA2V1Gy1GCg7FGS6Ty9yc?rlkey=4kbjmmljdznvureyxfip5tw8q&dl=0) | 2xV100 32GB |
+| Arabic | 2023-08-15 | large-v2 | [b112860](https://github.com/speechbrain/speechbrain/pull/2254/commits/b1128604e040d43e80e9a3214c5116f34d5806db) | train_ar_hf_whisper.yaml | No | 4.02 | 12.47 | 5.20 | 16.96 | [model](https://huggingface.co/speechbrain/asr-whisper-large-v2-commonvoice-ar) | [model](https://www.dropbox.com/sh/45o3xkxdheksdfi/AAAs1zxCw76mcAbudYEonzg0a?dl=0) | 1xV100 16GB |
+| Persian | 2023-08-15 | large-v2 | [b112860](https://github.com/speechbrain/speechbrain/pull/2254/commits/b1128604e040d43e80e9a3214c5116f34d5806db) |train_fa_hf_whisper.yaml | No | 6.91 | 25.30 | 9.38 | 31.75 | [model](https://huggingface.co/speechbrain/asr-whisper-large-v2-commonvoice-fa) | [model](https://www.dropbox.com/sh/a2vd6nn0icybdcz/AAC7z41jcheW1R9aNNK4-lHha?dl=0) | 1xV100 16GB |
+| Mongolian | 2023-08-15 | large-v2 | [b112860](https://github.com/speechbrain/speechbrain/pull/2254/commits/b1128604e040d43e80e9a3214c5116f34d5806db) |train_mn_hf_whisper.yaml | No | 24.05 | 62.37 | 25.73 | 64.92 | [model](https://huggingface.co/speechbrain/asr-whisper-large-v2-commonvoice-mn) | [model](https://www.dropbox.com/sh/2t0srpb2nt2wst5/AACRJQCwooRaLxPoLkmTvKq8a?dl=0) | 1xV100 16GB |
+| Hindi | 2023-08-15 | large-v2 | [b112860](https://github.com/speechbrain/speechbrain/pull/2254/commits/b1128604e040d43e80e9a3214c5116f34d5806db) |train_hi_hf_whisper.yaml | No | 4.54 | 10.46 | 7.00 | 15.27 | [model](https://huggingface.co/speechbrain/asr-whisper-large-v2-commonvoice-hi) | [model](https://www.dropbox.com/sh/qkcm86bzzb1y4sj/AABjA_ckw_hPwJCBzUiXLWrBa?dl=0) | 1xV100 16GB |
+| Serbian | 2023-08-15 | large-v2 | [b112860](https://github.com/speechbrain/speechbrain/pull/2254/commits/b1128604e040d43e80e9a3214c5116f34d5806db) |train_sr_hf_whisper.yaml | No | 8.92 | 27.12 | 7.60 | 23.63 | [model](https://huggingface.co/speechbrain/asr-whisper-large-v2-commonvoice-sr) | [model](https://www.dropbox.com/sh/a798gw3k2ezerp5/AADz7UxvQRQDOH4DnCJ4J4dja?dl=0) | 1xV100 16GB |
+| Arabic | 2023-08-15 | Medium | [b112860](https://github.com/speechbrain/speechbrain/pull/2254/commits/b1128604e040d43e80e9a3214c5116f34d5806db) |train_ar_hf_whisper.yaml | No | 4.95 | 14.82 | 6.51 | 20.24 | [model](https://huggingface.co/speechbrain/asr-whisper-medium-commonvoice-ar) | [model](https://www.dropbox.com/sh/0e4vtvbg6hf2e13/AAD-tfzCZGUrh85aeAeJj8I9a?dl=0) | 1xV100 16GB |
+| Persian | 2023-08-15 | Medium | [b112860](https://github.com/speechbrain/speechbrain/pull/2254/commits/b1128604e040d43e80e9a3214c5116f34d5806db) |train_fa_hf_whisper.yaml | No | 8.58 | 35.48 | 11.27 | 35.48 |[model](https://huggingface.co/speechbrain/asr-whisper-medium-commonvoice-fa) | [model](https://www.dropbox.com/sh/w1urihacmtoulmi/AADMtK3qeAF5mLYk5LMHyiOra?dl=0) | 1xV100 16GB |
+| Mongolian | 2023-08-15 | Medium | [b112860](https://github.com/speechbrain/speechbrain/pull/2254/commits/b1128604e040d43e80e9a3214c5116f34d5806db) |train_mn_hf_whisper.yaml | No | 27.08 | 67.41 | 27.69 | 67.84 | [model](https://huggingface.co/speechbrain/asr-whisper-medium-commonvoice-mn) | [model](https://www.dropbox.com/sh/6fbhmey7q1udykf/AAAiGObWTTe2cdXHt2Uv2VQXa?dl=0) | 1xV100 16GB |
+| Hindi | 2023-08-15 | Medium | [b112860](https://github.com/speechbrain/speechbrain/pull/2254/commits/b1128604e040d43e80e9a3214c5116f34d5806db) |train_hi_hf_whisper.yaml | No | 5.82 | 12.51 | 8.16 | 17.04 | [model](https://huggingface.co/speechbrain/asr-whisper-medium-commonvoice-hi) | [model](https://www.dropbox.com/sh/z9vriyy3i6xqvif/AAB7ql-40yWTjKEQJiuhYUr5a?dl=0) | 1xV100 16GB |
+| Serbian | 2023-08-15 | Medium | [b112860](https://github.com/speechbrain/speechbrain/pull/2254/commits/b1128604e040d43e80e9a3214c5116f34d5806db) |train_sr_hf_whisper.yaml | No | 8.63 | 25.10 | 7.25 | 22.29 | [model](https://huggingface.co/speechbrain/asr-whisper-medium-commonvoice-sr) | [model](https://www.dropbox.com/sh/5lhk230q45sd97z/AAD-U9b_Ws_vFPs-cazsbOY0a?dl=0) | 1xV100 16GB |
+| French | 2023-08-15 | Medium | [b112860](https://github.com/speechbrain/speechbrain/pull/2254/commits/b1128604e040d43e80e9a3214c5116f34d5806db) |train_fr_hf_whisper.yaml | No | 3.26 | 9.65 | 4.30 | 11.79 | [model](https://huggingface.co/speechbrain/asr-whisper-medium-commonvoice-fr) | [model](https://www.dropbox.com/sh/7zlk07yxnslk4yy/AAANcI3EaG0ZFy6UrKk1Mm2Ga?dl=0) | 1xV100 16GB |
+| Italian | 2023-08-15 | Medium | [b112860](https://github.com/speechbrain/speechbrain/pull/2254/mcommits/b1128604e040d43e80e9a3214c5116f34d5806db) |train_it_hf_whisper.yaml | No | 2.42 | 8.26 | 3.03 | 9.63 | [model](https://huggingface.co/speechbrain/asr-whisper-medium-commonvoice-it) | [model](https://www.dropbox.com/sh/u5tex3nvzzs5pex/AAD-J7cOBE_fNfBono8waTKCa?dl=0) | 1xV100 16GB |

 # **About SpeechBrain**
 - Website: https://speechbrain.github.io/
