Commit 555de27

add hf, dropbox, recipes.csv
1 parent 9b9c6c2 commit 555de27

6 files changed

Lines changed: 28 additions & 25 deletions

recipes/RescueSpeech/ASR/noise-robust/hparams/robust_asr_16k.yaml

Lines changed: 11 additions & 12 deletions
@@ -12,14 +12,14 @@ save_folder: !ref <output_folder>/save
 train_log: !ref <output_folder>/train_log.txt
 
 # URL for the biggest whisper model.
-model_version: !ref large-v2
-whisper_hub: !ref openai/whisper-<model_version>
-whisper_folder: !PLACEHOLDER
+whisper_hub: !ref openai/whisper-large-v2
+whisper_folder: !ref <save_folder>/whisper_checkpoint
 language: german
-pretrained_whisper_model: !PLACEHOLDER
 
-# Path to pre-trained enhancement model
-pretrained_enhance_path: !PLACEHOLDER
+
+# Path to pre-trained models
+pretrained_whisper_path: speechbrain/whisper_rescuespeech
+pretrained_enhance_path: speechbrain/sepformer_rescuespeech
 
 epochs_before_lr_drop: 0
 unfreeze_epoch: !ref <epochs_before_lr_drop> + 1
@@ -30,14 +30,13 @@ test_only: False
 
 # Dataset prep parameters
 data_folder: !PLACEHOLDER
-csv_dir: !ref csv_files
 train_tsv_file: !ref <data_folder>/train.tsv
 dev_tsv_file: !ref <data_folder>/dev.tsv
 test_tsv_file: !ref <data_folder>/test.tsv
 accented_letters: True
-train_csv: !ref <csv_dir>/train.csv
-valid_csv: !ref <csv_dir>/dev.csv
-test_csv: !ref <csv_dir>/test.csv
+train_csv: !ref <output_folder>/train.csv
+valid_csv: !ref <output_folder>/dev.csv
+test_csv: !ref <output_folder>/test.csv
 skip_prep: False
 
 # We remove utterances longer than 10s in the train/dev/test sets as
@@ -91,7 +90,7 @@ min_decode_ratio: 0.0
 max_decode_ratio: 1.0
 test_beam_size: 8
 
-# Model parameters
+# Whisper model parameters
 freeze_whisper: False
 freeze_encoder_only: False
 freeze_encoder: True
@@ -171,7 +170,7 @@ lr_annealing_whisper: !new:speechbrain.nnet.schedulers.NewBobScheduler
 epoch_counter: !new:speechbrain.utils.epoch_loop.EpochCounter
     limit: !ref <number_of_epochs>
 
-# Enhanc loss
+# Enhance loss
 enhance_loss: !name:speechbrain.nnet.losses.get_si_snr_with_pitwrapper
 
 # Change the path to use a local model instead of the remote one
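
The hparams changes above chain `!ref <key>` references: `whisper_folder` now derives from `save_folder`, which itself derives from `output_folder`, and the CSV paths move under `output_folder` as well. The following is a minimal Python sketch of how such references expand. It is not HyperPyYAML's implementation, only a toy regex substitution; the `resolve` helper and the `output_folder` value `results/robust_asr` are assumptions made for illustration.

```python
import re

# Toy stand-in for `!ref <key>` substitution. The real loader also handles
# tags, arithmetic and object construction, none of which is modelled here.
REF = re.compile(r"<([A-Za-z_][A-Za-z0-9_]*)>")


def resolve(params: dict[str, str]) -> dict[str, str]:
    """Expand <key> placeholders until no resolvable reference remains."""
    resolved = dict(params)
    for _ in range(len(resolved) + 1):  # bounded, so a cycle cannot loop forever
        changed = False
        for key, value in resolved.items():
            # Unknown keys are left untouched rather than raising.
            new_value = REF.sub(
                lambda m: resolved.get(m.group(1), m.group(0)), value
            )
            if new_value != value:
                resolved[key] = new_value
                changed = True
        if not changed:
            break
    return resolved


params = {
    "output_folder": "results/robust_asr",  # hypothetical; not shown in the diff
    "save_folder": "<output_folder>/save",
    "whisper_folder": "<save_folder>/whisper_checkpoint",
    "train_csv": "<output_folder>/train.csv",
    "valid_csv": "<output_folder>/dev.csv",
    "test_csv": "<output_folder>/test.csv",
}
paths = resolve(params)
print(paths["whisper_folder"])  # results/robust_asr/save/whisper_checkpoint
print(paths["train_csv"])       # results/robust_asr/train.csv
```

With this layout, everything the recipe writes (checkpoints, the downloaded Whisper weights, and the prepared CSVs) ends up under a single output directory.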

recipes/RescueSpeech/ASR/noise-robust/train.py

Lines changed: 1 addition & 1 deletion
@@ -777,7 +777,7 @@ def text_pipeline(wrd):
         prepare_RescueSpeech,
         kwargs={
             "data_folder": hparams["data_folder"],
-            "save_folder": hparams["csv_dir"],
+            "save_folder": hparams["output_folder"],
             "train_tsv_file": hparams["train_tsv_file"],
             "dev_tsv_file": hparams["dev_tsv_file"],
             "test_tsv_file": hparams["test_tsv_file"],

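The `train.py` change only stays consistent with the YAML if the folder passed as `save_folder` is the same one that `train_csv`, `valid_csv` and `test_csv` point into. Below is a small sanity check one might run on resolved hparams. It assumes (the diff does not show this) that the preparation script writes its CSV files directly into `save_folder`; the helper name `misplaced_csvs` and the example values are hypothetical.

```python
from pathlib import PurePosixPath

CSV_KEYS = ("train_csv", "valid_csv", "test_csv")


def misplaced_csvs(hparams: dict, save_folder: str) -> list[str]:
    """List the CSV hparams that do not sit directly inside `save_folder`."""
    target = PurePosixPath(save_folder)
    return [
        key for key in CSV_KEYS
        if PurePosixPath(hparams[key]).parent != target
    ]


# Hypothetical resolved values, mirroring the keys used in this commit.
hparams = {
    "output_folder": "results/robust_asr",
    "train_csv": "results/robust_asr/train.csv",
    "valid_csv": "results/robust_asr/dev.csv",
    "test_csv": "results/robust_asr/test.csv",
}

# Assumed behaviour: prep writes into save_folder, which is now output_folder.
print(misplaced_csvs(hparams, hparams["output_folder"]))  # []
# Mixing in the removed csv_dir default would flag all three paths.
print(misplaced_csvs(hparams, "csv_files"))
```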
recipes/RescueSpeech/README.md

Lines changed: 9 additions & 10 deletions
@@ -11,8 +11,9 @@ This recipe supports a simple combination of a speech enhancement model (**SepFo
 
 ## How to run
 ```
-python train.py hparams/robust_asr_16k.yaml
+python train.py hparams/robust_asr_16k.yaml --data_folder=<data_folder_path>
 ```
+Here, the data folder should be the path to the uncompressed `Task_ASR.tar.gz` archive downloaded from the link above.
 
 ## Results
 During training, both the speech enhancement and ASR models are kept unfrozen, i.e. both the ASR and enhancement losses are backpropagated and the weights are updated.
@@ -21,17 +22,15 @@ During training, both speech enhancement and ASR is kept unfrozen- i.e. both ASR
 |------ |--------|-------|-------|-------|---- |
 | Whisper (`large-v2`)| 7.334 | 7.871 | 2.085 | 0.857 | **24.20** |
 
-## Pretrained Models
-We initially perform fine-tuning of both the ASR model and SepFormer model using the CommonVoice dataset and the Microsoft-DNS dataset. Subsequently, we proceed with a second stage of fine-tuning on our RescueSpeech dataset. Here you can find links to the trained models.
+The final models for noise-robust speech recognition can be found here: [HuggingFace](https://huggingface.co/sangeet2020/noisy-whisper-resucespeech) and [Dropbox](https://www.dropbox.com/sh/7tryj6n7cfy0poe/AADpl4b8rGRSnoQ5j6LCj9tua?dl=0)
 
+## Fine-tuned models
+Initially, only the SepFormer model is trained on the Microsoft-DNS dataset. Then, we fine-tune both the Whisper ASR and SepFormer enhancement models on our RescueSpeech dataset. The links to these fine-tuned models are listed below.
 
-| Dataset | CRDNN | Wav2vec2 | wavLM | Whisper |
-|----------------|------------------------------------------------|------------------------------------------------|------------------------------------------------|------------------------------------------------|
-| German <br> CommonVoice10.0 | [HuggingFace](link_commonvoice_crdnn_hf) | [HuggingFace](link_commonvoice_wav2vec2_hf) | [HuggingFace](link_commonvoice_wavlm_hf) | [HuggingFace](link_commonvoice_whisper_hf) |
-| | [Google Drive](link_commonvoice_crdnn_gd) | [Google Drive](link_commonvoice_wav2vec2_gd) | [Google Drive](link_commonvoice_wavlm_gd) | [Google Drive](link_commonvoice_whisper_gd) |
-| RescueSpeech | [HuggingFace](link_rescuespeech_crdnn_hf) | [HuggingFace](link_rescuespeech_wav2vec2_hf) | [HuggingFace](link_rescuespeech_wavlm_hf) | [HuggingFace](link_rescuespeech_whisper_hf) |
-| | [Google Drive](link_rescuespeech_crdnn_gd) | [Google Drive](link_rescuespeech_wav2vec2_gd) | [Google Drive](link_rescuespeech_wavlm_gd) | [Google Drive](link_rescuespeech_whisper_gd) |
-
+| Model | HuggingFace link | Full Model link |
+|----------------|------------------------------------------------|------------------------------------------------|
+| Whisper ASR | [HuggingFace](https://huggingface.co/speechbrain/whisper_rescuespeech) | [Dropbox](https://www.dropbox.com/sh/45wk44h8e0wkc5f/AABjEJJJ_OJp2fDYz3zEihmPa?dl=0) |
+| SepFormer Enhancement | [HuggingFace](https://huggingface.co/speechbrain/sepformer_rescuespeech) | [Dropbox](https://www.dropbox.com/sh/02c3wesc65402f6/AAApoxBApft-JwqHK-bddedBa?dl=0) |
 
 
 # **About SpeechBrain**
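
The updated "How to run" section expects `--data_folder` to point at the uncompressed `Task_ASR.tar.gz`. As a rough sketch of that step, the Python snippet below unpacks an archive and builds the command line from the README. The download location, the destination `data/Task_ASR`, and the helper names are assumptions; the README does not say whether `train.tsv` sits at the top level of the extracted folder or in a subdirectory, so the path may need adjusting.

```python
import tarfile
from pathlib import Path


def extract_archive(archive: Path, dest: Path) -> Path:
    """Unpack a .tar.gz archive into `dest` and return `dest`."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Only unpack archives you trust; members can carry arbitrary paths.
        tar.extractall(dest)
    return dest


def train_command(data_folder: Path) -> list[str]:
    """Build the README command with the data folder filled in."""
    return [
        "python",
        "train.py",
        "hparams/robust_asr_16k.yaml",
        f"--data_folder={data_folder}",
    ]


if __name__ == "__main__":
    archive = Path("Task_ASR.tar.gz")  # assumed download location
    if archive.exists():
        folder = extract_archive(archive, Path("data/Task_ASR"))
        print(" ".join(train_command(folder)))
```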

recipes/RescueSpeech/dataset.md

Lines changed: 1 addition & 2 deletions
@@ -91,7 +91,7 @@ This table represents the number of recordings in each of the three sets (train,
 
 ## Task: Speech enhancement- Dataset details
 ---------------
-- Noises used:
+- Noises used:
   - Static and radio noise
   - Emergency vehicle and siren noise
   - Engine
@@ -111,4 +111,3 @@ Thank You
 ## Acknowledgment
 ---------------
 This work was supported under the project A-DRZ: Setting up the German Rescue Robotics Center and funded by the German Ministry of Education and Research (BMBF), grant No. I3N14856.
-

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+librosa
+mir_eval
+pesq
+pystoi
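
Since this hunk adds four extra dependencies, a quick pre-flight check can report which of them are not importable before a long training run starts. This is a sketch using only the standard library; it assumes the import names match the package names listed above, and the helper name `missing_packages` is made up for the example.

```python
from importlib.util import find_spec

# Package names as listed in the added requirements file.
EXTRA_REQUIREMENTS = ("librosa", "mir_eval", "pesq", "pystoi")


def missing_packages(names=EXTRA_REQUIREMENTS) -> list[str]:
    """Return the names that cannot be located as importable top-level modules."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All extra requirements are importable.")
```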

tests/recipes/RescueSpeech.csv

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+Task,Dataset,Script_file,Hparam_file,Data_prep_file,Readme_file,Result_url,HF_repo,test_debug_flags,test_debug_checks
+ASR+enhancement,RescueSpeech,recipes/RescueSpeech/ASR/noise-robust/train.py,recipes/RescueSpeech/ASR/noise-robust/hparams/robust_asr_16k.yaml,recipes/RescueSpeech/rescuespeech_prepare.py,recipes/RescueSpeech/README.md,https://www.dropbox.com/sh/7tryj6n7cfy0poe/AADpl4b8rGRSnoQ5j6LCj9tua?dl=0,https://huggingface.co/sangeet2020/noisy-whisper-resucespeech,--data_folder=tests/samples/ASR/ --train_csv=tests/samples/annotation/ASR_train.csv --valid_csv=tests/samples/annotation/ASR_train.csv --test_csv=tests/samples/annotation/ASR_train.csv --number_of_epochs=1 --skip_prep=True,,
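
The `test_debug_flags` column packs several `--key=value` overrides into one CSV field. The sketch below shows one way to read such a row with the standard library and split the flags into a dictionary. It is not SpeechBrain's test harness, which may parse the file differently; `parse_flags` and `load_recipe_rows` are hypothetical helpers. Note that the data row ends with two commas, so it has one more field than the header; `csv.DictReader` stores that surplus under the key `None`, which the sketch discards.

```python
import csv
import io
import shlex


def parse_flags(flags: str) -> dict[str, str]:
    """Turn '--key=value --other=x' into {'key': 'value', 'other': 'x'}."""
    overrides = {}
    for token in shlex.split(flags):
        key, _, value = token.lstrip("-").partition("=")
        overrides[key] = value
    return overrides


def load_recipe_rows(csv_text: str) -> list[dict]:
    """Parse recipe-test CSV text and attach the parsed debug overrides."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row.pop(None, None)  # surplus fields from trailing commas land under None
        row["overrides"] = parse_flags(row.get("test_debug_flags") or "")
        rows.append(row)
    return rows


# The header and row added by this commit, split across lines for readability.
SAMPLE = (
    "Task,Dataset,Script_file,Hparam_file,Data_prep_file,Readme_file,"
    "Result_url,HF_repo,test_debug_flags,test_debug_checks\n"
    "ASR+enhancement,RescueSpeech,recipes/RescueSpeech/ASR/noise-robust/train.py,"
    "recipes/RescueSpeech/ASR/noise-robust/hparams/robust_asr_16k.yaml,"
    "recipes/RescueSpeech/rescuespeech_prepare.py,recipes/RescueSpeech/README.md,"
    "https://www.dropbox.com/sh/7tryj6n7cfy0poe/AADpl4b8rGRSnoQ5j6LCj9tua?dl=0,"
    "https://huggingface.co/sangeet2020/noisy-whisper-resucespeech,"
    "--data_folder=tests/samples/ASR/ "
    "--train_csv=tests/samples/annotation/ASR_train.csv "
    "--valid_csv=tests/samples/annotation/ASR_train.csv "
    "--test_csv=tests/samples/annotation/ASR_train.csv "
    "--number_of_epochs=1 --skip_prep=True,,\n"
)

row = load_recipe_rows(SAMPLE)[0]
print(row["Task"])                           # ASR+enhancement
print(row["overrides"]["number_of_epochs"])  # 1
print(row["overrides"]["skip_prep"])         # True
```

The overrides show that the debug run skips data preparation and reuses the same sample CSV for the train, validation and test splits.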
