This folder contains scripts to train a Transformer-based speech recognizer, as well as scripts to fine-tune the Whisper encoder-decoder model.
You can download LibriSpeech at http://www.openslr.org/12
To run a training, execute one of the following commands:

```shell
python train_with_whisper.py hparams/train_hf_whisper.yaml
python train.py hparams/transformer.yaml
```

If you want to run on the test sets only, add the `--test_only` flag:

```shell
python train_with_whisper.py hparams/train_hf_whisper.yaml --test_only
python train.py hparams/transformer.yaml --test_only
```

If using a HuggingFace pre-trained model, please make sure you have `transformers` installed in your environment (see `extra-requirements.txt`).
Two SpeechLLM modes are supported:
- SpeechLLM with SSL features
- SpeechLLM with E2E features
In the first mode, the speech features are extracted from the audio waveforms with a pre-trained SSL model and projected into the LLM embedding space through a linear layer; the whole pipeline is trained jointly.
In the second mode, the speech features are extracted offline beforehand (see the extract_ssl_feats.py script), and the LLM is then trained on these frozen SSL representations. This mode is more efficient and faster to train, but less flexible, since the SSL model stays frozen.
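As a minimal sketch of the projection step described above: each SSL feature frame is mapped from the SSL dimension into the LLM embedding dimension by a single learned linear layer. The dimensions below (1024 for WavLM Large, 2048 for a small Llama-style embedding) and the function name are illustrative assumptions, not the recipe's actual hyperparameters:

```python
import numpy as np

# Illustrative dimensions (assumptions, not the recipe's real hyperparameters):
# WavLM Large emits 1024-dim frames; a small Llama-style LLM uses 2048-dim embeddings.
D_SSL, D_LLM = 1024, 2048

rng = np.random.default_rng(0)
W = rng.standard_normal((D_SSL, D_LLM)) * 0.02  # learned projection weights
b = np.zeros(D_LLM)                             # learned projection bias

def project_ssl_to_llm(ssl_feats: np.ndarray) -> np.ndarray:
    """Map a (num_frames, D_SSL) matrix of SSL features into the LLM embedding space."""
    return ssl_feats @ W + b

frames = rng.standard_normal((50, D_SSL))  # 50 frames of SSL features
llm_inputs = project_ssl_to_llm(frames)
print(llm_inputs.shape)  # (50, 2048)
```

In the joint (first) mode, `W` and `b` receive gradients together with the LLM (here typically via LoRA adapters); in the offline (second) mode, the same projection is trained on top of features that were precomputed once and kept fixed.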
| Release | Model | hyperparams file | Dev Clean WER | Dev Other WER | Test Clean WER | Test Other WER | HuggingFace link | Model link | GPUs |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 29-01-26 | WavLM Large + LLama 3.2 1B + LoRA | speechllm_e2e.yaml | 2.79 | 5.03 | 2.72 | 5.34 | HuggingFace | - | 1xA100 80GB |
The following table contains Whisper fine-tuning results after 1 epoch, freezing the encoder and fine-tuning the decoder.
| Release | Model | commit hash | hyperparams file | LM | Dev Clean WER | Test Clean WER | Test Other WER | HuggingFace link | Model link | GPUs |
|---|---|---|---|---|---|---|---|---|---|---|
| 2024-03-28 | large-v3 | e4e2e13 | train_hf_whisper.yaml | No | 2.00% | 1.96% | 4.30% | Not Avail. | DropBox | 2xV100S 32GB |
| 2024-03-28 | medium.en | e4e2e13 | train_hf_whisper.yaml | No | 2.35% | 2.40% | 5.59% | Not Avail. | DropBox | 2xV100S 32GB |
| 2024-07-20 | small.en | 9864011 | train_whisper_lora.yaml | No | 2.81% | 2.90% | 6.57% | Not Avail. | DropBox | 1x1080Ti 12GB |
| Release | hyperparams file | Dev Clean WER (No LM, small beam) | Test Clean WER (Transformer LM) | Test Other WER (Transformer LM) | HuggingFace link | Model link | GPUs |
|---|---|---|---|---|---|---|---|
| 30-09-24 | conformer_large.yaml (new RoPE version) | 1.85 with LM | 1.96 | 4.50 | Not Avail. | Not Avail. | 4xA40 46GB |
| 23-05-23 | branchformer_large.yaml | 2.72 (1.9 with LM) | 2.04 | 4.13 | Not Avail. | DropBox | 4xA100 80GB |
| 10-02-25 | conformer_large.yaml | 1.85 with LM | 1.97 | 4.50 | N/A | N/A | 4xA100 80GB |
| 23-05-23 | conformer_large.yaml | 2.62 (1.9 with LM) | 2.01 | 4.52 | HuggingFace | DropBox | 4xA100 80GB |
| 24-03-22 | transformer.yaml | 3.32 | 2.27 | 5.53 | HuggingFace | DropBox | 4xV100 32GB |
| 24-03-22 | conformer_small.yaml | 4.05 | 2.49 | 6.1 (only 13.3M parameters) | HuggingFace | DropBox | 1xV100 32GB |
| 27-03-23 | hyperconformer_8M.yaml | 4.69 | 2.55 | 6.61 (only 7.9M parameters) | Not Avail. | DropBox | 1xP40 24GB |
| 27-03-23 | hyperconformer_22M.yaml | 3.19 | 2.23 | 5.54 (only 21.7M parameters) | Not Avail. | DropBox | 1xP40 24GB |
| 03-09-23 | hyperbranchformer_13M.yaml | NA | 2.54 | 6.58 | Not Avail. | Not Avail. | 1xP40 24GB |
| 03-09-23 | hyperbranchformer_25M.yaml | NA | 2.36 | 5.89 | Not Avail. | Not Avail. | 1xP40 24GB |
| 05-01-24 | bayesspeech.yaml | 4.28 | 2.84 | 6.27 | Not Avail. | DropBox | 1xV100 32GB |
- Website: https://speechbrain.github.io/
- Code: https://github.com/speechbrain/speechbrain/
- HuggingFace: https://huggingface.co/speechbrain/
Please cite SpeechBrain if you use it for your research or business.
```bibtex
@misc{speechbrainV1,
  title={Open-Source Conversational AI with SpeechBrain 1.0},
  author={Mirco Ravanelli and Titouan Parcollet and Adel Moumen and Sylvain de Langen and Cem Subakan and Peter Plantinga and Yingzhi Wang and Pooneh Mousavi and Luca Della Libera and Artem Ploujnikov and Francesco Paissan and Davide Borra and Salah Zaiem and Zeyu Zhao and Shucong Zhang and Georgios Karakasidis and Sung-Lin Yeh and Pierre Champion and Aku Rouhe and Rudolf Braun and Florian Mai and Juan Zuluaga-Gomez and Seyed Mahed Mousavi and Andreas Nautsch and Xuechen Liu and Sangeet Sagar and Jarod Duret and Salima Mdhaffar and Gaelle Laperriere and Mickael Rouvier and Renato De Mori and Yannick Esteve},
  year={2024},
  eprint={2407.00463},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2407.00463},
}

@misc{speechbrain,
  title={{SpeechBrain}: A General-Purpose Speech Toolkit},
  author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
  year={2021},
  eprint={2106.04624},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  note={arXiv:2106.04624}
}
```