This major release extends SpeechBrain's support for SpeechLLMs and introduces several new features, recipes, and improvements.
Highlights
- Feature Caching — Save extracted features (e.g. wav2vec embeddings) to disk and load them on the fly, skipping recomputation. This powers our first ASR SpeechLLM recipe on LibriSpeech, enabling LLM-based training with pre-computed embeddings.
- New Recipes — SpeechLLM for ASR and translation, streaming SSL, FocalCodec, and SENSE models.
This release also includes internal improvements and bug fixes. The changelog below lists the main changes (some minor bug fixes are omitted):
What's Changed
- Reforging LLaMA lobe (code from Samsung AI Center-Cambridge) by @TParcollet in #2850
- Streaming recipe for BestRQ by @Chaanks in #2790
- Librilight data preparation for SpeechBrain SSL (code from Samsung AI Center Cambridge) by @shucongzhang in #2765
- Alignment with CTC ASR models powered by k2. by @ZhaoZeyu1995 in #2772
- SpeechLLM (with LLaMA) and Conformer recipe for speech translation on CoVoST (Code from Samsung AI Center Cambridge) by @TParcollet in #2865
- Feature caching proposal: CachedDynamicItem by @pplantinga in #2985
- Adapting transducer greedy decoding by @younessdkhissi in #2975
- Replace torchaudio I/O with soundfile-based audio_io wrapper by @Copilot in #2989
- FocalCodec [NeurIPS 2025] by @lucadellalib in #3000
- Caching: add compression + filename + closing/loading by @Adel-Moumen in #3005
- Implement per-key padding configuration in PaddedBatch by @Adel-Moumen in #3008
- SpeechLLM LibriSpeech recipe by @Adel-Moumen in #2885
- Remove CTC CUDA + Move transducer loss in integrations by @Adel-Moumen in #3028
- Adding SENSE models by @MaryemBouziane in #2998
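The feature caching highlighted above (CachedDynamicItem in #2985, extended in #3005) comes down to computing an expensive feature once, writing it to disk, and reloading it on later passes. The snippet below is a minimal sketch of that idea in plain Python. It is not SpeechBrain's API: `DiskFeatureCache`, `toy_encoder`, and the pickle-based file layout are hypothetical names and choices made for illustration only.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


class DiskFeatureCache:
    """Compute a feature once per utterance ID, then reload it from disk.

    Hypothetical helper for illustration; not SpeechBrain's CachedDynamicItem.
    """

    def __init__(self, compute_fn, cache_dir):
        self.compute_fn = compute_fn
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.misses = 0  # how many times compute_fn actually ran

    def _path(self, utt_id):
        # Hash the ID so arbitrary strings map to safe file names.
        digest = hashlib.sha256(utt_id.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.pkl"

    def __call__(self, utt_id, signal):
        path = self._path(utt_id)
        if path.exists():
            # Cache hit: skip recomputation and load from disk.
            with path.open("rb") as f:
                return pickle.load(f)
        # Cache miss: compute, then persist for later epochs.
        self.misses += 1
        features = self.compute_fn(signal)
        # Write to a temporary file and rename it, so an interrupted
        # write does not leave a partial file under the final name.
        tmp = path.with_suffix(".tmp")
        with tmp.open("wb") as f:
            pickle.dump(features, f)
        tmp.replace(path)
        return features


def toy_encoder(signal):
    # Stand-in for an expensive model such as a wav2vec encoder.
    return [sum(signal[i:i + 2]) for i in range(0, len(signal), 2)]


with tempfile.TemporaryDirectory() as tmp_dir:
    cache = DiskFeatureCache(toy_encoder, tmp_dir)
    first = cache("utt-001", [1, 2, 3, 4])   # computed and written to disk
    second = cache("utt-001", [1, 2, 3, 4])  # served from disk
    misses_after_two_calls = cache.misses
```

Keying the cache on the utterance ID rather than on the signal keeps lookups cheap, at the cost of requiring the cache to be cleared whenever the feature extractor changes. Refer to the linked PRs for the actual interface and options.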
New Contributors
- @emmanuel-ferdman made their first contribution in #2900
- @ofiryaish made their first contribution in #2871
- @omidiu made their first contribution in #2923
- @svecjan made their first contribution in #2934
- @nouranali made their first contribution in #2855
- @OscarFree made their first contribution in #2988
- @younessdkhissi made their first contribution in #2975
- @jordanozang made their first contribution in #2982
- @jrochdi made their first contribution in #2947
- @seohyunjun made their first contribution in #2996
- @Daheer made their first contribution in #3022
- @raotnameh made their first contribution in #2617
- @Mr-Neutr0n made their first contribution in #3029
- @georgesabr made their first contribution in #3038
- @MaryemBouziane made their first contribution in #2998
Full Changelog: v1.0.3...v1.1.0
What's Changed
- Fix broken links due to change in voxlingua107 location by @pplantinga in #2877
- Reforging LLaMA lobe (code from Samsung AI Center-Cambridge) by @TParcollet in #2850
- Attempt to fix self.device for AMP by @Adel-Moumen in #2882
- fix convolutions docstrings by @gfdb in #2883
- Attempt fix for input_norm test by @pplantinga in #2887
- Fix dropout at inference by @Adel-Moumen in #2889
- Avoid running tests on draft PRs by @pplantinga in #2890
- Fix input normalization and global normalization variance calculation by @pplantinga in #2835
- Bump version of pyroomacoustics for recipes/DNS/enhancement by @rogiervd in #2899
- Fix argument to rec_loss in (Variational)AutoencoderLoss (from Samsung AI Center Cambridge) by @rogiervd in #2902
- Ensure that x is not unbound in WordEmbeddingEncoder (from Samsung, AI Center, Cambridge) by @rogiervd in #2905
- Fix bug in ModuleList.insert (from Samsung, AI Center, Cambridge) by @rogiervd in #2903
- Fix bug constructing ValueError in Conv2d (from Samsung, AI Center, Cambridge) by @rogiervd in #2904
- Make types correct (from Samsung, AI Center, Cambridge) by @rogiervd in #2912
- Use correct signature for torch.tensor.type() (from Samsung, AI Center, Cambridge) by @rogiervd in #2910
- Update handling of dtype in read_pkl to torch.Tensor (from Samsung, AI Center, Cambridge) by @rogiervd in #2907
- Make arpa_to_fst work when arguments are str (from Samsung, AI Center, Cambridge) by @rogiervd in #2919
- Make f_max integer (from Samsung, AI Center, Cambridge) by @rogiervd in #2918
- Account for all combinations of arguments in IterativeCSVWriter.write (from Samsung, AI Center, Cambridge) by @rogiervd in #2916
- Migrate to modern Python Logger API by @emmanuel-ferdman in #2900
- Do not forget to raise ValueError (from Samsung, AI Center, Cambridge) by @rogiervd in #2909
- Replace axis= by dim= for calls to Torch (from Samsung, AI Center, Cambridge) by @rogiervd in #2913
- Remove unused value (from Samsung, AI Center, Cambridge) by @rogiervd in #2914
- Make Scores.__repr__ actually return something by @rogiervd in #2897
- Remove deprecated abstractproperty (from Samsung, AI Center, Cambridge) by @rogiervd in #2915
- Use "cls" instead of "self" in classmethod by @rogiervd in #2908
- Change location from which to import tqdm by @rogiervd in #2921
- Fix type annotations (from Samsung, AI Center, Cambridge) by @rogiervd in #2911
- Add test for various length mask functions (code from Samsung, AI Center, Cambridge) by @rogiervd in #2894
- Change test for make_padding_mask to expose mistake by @rogiervd in #2895
- fix bug in reverberate with rescale_amp by @ofiryaish in #2871
- docs: Fix contributing.md by @omidiu in #2923
- Renaming largescale ASR to the Loquacious Set by @TParcollet in #2928
- Move files with optional dependencies to integrations folder by @pplantinga in #2782
- Streaming recipe for BestRQ by @Chaanks in #2790
- Librilight data preparation for SpeechBrain SSL (code from Samsung AI Center Cambridge) by @shucongzhang in #2765
- Re-ran tutorial to remove errors by @pplantinga in #2925
- Create FetchConfig for standardizing use of fetch by @pplantinga in #2828
- Fix issue 'GlobalNorm with DDP' by @svecjan in #2934
- Bump torch from 2.6.0 to 2.7.1 in /docs by @dependabot[bot] in #2939
- Solves #2846 Refactor run_opts into a @DataClass by @nouranali in #2855
- Make run_on_main and main_process_only return the result to all proce… by @rogiervd in #2943
- Saveable Generator: initial import by @flexthink in #2937
- Alignment with CTC ASR models powered by k2. by @ZhaoZeyu1995 in #2772
- Fix forced alignment tutorial link to google colab by @pplantinga in #2951
- Fix missing defaults in run options by @pplantinga in #2952
- Add recipe installation instructions to installation.md by @pplantinga in #2948
- SpeechLLM (with LLaMA) and Conformer recipe for speech translation on CoVoST (Code from Samsung AI Center Cambridge) by @TParcollet in #2865
- Use epoch loop loader for transfer to correctly handle end_of_epoch by @pplantinga in #2965
- fix unnecessary pos_emb for RoPE by @shucongzhang in #2963
- Adopt pyproject and move to Ruff by @TParcollet in #2946
- Update version constraints for torch and torchaudio by @Adel-Moumen in #2978
- Bump torch from 2.7.1 to 2.8.0 in /docs by @dependabot[bot] in #2976
- Fix #2973 bug with period in path by @pplantinga in #2974
- Bandaid fix for torchaudio 2.9+ compatibility by @OscarFree in #2988
- Feature caching proposal: CachedDynamicItem by @pplantinga in #2985
- Adapting transducer greedy decoding by @younessdkhissi in #2975
- Fix for Tracing ECAPA-TDNN Embedding Model by @jordanozang in #2982
- fix speed perturb inversion by @gfdb in #2994
- Add functions for running code once per node by @pplantinga in #2992
- Replace torchaudio I/O with soundfile-based audio_io wrapper by @Copilot in #2989
- SGMSE Voicebank Speech Enhancement Recipe by @jrochdi in #2947
- Update versions of python due to 3.9 EOL by @pplantinga in #3001
- fix issue with --arg value and --arg=value by @Adel-Moumen in #2999
- Rename FetchConfig.use_auth_token to token and pass token to HF hub by @seohyunjun in #2996
- FocalCodec [NeurIPS 2025] by @lucadellalib in #3000
- Caching: add compression + filename + closing/loading by @Adel-Moumen in #3005
- HDF5 integration not loading/caching etc. by @Adel-Moumen in #3007
- Fix local_rank when using single process by @Adel-Moumen in #3006
- Implement per-key padding configuration in PaddedBatch by @Adel-Moumen in #3008
- Add init files so pages get picked up for docs by @pplantinga in #3011
- ksponspeech remove by @Adel-Moumen in #3013
- Replace deprecated torch.cuda.amp.custom_fwd with torch.amp.custom_fw by @omidiu in #2922
- Fix duplicate mic position in beamforming tutorial by @Adel-Moumen in #3019
- Fix text_file assignment - SentencePiece by @pplantinga in #3016
- Update CI requirements by @Adel-Moumen in #3023
- add tests + doc + fix recipes + new get_available_cpu_count fn by @Adel-Moumen in #3025
- Fix typo in literature reference by @Daheer in #3022
- Update scorer.py by @raotnameh in #2617
- SpeechLLM LibriSpeech recipe by @Adel-Moumen in #2885
- Remove CTC CUDA + Move transducer loss in integrations by @Adel-Moumen in #3028
- Add missing mean_stat_per_model method to StatObject_SB by @Mr-Neutr0n in #3029
- support torch>=2.9.0 by @eschmidbauer in #3032
- Fix mutable default arguments in function signatures by @Mr-Neutr0n in #3034
- Fix author name typo: Abous-Rjeili → Abou-Rjeili by @georgesabr in #3038
- Adding SENSE models by @MaryemBouziane in #2998
- Recipe tests new release by @Adel-Moumen in #3042
- v1.0.4 Dev -> Main by @Adel-Moumen in #3044