Adding SENSE models #2998

Merged
Adel-Moumen merged 24 commits into speechbrain:develop from MaryemBouziane:SENSE
Mar 1, 2026

@MaryemBouziane
Contributor

What does this PR do?

This PR implements the training process for the SENSE models, which derive from the MIT/LIUM SAMU-XLSR framework and are similar to the Meta SONAR encoder models.
The recipe uses the BGE-M3 embedding model as the teacher and a w2vBert2.0-based speech encoder as the student.
This PR also adds the integration of the HF w2vBert2.0 model.
More details are available at https://arxiv.org/pdf/2509.12093
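
To make the teacher/student setup concrete, here is a minimal, dependency-free sketch of a cosine-similarity objective between a pooled speech embedding and a text embedding. It is an illustration under assumptions, not the recipe's code: the function names are hypothetical, mean pooling over frames is assumed, and a real recipe would work on batched tensors rather than Python lists.

```python
import math


def mean_pool(frames):
    """Average a list of frame vectors into one utterance-level vector."""
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_distillation_loss(speech_frames, text_embedding):
    """Return 1 - cos(student, teacher); 0 means the two vectors are aligned."""
    # Assumption: the student's frame-level outputs are mean-pooled over time.
    speech_embedding = mean_pool(speech_frames)
    return 1.0 - cosine_similarity(speech_embedding, text_embedding)


# Toy inputs (hypothetical): three 2-dimensional student frames, one teacher vector.
student_frames = [[1.0, 0.0], [3.0, 0.0], [2.0, 0.0]]
teacher_vector = [4.0, 0.0]
loss = cosine_distillation_loss(student_frames, teacher_vector)
```

In this toy case the pooled student vector points in the same direction as the teacher vector, so the loss is zero. An orthogonal pair would give a loss of one.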

@Adel-Moumen Adel-Moumen self-assigned this Nov 19, 2025
@Adel-Moumen Adel-Moumen added this to the v1.1.0 milestone Nov 19, 2025
Collaborator

@Adel-Moumen Adel-Moumen left a comment

Hi,

Thanks a lot for this PR! Please see the comments throughout the PR. Please add a recipe test for this recipe, as well as a README. If you have any checkpoints, it would be great to add them as well. I can upload them to HuggingFace and report any numbers you got (please use the READMEs in other recipes as a template).

Ideally, you should also provide an inference pipeline so that we can release a fully functional end-to-end recipe.

PS: please fix the tests as well! You can run them locally.

Thanks again, that's great work.

Adel

Comment thread recipes/CommonVoice/SENSE/hparams/train.yaml Outdated
Comment thread recipes/CommonVoice/SENSE/hparams/train.yaml Outdated
Comment thread recipes/CommonVoice/SENSE/hparams/train.yaml Outdated
Comment thread recipes/CommonVoice/SENSE/preparation.py Outdated
Comment thread recipes/CommonVoice/SENSE/hparams/train.yaml Outdated
Comment thread speechbrain/integrations/huggingface/w2v_bert.py Outdated
else:
self.sample_rate = getattr(self.feature_extractor, "sampling_rate", 16000)
logger.info(
f"[W2VBert] sample_rate utilisé pour le feature_extractor = {self.sample_rate}"
Collaborator

why is it french? haha

Collaborator

fixed

Comment thread speechbrain/integrations/huggingface/w2v_bert.py Outdated
Comment thread speechbrain/integrations/huggingface/w2v_bert.py Outdated
Comment thread speechbrain/integrations/huggingface/w2v_bert.py Outdated
@Adel-Moumen Adel-Moumen removed this from the v1.1.0 milestone Nov 22, 2025
@MaryemBouziane
Contributor Author

Hi @Adel,

Thank you very much for your helpful review and comments.
We’ve updated the PR accordingly. Please let us know if anything else should be adjusted or improved.

Maryem

Comment thread recipes/CommonVoice/SENSE/README.md Outdated
Contributor

Copilot AI left a comment

Pull request overview

This PR implements the SENSE (Semantic-based speech encoding) training framework, which aligns a w2v-BERT 2.0 speech encoder with BGE-M3 text embeddings in a shared semantic space. The implementation follows the approach described in the SENSE paper, similar to MIT/LIUM SAMU-XLSR and Meta SONAR models.

Key Changes:

  • Integration of BGE-M3 text embedding model as teacher
  • Integration of HuggingFace w2v-BERT 2.0 model as student speech encoder
  • Multilingual training recipe supporting 90+ Common Voice languages with balanced sampling
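
One common way to balance sampling across languages with very different amounts of data is exponent-smoothed sampling, where each language is drawn with probability proportional to its data share raised to a power `alpha`. The sketch below is an assumption for illustration only. The function name, the `alpha` parameter, and the corpus sizes are hypothetical, and the recipe's actual ratio computation may differ.

```python
def sampling_ratios(hours_per_language, alpha=0.5):
    """Return per-language sampling probabilities proportional to share ** alpha.

    alpha = 1.0 keeps the natural data distribution; alpha = 0.0 samples all
    languages uniformly; values in between up-weight low-resource languages.
    """
    total = sum(hours_per_language.values())
    weights = {
        lang: (hours / total) ** alpha
        for lang, hours in hours_per_language.items()
    }
    norm = sum(weights.values())
    return {lang: weight / norm for lang, weight in weights.items()}


# Hypothetical corpus sizes in hours, for illustration only.
ratios = sampling_ratios({"en": 900.0, "fr": 100.0}, alpha=0.5)
```

With these toy numbers, the 90/10 split in the data becomes a 75/25 split in the sampling probabilities, so the smaller language is seen more often than its raw share would suggest.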

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 16 comments.

| File | Description |
| --- | --- |
| speechbrain/integrations/nlp/bgeM3_embeddings.py | New wrapper for BGE-M3 sentence embeddings with dense/sparse/ColBERT output options |
| speechbrain/integrations/huggingface/w2v_bert.py | HuggingFace integration for w2v-BERT 2.0 model with configurable freezing and feature extraction |
| recipes/CommonVoice/common_voice_sense_prepare.py | Data preparation script for multilingual SENSE training with language sampling ratio computation |
| recipes/CommonVoice/common_voice_prepare.py | Minor formatting changes to existing French language preprocessing |
| recipes/CommonVoice/SENSE/train.py | Main training script implementing cosine similarity loss between speech and text embeddings |
| recipes/CommonVoice/SENSE/hparams/train_sense.yaml | Hyperparameters for 90-language multilingual SENSE training with dual optimizers |
| recipes/CommonVoice/SENSE/common_voice_sense_prepare.py | Symlink to shared data preparation script |
| recipes/CommonVoice/SENSE/README.md | Documentation explaining SENSE architecture, multilingual sampling strategy, and usage |
| tests/recipes/CommonVoice.csv | Test configuration entry for SENSE recipe |

Comment thread recipes/CommonVoice/SENSE/README.md Outdated
Comment thread recipes/CommonVoice/SENSE/train.py Outdated
Comment thread recipes/CommonVoice/SENSE/hparams/train_sense.yaml Outdated
Comment thread speechbrain/integrations/nlp/bgeM3_embeddings.py
Comment thread recipes/CommonVoice/SENSE/README.md Outdated
Comment thread recipes/CommonVoice/SENSE/train.py Outdated
Comment thread recipes/CommonVoice/SENSE/train.py Outdated
Comment thread tests/recipes/CommonVoice.csv
Comment thread recipes/CommonVoice/SENSE/README.md Outdated
Comment thread speechbrain/integrations/huggingface/w2v_bert.py
MaryemBouziane and others added 10 commits January 7, 2026 13:49
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@Adel-Moumen
Collaborator

Hi @MaryemBouziane, I think we are good to go. There's only one potential bug to fix and the pre-commit. Otherwise, I am happy to merge this PR!

@MaryemBouziane
Contributor Author

Thanks @Adel-Moumen for your review!
I’ve fixed the potential bug, and all pre-commit hooks are passing on my side (they were already passing before this change too!).

Collaborator

@Adel-Moumen Adel-Moumen left a comment

LGTM! Thanks all.

@Adel-Moumen Adel-Moumen merged commit aca5e41 into speechbrain:develop Mar 1, 2026
5 checks passed
5 participants