Audio and Music SSL by poonehmousavi · Pull Request #2755 · speechbrain/speechbrain

poonehmousavi · 2024-11-17T20:49:06Z

What does this PR do?

Adds SSL models for the music and audio domains:

Music: MERT (https://arxiv.org/abs/2306.00107) (https://huggingface.co/papers/2306.00107)
Audio: BEATs (https://arxiv.org/abs/2212.09058) (https://github.com/microsoft/unilm/tree/master/beats)

Fixes #<issue_number>

Before submitting

Did you read the contributor guideline?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you list all the breaking changes introduced by this pull request?
Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist

Is this pull request ready for review? (if not, please submit in draft mode)
Check that all items from Before submitting are resolved
Make sure the title is self-explanatory and the description concisely explains the PR
Add labels and milestones (and optionally projects) to the PR so it can be classified
Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
Review the self-review checklist to ensure the code is ready for review

poonehmousavi · 2024-11-21T23:32:51Z

The error is related to the returned warning. Should I skip the doctest for MERT?

lucadellalib · 2024-11-26T20:00:52Z

+
+    Example
+    -------
+    >>> audio = torch.randn(4, 10000)  # Batch of 4 audio signals


audio = torch.randn(4, 10000)  # Batch of 4 audio signals
length = torch.tensor([1.0, 0.5, 0.75, 1.0])
model = BEATs("BEATs_iter1_finetuned_on_AS2M_cpt2.pt")
outputs = model.extract_features(audio, length)
print(outputs.shape)
>>> AttributeError: 'tuple' object has no attribute 'shape'

When loading a pretrained model, I guess you do not need the predictor but just the embedding model (see line 2005).

This happens when the fine-tuned version is loaded, since it also returns the prob_log of the prediction. I changed the code so that it works with both fine-tuned and self-supervised checkpoints.
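The AttributeError above comes from a call that sometimes returns a bare feature tensor and sometimes a tuple. A minimal caller-side sketch of normalizing the two cases follows. The helper name `features_only` is hypothetical, and it is an assumption (not confirmed by this PR) that a tuple return carries the features as its first element. Plain lists stand in for tensors so the sketch runs without torch.

```python
def features_only(outputs):
    """Return only the feature object from an `extract_features`-style call."""
    # Assumption: when a tuple is returned, the features are its first element.
    if isinstance(outputs, tuple):
        return outputs[0]
    return outputs


# Stand-ins for real tensors, so the sketch runs without torch.
ssl_output = [[0.1, 0.2]]                   # features only
finetuned_output = ([[0.1, 0.2]], [-0.7])   # features plus extra prediction values

print(features_only(ssl_output))
print(features_only(finetuned_output))
```

With a helper like this, downstream code can use the result in the same way regardless of which checkpoint type was loaded.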

lucadellalib · 2024-12-02T19:37:56Z

@@ -0,0 +1,2059 @@
+"""This lobe enables the integration of pretrained BBEATs: Audio Pre-Training with Acoustic Tokenizers.


Fix typo: "BBEATs" -> "BEATs"

lucadellalib · 2024-12-02T19:40:41Z

+        where a projection of the CNN output is added to the beginning.
+        If False, the forward function outputs the hidden states only from the last transformer layer.
+
+    Example


Replace with this, it should fix the tests:

Example
-------
>>> import torch
>>> inputs = torch.rand([10, 600])
>>> model_hub = "m-a-p/MERT-v1-95M"
>>> save_path = "savedir"
>>> model = MERT(model_hub, save_path)
WARNING: feature_extractor_cqt requires the libray 'nnAudio'
>>> outputs = model(inputs)
>>> outputs.shape
torch.Size([10, 1, 768])

The warning is resolved, but because the warning message contains "libray" instead of "library", we now get a pre-commit issue.

Replace:

>>> model = MERT(model_hub, save_path)
WARNING: feature_extractor_cqt requires the libray 'nnAudio'

with

>>> model = MERT(model_hub, save_path)  # doctest: +ELLIPSIS
WARNING: ...
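As a rough illustration of why the `+ELLIPSIS` directive helps here, the sketch below runs a doctest whose expected output keeps only the `WARNING:` prefix, so the misspelled wording never has to appear in the docstring. `load_model` is a hypothetical stand-in for the real constructor. The sketch assumes the warning is printed to stdout, since doctest does not compare output written to stderr or emitted through logging.

```python
import doctest


def load_model():
    """Hypothetical stand-in for a constructor that prints a warning.

    >>> load_model()  # doctest: +ELLIPSIS
    WARNING: ...
    'model'
    """
    # The text after "WARNING:" is not checked by the doctest above.
    print("WARNING: feature_extractor_cqt requires the libray 'nnAudio'")
    return "model"


# Parse and run the docstring example explicitly, so this works in any module.
parser = doctest.DocTestParser()
example = parser.get_doctest(
    load_model.__doc__, {"load_model": load_model}, "load_model", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(example)
print(results.failed, results.attempted)
```

Because `...` matches any text, the doctest keeps passing even if the upstream warning message is later corrected or reworded.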

lucadellalib · 2024-12-02T19:43:38Z

        self, attn_weights, tgt_len: int, src_len: int, bsz: int
    ):
-        """
+        """5


mravanelli · 2024-12-04T22:23:15Z

Thank you, @poonehmousavi, for this contribution! I reviewed the PR, and everything seems to work properly. I only have the following comments:

Please add an inline comment for trust_remote_code in huggingface.py to clarify its purpose and usage.
Some docstrings in beats.py are not formatted according to the SpeechBrain standard. For example:
Current Format:

Arguments:  
----------  
	 q (Tensor): Query tensor.  
         v (Tensor): Value tensor.

Expected Format:

Arguments:  
----------  
q: torch.tensor  
    Query tensor.  
v: torch.tensor  
    Value tensor.

There are a few instances like this that need to be corrected.

MERT and BEATs will be moved to the "integration" folder that @pplantinga is currently designing. I think we can first merge the PRs regarding the new tokenizers and right after proceed with the folder restructuring.
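One way to locate the docstring entries that still use the "Current Format" shown above is a small scan such as the following sketch. The names `GOOGLE_STYLE_ARG` and `find_google_style_args` are hypothetical and not part of SpeechBrain's tooling. The regular expression only covers the simple `name (Type): description` pattern from the example.

```python
import re

# Matches argument lines such as "q (Tensor): Query tensor."
GOOGLE_STYLE_ARG = re.compile(r"^(\w+)\s*\(([^)]+)\):\s*(.+)$")


def find_google_style_args(docstring):
    """Return (name, type, description) for each matching argument line."""
    found = []
    for line in docstring.splitlines():
        match = GOOGLE_STYLE_ARG.match(line.strip())
        if match:
            found.append(match.groups())
    return found


sample = '''
Arguments:
----------
    q (Tensor): Query tensor.
    v (Tensor): Value tensor.
'''

for name, type_name, description in find_google_style_args(sample):
    # Print each entry in the two-line layout requested in the review.
    print(f"{name}: {type_name}\n    {description}")
```

The type names are passed through unchanged. Mapping them to the project's preferred spelling would still be a manual step.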

poonehmousavi · 2024-12-04T23:11:50Z

Regarding moving BEATs to the integration folder: the current version of BEATs is implemented entirely in SB, with no need for an external library, so maybe it does not need to be moved to the integration folder.

mravanelli · 2024-12-05T13:34:46Z

After the latest changes, I'm fine with it.

poonehmousavi and others added 4 commits November 17, 2024 15:39

add MERT

059db63

Merge branch 'speechbrain:develop' into audio_music_ssl

ada5a3d

fix precommit and CI

20b3ff7

Merge branch 'audio_music_ssl' of https://github.com/poonehmousavi/sp…

08c73af

…eechbrain into audio_music_ssl

poonehmousavi self-assigned this Nov 17, 2024

poonehmousavi added the enhancement New feature or request label Nov 17, 2024

poonehmousavi added 2 commits November 20, 2024 11:14

ad beats

3b4a1ed

add docstring

bf29390

TParcollet self-requested a review November 20, 2024 21:44

poonehmousavi added 4 commits November 20, 2024 19:44

fix randomness

1992801

fix beats docstring

70f8222

add docstring for update func

e6d2cc3

fix MERT CI

0d6810c

poonehmousavi requested a review from mravanelli November 21, 2024 23:27

poonehmousavi marked this pull request as ready for review November 21, 2024 23:32

mravanelli requested a review from lucadellalib November 22, 2024 17:25

lucadellalib reviewed Nov 26, 2024

apply reviews

4101863

lucadellalib self-requested a review December 2, 2024 19:36

lucadellalib reviewed Dec 2, 2024

poonehmousavi and others added 5 commits December 2, 2024 14:59

fix CI and typos

ff5127f

fix CI

7c90d7a

fix typo

5d1b7c9

fix CI

86d8b9c

Merge branch 'develop' into audio_music_ssl

97d96e4

poonehmousavi added 3 commits December 4, 2024 18:05

fix docstring and add comments

3e642e2

Merge branch 'audio_music_ssl' of https://github.com/poonehmousavi/sp…

e4b5475

…eechbrain into audio_music_ssl

fix docstring

db08cf3

mravanelli approved these changes Dec 5, 2024

mravanelli merged commit 4bfe32a into speechbrain:develop Dec 5, 2024

poonehmousavi deleted the audio_music_ssl branch February 5, 2025 01:34
