Add new Audio Tokenizers #2751
Conversation
|
@mravanelli and @TParcollet, this PR is ready for review. It would be super helpful to review it ASAP since many tasks depend on these new features. |
|
Thanks @poonehmousavi for the work! I am on it, but this will take some time because it's a lot of new code. |
TParcollet
left a comment
Thanks for the work again. Quite a few minor things. As mentioned already, I don't think that these new lobes should be put in a discrete folder. They should either go to the HF folder if they use transformers, or to the general lobes folder. This raises a good question about how to start regrouping models under a better directory tree, but we must all talk about that -- and I am pretty sure that the output type shouldn't be a criterion.
| @@ -0,0 +1,157 @@ | |||
| """This lobe enables the integration of pretrained WavTokenizer. | |||
|
|
|||
| Note that you need topip install git+https://github.com/Tomiinek/WavTokenizer` to use this module. | |||
Watch out for typos. Shouldn't our pre-commit catch this, @pplantinga?
|
|
||
|
|
||
| class WavTokenizer(nn.Module): | ||
| """An wrapper for the WavTokenizer model |
This is a grammar error, not a spelling one.
|
|
||
|
|
||
| class WavTokenizer(nn.Module): | ||
| """An wrapper for the WavTokenizer model |
This is not describing the lobe enough: which model, which reference, a wrapper from what source?

Added more info.
| save_path : str, optional | ||
| Directory where the model and configuration files are saved (default is None). | ||
| config : str, optional | ||
| Configuration filename for the model (default is 'config.yaml'). |
Config of what? HF? SB?

HF. You need to specify which file to use because of the way the model is loaded in the original code.

Is this standard for HF, or is it a requirement of our implementation? Either way, this needs to be explained in the docstring, maybe with an example of a path in an HF repo?

In general, their code is not very well organized and there are different versions of it. The version I used is the code sent to me by the author. They upload everything to HF in one zip file without providing any interface, and in their code they expect you to download and extract everything manually.

I added a bit more info in the docstring.
| self.dim_codebook = dim_codebook | ||
| self.n_codebook = n_codebook | ||
| self.bw = bw | ||
| self.freq = self.n_codebook * 50 |
This 50 seems arbitrary. Shouldn't it be a parameter? Nothing should be hard-coded.

There is no use case for this attribute, so I removed it.
| """ | ||
| exp_model_config = OmegaConf.load(config) | ||
| scalar_codec = ScalarModel(**exp_model_config.generator.config) | ||
| parameter_dict = torch.load(self.ckpt_path) |
This operation looks like something that our Pretrainer class should be able to handle; it would be wiser to use it for clarity and our users' habits.
| out = self.scalar_codec.decode(emb_quant.float().to(self.device)) | ||
| return out.detach().cpu().squeeze(0) | ||
|
|
||
| def infer(self, wav_root): |
Infer is not very self-descriptive, especially since there is also an encode function. Maybe rename it to something clearer?

Changed to "reconstruct".
TParcollet
left a comment
@mravanelli requested that I do another review of this PR. I started, but the problem is that I don't see the fixes to my previous comments (certainly not pushed yet, although commented as done?). I therefore stopped the review to limit the number of review rounds. Please @poonehmousavi let me know when it's pushed so I can do a last review.
Thanks!
| from torch.nn.utils import remove_weight_norm, weight_norm | ||
|
|
||
|
|
||
| def download_and_extract(repo_id, filename, save_path): |
Right. Here is what we should do then: this function should be removed; as mentioned, we avoid utility functions in lobes when possible. So let's use fetch for the download, and let's create in the fetch.py file another function that enables the extraction of files. Could you please try to write this function so that it allows a few more extraction formats? Maybe using gzip as well as zipfile? Maybe this function could be linked directly to fetch: if fetch detects certain extensions (tar / tar.gz / others), it triggers the extraction? @Gastron may have a point of view on this.
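For readers following along, the extension-dispatching extraction helper being requested could look roughly like this. This is a sketch, not SpeechBrain API: the name `extract_archive` and the exact set of supported formats are assumptions.

```python
import gzip
import shutil
import tarfile
import zipfile
from pathlib import Path


def extract_archive(archive_path, save_path):
    """Extract an archive into save_path, dispatching on the file extension.

    Sketch of a helper that fetch could call automatically when it detects
    an archive extension; names and supported formats are assumptions.
    """
    archive_path = Path(archive_path)
    save_path = Path(save_path)
    save_path.mkdir(parents=True, exist_ok=True)
    name = archive_path.name
    if name.endswith(".zip"):
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(save_path)
    elif name.endswith((".tar", ".tar.gz", ".tgz", ".tar.bz2")):
        # tarfile auto-detects the compression from the file header
        with tarfile.open(archive_path) as archive:
            archive.extractall(save_path)
    elif name.endswith(".gz"):
        # Plain gzip: a single compressed file, not a container
        target = save_path / archive_path.stem
        with gzip.open(archive_path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
    else:
        raise ValueError(f"Unsupported archive format: {name}")
```

Linking this to fetch would then just mean checking the downloaded filename's extension and calling the helper when it matches.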
| print(f"File downloaded, extracted to '{save_path}', and ZIP file removed.") | ||
|
|
||
|
|
||
| def decimal_to_ternary_matrix(decimals, D): |
Well, I don't see it :p
| print(f"File downloaded, extracted to '{save_path}', and ZIP file removed.") | ||
|
|
||
|
|
||
| def decimal_to_ternary_matrix(decimals, D): |
| return ternary_matrix | ||
|
|
||
|
|
||
| def ternary_matrix_to_decimal(matrix): |
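For context, a pair of helpers with these names typically converts token indices to and from matrices of base-3 digits. A plain-Python sketch of that round trip follows; the digit order (least-significant first) and signatures are assumptions, not necessarily what the PR implements.

```python
def decimal_to_ternary_matrix(decimals, depth):
    """Convert non-negative integers into a matrix of base-3 digits,
    least-significant digit first, one column per input value."""
    matrix = [[0] * len(decimals) for _ in range(depth)]
    for col, value in enumerate(decimals):
        for row in range(depth):
            matrix[row][col] = value % 3
            value //= 3
    return matrix


def ternary_matrix_to_decimal(matrix):
    """Inverse operation: collapse base-3 digit columns back to integers."""
    depth = len(matrix)
    n_cols = len(matrix[0])
    return [
        sum(matrix[row][col] * 3**row for row in range(depth))
        for col in range(n_cols)
    ]
```

A unit test for these would simply assert that the round trip is the identity for all values below `3**depth`.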
| return int((kernel_size * dilation - dilation) / 2) | ||
|
|
||
|
|
||
| class round_func5(InplaceFunction): |
Could you push it? I don't see it. Thanks.
| return int((kernel_size * dilation - dilation) / 2) | ||
|
|
||
|
|
||
| class round_func5(InplaceFunction): |
I guess that the right place for this would then be in nnet.
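For readers unfamiliar with these rounding classes: quantizers like SQ-Codec usually implement them as straight-through estimators, so that the non-differentiable rounding still lets gradients flow during training. A minimal sketch of the pattern (the name `StraightThroughRound` is mine, not the PR's):

```python
import torch
from torch.autograd import Function


class StraightThroughRound(Function):
    """Round to the nearest integer in the forward pass, but let
    gradients pass through unchanged in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pretend rounding was the identity
        return grad_output


x = torch.tensor([0.2, 0.7, 1.4], requires_grad=True)
y = StraightThroughRound.apply(x)  # tensor([0., 1., 1.])
y.sum().backward()                 # x.grad is all ones
```

A unit test for such a function should pin down both halves: the forward output lands on the integer grid, and the backward gradient is exactly the incoming gradient.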
@TParcollet I think I have already pushed all the changes. You can find the new sq_codec code in this file: https://github.com/poonehmousavi/speechbrain/blob/af69859606c4ccaab06f20d17c866898c6d4e9f0/speechbrain/lobes/models/discrete/sq_codec.py#L528 Also, this class is going to be moved to the integration folder, so we might not need to change the loading function. |
Hi @poonehmousavi, I did another pass, thanks for this very important work! I think most comments are minor, but at least one is quite important and will require a bit of re-thinking. Once the comments are addressed, I think we can merge this PR (or wait for the integrations one and adapt?). But please do ping me, I'll make a final review :-)
| torch.Size([8, 10, 2]) | ||
| >>> wav=model.decode(tokens) | ||
| >>> print(wav.shape) | ||
| >>> wav=model.decode(tokens) # doctest: +SKIP |
Why this? It seems terrible to skip the test in the doctest when it's supposed to be the main feature.
| @@ -0,0 +1,1394 @@ | |||
| """This lobe enables the integration of speech codec model (SQ-Codec) with scalar quantization,. | |||
Maybe prepare the docstring header for the move to integration?

@pplantinga is there any specific format I need to follow here?
| filename, | ||
| save_path=None, | ||
| config="config.yaml", | ||
| checkpoint="ckpt_00190000.pth", |
Please remove this hardcoded model; let's avoid having any forced model in the arguments. If it's mandatory for some reason, this must be justified and explained in the docstring.

I removed the hardcoded one.
| emb, emb_quant, x = self.scalar_codec.inference(wav) | ||
| return x.detach().cpu().squeeze(0) | ||
|
|
||
| @property |
What is this? Return True? This does not look good.

It will return the quantized signal (x).

This function does nothing in that case, as it's not even based on a class attribute that may change. I propose to just remove this function (it's always true, so there is no point having it).
|
|
||
|
|
||
| class WavTokenizer(nn.Module): | ||
| # """A wrapper for the WavTokenizer model |
|
|
||
|
|
||
| class Mimi(HFTransformersInterface): | ||
| # """An wrapper for the HuggingFace Mimi model |
| whether the model will be frozen (e.g. not trainable if used | ||
| as part of training another model) | ||
| num_codebooks : int (default: 8) | ||
| Number of qunatizer. It could be [2,3,4,5,6,7,8] |
Dunno what a qunatizer is.
|
@pplantinga I'm adding new tokenizers in this PR, and I believe all of them need to be transferred to the integration folder. I’ve added you to this PR to ensure that we’re aligned on the settings and the steps required for the transfer. This will help us converge the effort efficiently. |
From the perspective of the integrations folder I have no objections to this PR moving forward as-is. Once it's merged, they can be moved to the integrations folder without too much trouble. |
|
I have addressed most of your points. Regarding the unit tests and the suggestion to use the SB module instead of the provided module, as I mentioned in the comments, I'm currently working on another recipe that incorporates several quantization techniques. This requires re-implementing many of these techniques in SB from scratch. Once that work is complete, I'll focus on modifying these models accordingly. For now, since these models are in the integration folder, there shouldn't be any issues. |
| print(f"File downloaded, extracted to '{save_path}', and ZIP file removed.") | ||
|
|
||
|
|
||
| def decimal_to_ternary_matrix(decimals, D): |
@poonehmousavi I still don't see it, even though it is marked as 'done'?
| @@ -0,0 +1,157 @@ | |||
| """This lobe enables the integration of pretrained WavTokenizer. | |||
wavTokenzier.py --> wavTokenizer.py. Also @pplantinga, maybe we want to have a look at unifying the naming in lobes as part of the integration folder? Like camel case for all models?
|
|
||
|
|
||
| class WavTokenizer(nn.Module): | ||
| """An wrapper for the WavTokenizer model |
TParcollet
left a comment
@poonehmousavi thanks for your work! This is a 'conditional' approval, as a few important comments have been pushed to another PR. If that other PR does not come in, we will not be able to release this PR into the master branch (i.e. it will be reverted).
| emb, emb_quant, x = self.scalar_codec.inference(wav) | ||
| return x.detach().cpu().squeeze(0) | ||
|
|
||
| @property |
This function does nothing in that case, as it's not even based on a class attribute that may change. I propose to just remove this function (it's always true, so there is no point having it).
|
|
||
| class ScalarModel(nn.Module): | ||
| """ | ||
| A custom neural network model for encoding and decoding audio signals. |
Sure, but a SpeechBrain user will not look at another class's docstring when looking at this one. Can we add the relevant reference to this class docstring as well?
| return x | ||
|
|
||
|
|
||
| class CustomRoundingFunction(Function): |
I'll trust you on this. It is critical; don't forget the unit test, as it definitely is a function that could go yolo in the future. Please ping me on the upcoming PR so I can double-check as well.
| return output | ||
|
|
||
|
|
||
| class DownsampleLayer(nn.Module): |
Well, @pplantinga is just supposed to transpose code to another folder (the integration folder), not to re-develop modules -- especially ones that he does not master. I'll trust you again on that, but unit tests and this are important issues; if we don't see a PR addressing them at some point, we will have to revert. It also becomes harder to track for the reviewers, because your new PR will certainly add many new features that will require review, and we will forget about this -- so please, when creating the PR, refer to this one and this comment.
| remove_weight_norm(self.layer) | ||
|
|
||
|
|
||
| class UpsampleLayer(nn.Module): |
Tag for later -- this must be verified in the new PR.
| return x | ||
|
|
||
|
|
||
| class Conv1d(nn.Conv1d): |
That is starting to be a lot of things. If not done in the next PR, we will have to revert.
| return super(Conv1d, self).forward(x) | ||
|
|
||
|
|
||
| class ConvTranspose1d(nn.ConvTranspose1d): |
Tag for further review in the next PR.
| print(f"File downloaded, extracted to '{save_path}', and ZIP file removed.") | ||
|
|
||
|
|
||
| def decimal_to_ternary_matrix(decimals, D): |
| return ternary_matrix | ||
|
|
||
|
|
||
| def ternary_matrix_to_decimal(matrix): |
| @@ -0,0 +1,165 @@ | |||
| """This lobe enables the integration of pretrained WavTokenizer. | |||
filename (wavTokenzier -> wavTokenizer)
|
@TParcollet Since all the remaining points are related to SQ-Codec, I could remove the file so we only merge WavTokenizer and Mimi for now. Later I will add a new PR for SQ-Codec. |
|
@poonehmousavi sounds good to me! |
|
@TParcollet and @mravanelli I have removed the sq_codec for now. The other tokenizers, I think, are ready to merge. |
|
No idea why I get an error for MERT here; I didn't change anything, just removed a file... is there any new update to CI that could cause this? |
Probably an error on the Hugging Face side. I have run the test locally on this branch and it passes. I restarted the test; hopefully it passes this time. And in the future, this will be part of the integrations, so failures here won't cause CI failures. |
|
Still the same problem. I remembered it had the same issue before, but then we fixed it by adding |
|
Aha, this is actually due to the new version of transformers. I just upgraded my local version and got the same error. |
|
This test will be moved to integrations folder soon, so let's find a simple way to skip it for now |
|
Could we simply add a skip for the test for now? |
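One simple way to gate the MERT test on the installed transformers version, sketched below. The helper names and the `"4.45.0"` threshold are placeholders -- the actual release that introduced the breakage would need to be confirmed.

```python
import importlib.metadata
import unittest


def version_tuple(version):
    """Parse a 'major.minor.patch' string into a comparable int tuple."""
    parts = []
    for part in version.split(".")[:3]:
        digits = "".join(ch for ch in part if ch.isdigit())
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def transformers_at_least(minimum):
    """True if an installed transformers package is >= `minimum`."""
    try:
        installed = importlib.metadata.version("transformers")
    except importlib.metadata.PackageNotFoundError:
        return False
    return version_tuple(installed) >= version_tuple(minimum)


# Hypothetical usage on the failing MERT test ("4.45.0" is a placeholder):
# @unittest.skipIf(transformers_at_least("4.45.0"),
#                  "MERT incompatible with newer transformers")
```

Once the test moves to the integrations folder, the skip can simply be dropped.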
…ntegrations folder changes are finished