Add People's Speech (30,000 hours) Conformer ASR (Code from Samsung AI Center Cambridge) by TParcollet · Pull Request #2767 · speechbrain/speechbrain

TParcollet · 2024-11-22T17:09:46Z

This PR introduces an ASR training recipe and an optional data preparation script for the People's Speech dataset.

To-do:

Finish training the model
Answer the question: to download or not to download?
Add recipe testing. WE CAN'T RECIPE-TEST RECIPES BASED ON HF DATASETS BECAUSE CSVs ARE NOT USED

The question below has been partially answered (see the proposal below).
This PR raises an important discussion that we must have with the core maintainers and anyone wishing to participate (@pplantinga, @mravanelli, @Adel-Moumen, @asumagic, @poonehmousavi, @ycemsubakan, @Gastron): what should we do about downloading HuggingFace datasets? HuggingFace datasets are going to be more and more common and, luckily, the brilliant @Gastron built us a DynamicItemDataset and a set of functions that just work out of the box with them, which means that they are very simple to integrate in SB (as seen in this recipe). However, HuggingFace datasets must be downloaded before being loaded. SpeechBrain's policy has always been to let the user download the dataset; I think this makes total sense, and I'd like to keep it that way. However, for Gigaspeech for instance, we (I) already broke this rule (sigh). My idea (Adel's, actually) would be to dissociate the data_prep and download scripts: we should provide users with another .py that can be run before the recipe to download the dataset. Or we could provide no script at all, to avoid maintenance, and just give our users a few instructions. The problem with the latter is that downloading a HuggingFace dataset is slightly more complex than wget-ing a link, and users could get it wrong, i.e. struggle to match it with our subsequent data prep (CSV creation).

The proposal in this PR
It is up to the user to download the dataset via HuggingFace beforehand, much like we do for pretty much all the other recipes. I deliberately disabled the ability of HF to go and fetch the dataset from the internet. This forces users to download it wherever they want beforehand. Being consistent is the most important thing, and it will ease maintenance.
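To make the proposal concrete, here is a minimal sketch of what "no implicit download" could look like on the recipe side. It is not the code from this PR: the function name `require_local_dataset` is hypothetical, and only the standard library is used. The two environment variables are the offline switches read by the `datasets` and `huggingface_hub` libraries; as far as I know they should be set before those libraries are imported.

```python
import os
from pathlib import Path


def require_local_dataset(hf_download_folder: str) -> Path:
    """Return the local dataset folder and refuse any network fallback.

    Hypothetical helper for illustration; not part of the PR.
    """
    folder = Path(hf_download_folder).expanduser()

    # Fail early with a clear message instead of letting HF try to download.
    if not folder.is_dir() or not any(folder.iterdir()):
        raise FileNotFoundError(
            f"{folder} is missing or empty: download the dataset there first."
        )

    # Offline switches understood by `datasets` and `huggingface_hub`.
    os.environ["HF_DATASETS_OFFLINE"] = "1"
    os.environ["HF_HUB_OFFLINE"] = "1"
    return folder
```

The returned folder could then be handed to the HF loading call (for instance as the `cache_dir` argument of `datasets.load_dataset`), so that a missing dataset produces an explicit error rather than a silent multi-terabyte download.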


TParcollet · 2024-11-25T14:39:49Z

thanks @Adel-Moumen fixed.

Adel-Moumen

lgtm

Adel-Moumen · 2024-11-27T15:16:56Z

+    else:
+        hf_caching_dir = os.environ["XDG_CACHE_HOME"]
+
+    if hf_caching_dir != hf_download_folder:


actually, I don't understand the variable hf_download_folder. What's the meaning of this one?

That is where the arrow files are extracted ... and we don't want our users to be confused by HF hiding super heavy stuff in random places (what an absolutely horrendous software design).

i need to think more about this but ok
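For readers following this thread, the kind of check being discussed can be sketched as below. This is an illustration under assumptions, not the code under review: the helper names are hypothetical, and the lookup order (`HF_HOME`, then `XDG_CACHE_HOME/huggingface`, then `~/.cache/huggingface`) is my reading of the documented HuggingFace defaults, which differs from the snippet quoted above.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_hf_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Guess where HF would cache files, given environment variables.

    Assumed lookup order (not taken from the PR): HF_HOME, then
    XDG_CACHE_HOME/huggingface, then ~/.cache/huggingface.
    """
    env = os.environ if env is None else env
    if "HF_HOME" in env:
        return Path(env["HF_HOME"])
    if "XDG_CACHE_HOME" in env:
        return Path(env["XDG_CACHE_HOME"]) / "huggingface"
    return Path.home() / ".cache" / "huggingface"


def cache_mismatch(
    hf_download_folder: str, env: Optional[Mapping[str, str]] = None
) -> bool:
    """True when HF would extract files somewhere other than the user's folder."""
    expected = Path(hf_download_folder).expanduser().resolve()
    actual = resolve_hf_cache_dir(env).expanduser().resolve()
    return actual != expected
```

A recipe could warn or stop when `cache_mismatch(...)` returns True, so that the heavy extracted arrow files end up in the folder the user chose rather than in a default cache location.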

Adel-Moumen · 2024-11-27T15:31:54Z

+        kwargs={
+            "hf_download_folder": hparams["hf_download_folder"],
+            "subsets": hparams["subsets"],
+            "save_folder": hparams["save_folder"],


Question: I found that many recipes save their CSVs directly inside the output_folder. Here, you chose to save the manifests in the save_folder (i.e. output_folder/save). Which one should we stick to in the future?

Yes, it's a long-standing issue. I don't have an answer for that, to be honest.

Adel-Moumen · 2024-11-27T15:32:54Z

Just left some new comments. I ran the code and could extract the data and train the transformer model for a few steps. Let's merge after the new batch of comments is fixed :)

pplantinga · 2024-11-27T21:52:34Z

I agree with Titouan that we need a consistent way to handle dataset download. Here's one more place where this already exists in the repo:

https://github.com/speechbrain/speechbrain/blob/develop/recipes/Voicebank/voicebank_prepare.py#L394

In this case it is just a function that users can call if they want to download, but it's not explicitly called in the recipe (iirc). I would be fine with either something like this or a short additional file that calls a function like this, which people can choose to run themselves.
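The opt-in pattern described above could be sketched roughly as follows. Everything here is hypothetical (the name `download_once`, the marker file, the injected `fetch` callable); it is not the Voicebank function nor anything in this PR, only an illustration of a download step that the recipe never calls on its own.

```python
from pathlib import Path
from typing import Callable


def download_once(destination: str, fetch: Callable[[Path], None]) -> Path:
    """Run `fetch(destination)` unless a previous run already completed.

    `fetch` is whatever actually retrieves the data (hypothetical here);
    it is injected so that the recipe itself never triggers a download.
    """
    dest = Path(destination).expanduser()
    dest.mkdir(parents=True, exist_ok=True)

    # A marker file records that a previous download finished successfully.
    marker = dest / ".download_complete"
    if marker.exists():
        return dest

    fetch(dest)
    marker.touch()
    return dest
```

A short standalone script could call `download_once` with a `fetch` that wraps the chosen HuggingFace download call, while the training recipe only ever reads from the destination folder.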

Adel-Moumen · 2024-11-28T10:35:34Z

Thanks @TParcollet :)

Titouan Parcollet/Embedded AI /SRUK/Engineer/Samsung Electronics and others added 30 commits February 8, 2024 12:26

shorter augmentations in yaml

47e3097

layout to 80 char

5ab888a

listed label replication

a3bf472

listed label replication

c86d687

listed label replication

761bf93

Refact CTC

09cfde3

Refact transducer

e60396f

Refact seq2seq

d6a5524

call replicate label instead of duplication

9daba50

refactor aishell

6bf2361

refactor aishell

7ec92c5

CommonLanuage

ebae569

fix error + CV CTC

088a0eb

Giga OOF

bfb9bc2

Giga OOF

21353d5

Giga OOF

9971121

Giga OOF

f879302

Giga OOF

95c5ea4

Giga OOF

1b24844

Giga OOF

a5a97aa

Giga OOF

55904dd

Giga OOF

7f366bb

Finishing OOF

963bda4

final touch LULZ

922024a

fix tests

819f8c8

Tests???

8ade568

fix augment in some recipes

9e73c10

merge

b2b8f56

Merge branch 'develop' of https://github.com/TParcollet/speechbrain-r…

f0e9f6d

…eleased into develop

Merge branch 'develop' of https://github.com/speechbrain/speechbrain …

afd37a1

…into develop

TParcollet added 2 commits November 25, 2024 09:36

ready to review

569bc25

disable download

b79b5e1

TParcollet added the "ready to review" label (waiting on reviewer to provide feedback) and removed the "work in progress" label (not ready for merge) Nov 25, 2024

TParcollet added 3 commits November 25, 2024 09:49

extra req

8a191ee

no idea about this test

e750775

remove recipe test

f2608fd

Adel-Moumen self-requested a review November 25, 2024 10:14

Adel-Moumen assigned TParcollet Nov 25, 2024

TParcollet added 2 commits November 25, 2024 13:56

Merge branch 'develop' into people_speech

b0e1c5d

small fixes

6bbd27a

Adel-Moumen reviewed Nov 25, 2024


TParcollet added 2 commits November 25, 2024 14:38

small fixes

ef3d176

Merge branch 'people_speech' of https://github.com/TParcollet/speechb…

d3ab023

…rain-released into people_speech

Do such that Adel is finally happy

9598f63

Adel-Moumen approved these changes Nov 25, 2024


done

bfbed5e

Adel-Moumen reviewed Nov 27, 2024


Comment thread recipes/PeoplesSpeech/peoples_speech_prepare.py Outdated

Adel-Moumen reviewed Nov 27, 2024


Comment thread recipes/PeoplesSpeech/ASR/transformer/hparams/conformer_large.yaml

Adel-Moumen reviewed Nov 27, 2024


Comment thread recipes/PeoplesSpeech/peoples_speech_prepare.py Outdated

Adel-Moumen reviewed Nov 27, 2024


TParcollet added 2 commits November 27, 2024 21:21

update readme

bb74d46

fix comments

0035e31

fix error catching

d61ca04

Adel-Moumen merged commit dc57e8f into speechbrain:develop Nov 28, 2024
