Recipe for SEP-28k dataset #2574
This reverts commit 5de0026.
Unit tests and doctests pass successfully, but the linters fail on files that I didn't change.
Note: CI is currently failing outside of this PR. When #2581 is merged (I will notify you), please update the fork branch against
Done, and it seems to work; you can now do that.
Sorry, you need a new
This reverts commit 94cec8d.
Thank you for updating.
pplantinga left a comment
Thanks for adding this recipe for a new task, will be great to have in the toolkit.
I have not been able to run this recipe yet. When I tried, I got `FileNotFoundError: [Errno 2] No such file or directory: 'manifests/train.csv'`, which suggests that the recipe is not creating the manifest files as our recipes typically do. Please add code to create the manifest automatically, unless I'm somehow running the recipe incorrectly.
I will try to run again once the comments here are addressed.
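For illustration, the manifest-creation step requested above could look like the following minimal sketch. The helper name `prepare_manifest`, the column names (`ID`, `wav`, `label`), and the example row are assumptions made for this sketch, not the recipe's actual code.

```python
import csv
import os


def prepare_manifest(rows, manifest_path):
    """Write a CSV manifest unless it already exists.

    `rows` is an iterable of (utt_id, wav_path, label) tuples.
    The helper name and column names are hypothetical.
    Returns True if the file was written, False if it was skipped.
    """
    if os.path.isfile(manifest_path):
        # A previous run already prepared the manifest; do nothing.
        return False

    folder = os.path.dirname(manifest_path)
    if folder:
        # Create e.g. "manifests/" so that opening the file cannot
        # fail with FileNotFoundError.
        os.makedirs(folder, exist_ok=True)

    with open(manifest_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["ID", "wav", "label"])
        writer.writerows(rows)
    return True


if __name__ == "__main__":
    # Hypothetical example row; real rows would come from the dataset labels.
    prepare_manifest([("utt1", "clips/utt1.wav", "Block")], "manifests/train.csv")
```

Calling such a helper at the start of the training script, before the data is loaded, would let the recipe run from a fresh checkout without a manual preparation step.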
Most changes here follow the comments made by @pplantinga. TODO: Look into BinaryMetrics for score computation.
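As a reference point for the score-computation TODO, here is a minimal plain-Python sketch of precision, recall, and F1 for binary labels. It deliberately does not use SpeechBrain's metric utilities; the function name `binary_scores` and the 0/1 label encoding are assumptions made for this sketch.

```python
def binary_scores(predictions, targets):
    """Compute precision, recall, and F1 for 0/1 predictions and targets.

    Hypothetical helper for illustration; 1 marks the positive class.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")

    pairs = list(zip(predictions, targets))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)  # true positives
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)  # false positives
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)  # false negatives

    # Fall back to 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For a multi-label setup with one output per stuttering event type, the same function could be applied per label and the results averaged, depending on how scores are to be reported.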
Thank you for your review. All comments were addressed. Hopefully the changes can be accepted. Concerning the task name, I suggest "Stuttering-Detection", which is straightforward and precise. It can be considered a subcategory of Fluency/Speech Impairments if the toolkit ever welcomes other datasets for specific impairments (which can be produced by various pathologies such as Parkinson's Disease, Alzheimer's Disease, etc.).
pplantinga left a comment
Okay, I've made a few fixes, and the recipe seems to work now, recipe tests and all.
Please make sure the recipe is still running according to your expectations and address any lingering questions. Then this is ready for merge.
Run the following command to train the model:
`python train.py hparams/train.yaml`

Note that this is a minimal working example. The model and training parameters should be modified accordingly.
Do you have results to share with a bigger/better model? A brief discussion of results achieved with the recipe is often included in the README.
Thanks for your fixes. I ran the recipe after your changes and it works as intended. You can merge it.
I have not yet developed a model that improves on the literature; however, I can point to papers dealing with this task.
I am using the dataset for research purposes, but not to develop "a better model". This may involve future development of the recipe; we can discuss it in private if you'd like more information.
What does this PR do?
This PR adds the SEP-28k dataset (https://github.com/apple/ml-stuttering-events-dataset) to the list of recipes with the partitioning suggested in https://rdcu.be/dK8Ei (https://github.com/th-nuernberg/ml-stuttering-events-dataset-extended).
Additionally, we provide a repository to download the deleted podcasts "StrongVoices" and "IStutterSoWhat", so that every researcher can have access to the full dataset.
We provide a minimal working example for training.
Additionally, this adds a new task to SpeechBrain: Stuttering Event Detection and/or Pathological Speech Detection. Other recipes could be added (FluencyBank, UCLASS, ...) if SpeechBrain reviewers are interested.