Recipe for SEP-28k dataset #2574

Merged
pplantinga merged 34 commits into speechbrain:develop from IliasMAOUDJ:SEP28k on Jan 16, 2025
@IliasMAOUDJ (Contributor) commented Jun 18, 2024

What does this PR do?

This PR adds the SEP-28k dataset (https://github.com/apple/ml-stuttering-events-dataset) to the list of recipes with the partitioning suggested in https://rdcu.be/dK8Ei (https://github.com/th-nuernberg/ml-stuttering-events-dataset-extended).
Additionally, we provide a repository to download the deleted podcasts "StrongVoices" and "IStutterSoWhat", so that every researcher can access the full dataset.
We provide a minimal working example for training.

Additionally, this adds a new task to SpeechBrain: Stuttering Event Detection and/or Pathological Speech Detection. Recipes for other datasets (FluencyBank, UCLASS, ...) could be added if SpeechBrain reviewers are interested.

Before submitting
  • Did you read the contributor guideline?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
  • Review the self-review checklist to ensure the code is ready for review

@IliasMAOUDJ (Contributor Author)

Unit tests and doctests pass successfully, but the linters fail on files that I didn't change.

@asumagic (Collaborator) commented Jun 25, 2024

Note: CI is currently failing outside of this PR. When #2581 is merged (I will notify you), please update the fork branch against develop.

@asumagic (Collaborator)

> Note: CI is currently failing outside of this PR. When #2581 is merged (I will notify you), please update the fork branch against develop.

Done, and it seems to work. You can now update the fork branch against develop.

@asumagic (Collaborator) commented Jul 2, 2024

> Note: CI is currently failing outside of this PR. When #2581 is merged (I will notify you), please update the fork branch against develop.
>
> Done, and it seems to work. You can now update the fork branch against develop.

Sorry, you need to merge develop again; the CI fix had broken.

@IliasMAOUDJ (Contributor Author)

Thank you for updating.

@pplantinga (Collaborator) left a comment

Thanks for adding this recipe for a new task; it will be great to have in the toolkit.

I was not able to run this recipe yet. When I tried, I got `FileNotFoundError: [Errno 2] No such file or directory: 'manifests/train.csv'`, which suggests that the recipe is not creating the manifest files as our recipes typically do. Please add code to automatically create the manifest, unless I'm somehow running the recipe incorrectly.

I will try to run again once the comments here are addressed.
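
To illustrate the request above, here is a minimal sketch of a preparation step that creates a manifest before training if it does not exist yet. It is not the code merged in this PR: the function name `prepare_manifest`, the `ID` and `wav` column names, and the assumption that the manifest simply lists every `.wav` file are all hypothetical, and the real `sep28k_prepare.py` also has to handle labels and the dataset partitioning.

```python
import csv
from pathlib import Path


def prepare_manifest(data_folder, manifest_path):
    """Write a CSV manifest listing every .wav file under data_folder.

    Hypothetical helper: skips the work if the manifest already exists,
    so it is safe to call at the start of every training run.
    """
    manifest_path = Path(manifest_path)
    if manifest_path.exists():
        return manifest_path  # Already prepared; nothing to do.

    # Create the output directory (e.g. "manifests/") if it is missing.
    manifest_path.parent.mkdir(parents=True, exist_ok=True)

    wav_files = sorted(Path(data_folder).rglob("*.wav"))
    with open(manifest_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["ID", "wav"])  # Assumed column names.
        for wav in wav_files:
            writer.writerow([wav.stem, str(wav)])
    return manifest_path
```

Calling such a function from the training script before the datasets are loaded would avoid the `FileNotFoundError` on a fresh checkout.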

Comment threads:
  • recipes/SEP-28k/README.md
  • recipes/SEP-28k/hparams/train.yaml (outdated)
  • recipes/SEP-28k/sep28k_prepare.py (outdated)
  • recipes/SEP-28k/train.py (7 threads, outdated)

@IliasMAOUDJ (Contributor Author)

Thank you for your review. All comments were addressed. Hopefully the changes can be accepted.

Concerning the task name, I suggest "Stuttering-Detection", which is straightforward and precise. It can be considered a subcategory of Fluency/Speech Impairments if the toolkit ever welcomes other datasets for specific impairments (which can be caused by various pathologies such as Parkinson's disease, Alzheimer's disease, etc.).

@pplantinga added the "recipes" label (Changes to recipes only (add/edit)) on Jan 14, 2025

@pplantinga (Collaborator) left a comment

Okay, I've made a few fixes, and the recipe seems to work now, recipe tests and all.

Please make sure the recipe is still running according to your expectations and address any lingering questions. Then this is ready for merge.

Run the following command to train the model:
`python train.py hparams/train.yaml`

Note that this is a minimal working example. The model and training parameters should be modified accordingly.

Collaborator

Do you have results to share with a bigger/better model? A brief discussion of results achieved with the recipe is often included in the README.

Contributor Author

Thanks for your fixes. I ran the recipe after your changes and it works as intended. You can merge it.

I have not yet developed a model that improves on the literature; however, I can point to papers dealing with this task.
I am using the dataset for research purposes, but not to develop "a better model". This may lead to future development of the recipe; we can discuss it in private if you'd like more information.

@pplantinga merged commit 388995a into speechbrain:develop on Jan 16, 2025