Voice analysis functions #2689

Merged
ycemsubakan merged 56 commits into speechbrain:develop from pplantinga:voice-analysis
Mar 6, 2025

@pplantinga
Collaborator

@pplantinga pplantinga commented Sep 17, 2024

This PR introduces functions for helping with voice analysis, such as dysarthric speech detection.

The planned functions are as follows:

  • Estimate f0
  • Jitter
  • Shimmer
  • Harmonicity
  • Glottal-to-Noise-Excitation (GNE) Ratio
  • Spectral features

This provides a start; more may be added later. A tutorial is included.
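
For readers new to these measures, here is a minimal sketch of the common "local" definitions of jitter and shimmer: the mean absolute difference between consecutive cycles, divided by the mean. This is an illustration only, not the code added in this PR. The helper `local_perturbation` and the sample values are hypothetical, and the PR's functions may use different definitions or windowing.

```python
from statistics import fmean


def local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to their mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return fmean(diffs) / fmean(values)


# Hypothetical per-cycle measurements from a sustained vowel (made-up numbers).
periods_s = [0.0100, 0.0102, 0.0099, 0.0101]  # length of each glottal cycle, in seconds
peak_amps = [0.80, 0.78, 0.82, 0.79]          # peak amplitude of each cycle

jitter = local_perturbation(periods_s)   # cycle-to-cycle variation in period
shimmer = local_perturbation(peak_amps)  # cycle-to-cycle variation in amplitude
f0_hz = 1.0 / fmean(periods_s)           # mean fundamental frequency in Hz

print(f"f0 = {f0_hz:.1f} Hz, jitter = {jitter:.2%}, shimmer = {shimmer:.2%}")
```

Both measures are ratios, so they are usually reported as percentages and do not depend on the recording's absolute pitch or level.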

@pplantinga pplantinga added the enhancement New feature or request label Sep 17, 2024
@pplantinga pplantinga self-assigned this Sep 17, 2024
@TParcollet
Collaborator

@pplantinga let me know if you are satisfied with this PR; it looks good to me. Should we also provide a tutorial or a short example?

@pplantinga
Collaborator Author

Let's wait on this. I'm still tweaking it, and a tutorial would be nice too.

@pplantinga
Collaborator Author

Tutorial is added, ready for review.

@pplantinga pplantinga marked this pull request as ready for review October 29, 2024 20:23
@pplantinga
Collaborator Author

Alright, I think this is finally ready for review again, @TParcollet @ycemsubakan @mravanelli. It now includes spectral features and matches Praat and openSMILE. Perhaps a recipe for an open dataset can be added later.

@pplantinga pplantinga added this to the v1.0.3 milestone Feb 14, 2025

@bcordel bcordel left a comment

A couple of comments on the Jupyter notebook:

Title section:
This notebook goes through a simple voice analysis of a few speech samples. If you are new to speech processing, we recommend reading through this introduction before going through the notebook. First we download a public Parkinson's dataset and cut to just the sustained phonation.

Compute autocorrelation and related features (code box 1):
line 8: perhaps a comment/link about how to estimate best_lags
line 24: step_samples is hardcoded as 441
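
To make these two points concrete, here is a rough sketch of one way a best lag can be picked (the lag that maximizes a normalized autocorrelation) and of how the step size can be derived from the sample rate instead of hardcoding 441. It is not the notebook's code: the function `best_lag`, the 44.1 kHz sample rate, the 10 ms step, and the 75 to 400 Hz search range are all assumptions made for illustration.

```python
import math


def best_lag(frame, min_lag, max_lag):
    """Return (lag, score) for the lag in [min_lag, max_lag] whose shifted
    window correlates best with the start of the frame."""
    window = len(frame) - max_lag
    if window <= 0:
        raise ValueError("frame must be longer than max_lag")
    ref = frame[:window]
    ref_energy = sum(x * x for x in ref)
    best, best_score = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        shifted = frame[lag:lag + window]
        denom = math.sqrt(ref_energy * sum(x * x for x in shifted))
        # Normalized cross-correlation, so the score lies in [-1, 1].
        score = sum(a * b for a, b in zip(ref, shifted)) / denom if denom > 0 else 0.0
        if score > best_score:
            best, best_score = lag, score
    return best, best_score


sample_rate = 44100                        # assumed; 441 samples is 10 ms at this rate
step_samples = round(0.010 * sample_rate)  # derived from the rate rather than hardcoded

# Synthetic 100 Hz tone: its period is sample_rate / 100 = 441 samples.
frame = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(4 * step_samples)]

# Restrict the search to lags for an assumed 75-400 Hz pitch range.
min_lag = sample_rate // 400
max_lag = sample_rate // 75
lag, score = best_lag(frame, min_lag, max_lag)
f0_hz = sample_rate / lag
```

Limiting the lag range matters because a periodic signal also correlates strongly at integer multiples of its period; capping `max_lag` keeps the search from locking onto one of those.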

The GNE comment from vocal_features.py could also be copied into the .ipynb.

Maybe add a "Here are some additional speech processing resources" box before the SpeechBrain citation, with (for example):
https://tahull.github.io/blog/2020/08/acf-animated
https://github.com/chautruonglong/Fundamental-Frequency
https://www.fon.hum.uva.nl/praat/
https://www.audeering.com/opensmile/

No comments on the .py files; they look good to me!

@pplantinga
Collaborator Author

Hi @mravanelli, @bcordel has completed his review and I was able to address the comments. I think the last thing remaining is your review; let me know if there's anything I can do to help.

Collaborator

@ycemsubakan ycemsubakan left a comment

I think the additional explanations help a lot!

@ycemsubakan ycemsubakan merged commit 2b3e767 into speechbrain:develop Mar 6, 2025
5 checks passed
@pplantinga pplantinga deleted the voice-analysis branch March 6, 2025 18:50

Labels

enhancement New feature or request

4 participants