|
25 | 25 | "source": [ |
26 | 26 | "# Analyzing Vocal Features for Pathology\n", |
27 | 27 | "\n", |
28 | | - "This notebook goes through a simple voice analysis of a few speech samples. First we download a public Parkinson's dataset and cut to just the sustained phonation." |
| 28 | + "This notebook walks through a simple voice analysis of a few speech samples. If you are new to speech feature extraction, we recommend reading [Aalto Speech Processing Ch. 3 Basic Representations](https://speechprocessingbook.aalto.fi/Representations/Representations.html) before working through the notebook to understand the background and theory behind the signal processing techniques used here.\n", |
| 29 | + "\n", |
| 30 | + "As a sample vocalization for demonstration purposes, we first download a public sample from a person with Parkinson's disease and cut to just the sustained phonation." |
29 | 31 | ] |
30 | 32 | }, |
31 | 33 | { |
|
155 | 157 | "source": [ |
156 | 158 | "## Compute autocorrelation and related features\n", |
157 | 159 | "\n", |
158 | | - "Autocorrelation is the cross-correlation of a signal with itself at various lags.\n", |
159 | | - "For harmonic signals, there are peaks at regular lag intervals corresponding to the period.\n", |
| 160 | + "Autocorrelation is the cross-correlation of a signal with itself at each lag from min_lag to max_lag.\n", |
| 161 | + "For periodic/harmonic signals, there are peaks at regular lag intervals corresponding to the period.\n", |
160 | 162 | "The autocorrelation ratio is the ratio of the strongest peak against the theoretical maximum\n", |
161 | | - "which occurs when the lag is zero." |
| 163 | + "which occurs when the lag is zero.\n", |
| 164 | + "\n", |
| 165 | + "For animations which may be helpful for understanding the concept, see the following:\n", |
| 166 | + "* [https://tahull.github.io/blog/2020/08/acf-animated](https://tahull.github.io/blog/2020/08/acf-animated)\n", |
| 167 | + "* [https://github.com/chautruonglong/Fundamental-Frequency](https://github.com/chautruonglong/Fundamental-Frequency)" |
162 | 168 | ] |
163 | 169 | }, |
164 | 170 | { |
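The autocorrelation description in the cell above can be illustrated with a minimal, dependency-free sketch. It is not the notebook's `autocorrelate` implementation: the function name `autocorrelation_ratio`, the test tone, and the 16 kHz sample rate are assumptions made for illustration only.

```python
import math


def autocorrelation_ratio(signal, min_lag, max_lag):
    """Return (ratio, best_lag) for the strongest peak in [min_lag, max_lag)."""
    n = len(signal)
    # Autocorrelation at lag zero is the signal energy, the theoretical maximum
    energy = sum(x * x for x in signal)
    best_ratio, best_lag = float("-inf"), min_lag
    for lag in range(min_lag, max_lag):
        # Correlate the signal with a copy of itself delayed by `lag` samples
        r = sum(signal[i] * signal[i + lag] for i in range(n - lag))
        if r / energy > best_ratio:
            best_ratio, best_lag = r / energy, lag
    return best_ratio, best_lag


# Hypothetical test signal: a pure tone with a period of exactly 100 samples
sample_rate = 16000  # assumed sample rate in Hz
period = 100
signal = [math.sin(2 * math.pi * i / period) for i in range(400)]

ratio, best_lag = autocorrelation_ratio(signal, min_lag=50, max_lag=250)
f0 = sample_rate / best_lag  # period in samples -> frequency in Hz
```

In this sketch the ratio stays below 1 even for a perfectly periodic tone, because fewer samples overlap at a nonzero lag than at lag zero. The notebook's own function may normalize differently, so treat the numbers as illustrative.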
|
193 | 199 | "frames = audio.unfold(-1, window_samples, step_samples)\n", |
194 | 200 | "autocorrelation = autocorrelate(frames)\n", |
195 | 201 | "\n", |
196 | | - "# Use autocorrelation to estimate harmonicity and best lags\n", |
| 202 | + "# Use autocorrelation maxima to estimate harmonicity and lags corresponding to period\n", |
197 | 203 | "harmonicity, lags = autocorrelation[:, :, min_lag:max_lag].max(dim=-1)\n", |
198 | | - "lags = torch.nn.functional.pad(lags, pad=(3, 3)) \n", |
| 204 | + "lags = torch.nn.functional.pad(lags, pad=(3, 3))\n", |
| 205 | + "\n", |
| 206 | + "# Take the median of 7 frames to avoid short octave jumps\n", |
199 | 207 | "best_lags, _ = lags.unfold(-1, 7, 1).median(dim=-1)\n", |
200 | 208 | "\n", |
201 | 209 | "# Re-add the min_lag back in after previous step removed it\n", |
|
212 | 220 | "xticks = (torch.arange(1, 7) / 2 / step_size).int().tolist()\n", |
213 | 221 | "plt.xticks(xticks, xs[xticks].tolist())\n", |
214 | 222 | "yticks = torch.linspace(0, max_lag - min_lag, 5).int()\n", |
215 | | - "plt.yticks(yticks.tolist(), ((yticks + min_lag) / 441).numpy().round(decimals=2))\n", |
| 223 | + "plt.yticks(yticks.tolist(), ((yticks + min_lag) / step_samples).numpy().round(decimals=2))\n", |
216 | 224 | "plt.show()\n", |
217 | 225 | "\n", |
218 | 226 | "# Show autocorrelation-based features, harmonicity (usually represented in log scale as HNR) and f0\n", |
|
505 | 513 | "source": [ |
506 | 514 | "## Compute GNE step-by-step\n", |
507 | 515 | "\n", |
508 | | - "The algorithm is best described in \"The Effectiveness of the Glottal to Noise Excitation Ratio for the Screening of Voice Disorders\" by Godino-Llorente et al.\n" |
| 516 | + "An algorithm for GNE computation from the original paper:\n", |
| 517 | + "\n", |
| 518 | + "\"Glottal-to-Noise Excitation Ratio - a New Measure for Describing\n", |
| 519 | + "Pathological Voices\" by D. Michaelis, T. Gramss, and H. W. Strube.\n", |
| 520 | + "\n", |
| 521 | + "This algorithm divides the signal into frequency bands, and compares\n", |
| 522 | + "the correlation between the bands. High correlation indicates a\n", |
| 523 | + "relatively low amount of noise in the signal, whereas lower correlation\n", |
| 524 | + "could be a sign of pathology in the vocal signal.\n", |
| 525 | + "\n", |
| 526 | + "Godino-Llorente et al. in \"The Effectiveness of the Glottal to Noise\n", |
| 527 | + "Excitation Ratio for the Screening of Voice Disorders\" explore the\n", |
| 528 | + "suitability of the bandwidth and frequency shift parameters, and write out\n", |
| 529 | + "a clear description of how to compute the measure, which we follow here." |
509 | 530 | ] |
510 | 531 | }, |
511 | 532 | { |
|
700 | 721 | "source": [ |
701 | 722 | "## PRAAT-Parselmouth\n", |
702 | 723 | "\n", |
703 | | - "We'll run a similar analysis to verify that our numbers look accurate." |
| 724 | + "The following is a side-by-side analysis with PRAAT, a commonly used voice analysis tool, to verify that our numbers look accurate. To read more about PRAAT and Parselmouth, see here:\n", |
| 725 | + "\n", |
| 726 | + "* [https://www.fon.hum.uva.nl/praat/](https://www.fon.hum.uva.nl/praat/)\n", |
| 727 | + "* [https://parselmouth.readthedocs.io/en/stable/](https://parselmouth.readthedocs.io/en/stable/)" |
704 | 728 | ] |
705 | 729 | }, |
706 | 730 | { |
|
868 | 892 | "source": [ |
869 | 893 | "## Comparison with OpenSMILE\n", |
870 | 894 | "\n", |
871 | | - "Unlike PRAAT, we can do a frame-by-frame comparison with OpenSMILE." |
| 895 | + "Unlike PRAAT, we can do a frame-by-frame comparison with OpenSMILE, which is helpful for further verification of our approach.\n", |
| 896 | + "\n", |
| 897 | + "* [https://www.audeering.com/opensmile/](https://www.audeering.com/opensmile/)" |
872 | 898 | ] |
873 | 899 | }, |
874 | 900 | { |
|
1276 | 1302 | "name": "python", |
1277 | 1303 | "nbconvert_exporter": "python", |
1278 | 1304 | "pygments_lexer": "ipython3", |
1279 | | - "version": "3.12.7" |
| 1305 | + "version": "3.13.1" |
1280 | 1306 | } |
1281 | 1307 | }, |
1282 | 1308 | "nbformat": 4, |
|