Describe the bug
Filterbank uses the following auxiliary function to compute the mel-scaled filter bank:
    def _triangular_filters(self, all_freqs, f_central, band):
        """Returns fbank matrix using triangular filters."""
        # Computing the slopes of the filters
        slope = (all_freqs - f_central) / band
        left_side = slope + 1.0
        right_side = -slope + 1.0
The slope formula looks wrong. It gives the left and right sides of each triangle the same slope (up to sign), which cannot hold when the band edges are unevenly spaced. Suppose one triangle has frequencies f0, f1 (central) and f2; the next one then has f1, f2 (central) and f3. Because mel-spaced edges are not equidistant in Hz, f1 - f0 differs from f2 - f1, so the two slopes of the first triangle cannot be equal in magnitude, and neither can those of the second. Instead, the right slope of the first triangle should equal (up to sign) the left slope of the second, since both sides span f1 to f2.
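To make the difference concrete, here is a minimal pure-Python sketch that evaluates both weightings for a single triangle. It assumes `band` is the left half-width f1 - f0 (the same assumption the C++ comparison in this report makes); the function names and the edge values are hypothetical and chosen only for illustration.

```python
def symmetric_weight(freq, f_left, f_center):
    """Weight from the quoted formula: one band width is used for both sides."""
    band = f_center - f_left  # assumption: band is the left half-width
    slope = (freq - f_center) / band
    left_side = slope + 1.0
    right_side = -slope + 1.0
    return max(0.0, min(left_side, right_side))


def canonical_weight(freq, f_left, f_center, f_right):
    """Weight from a triangle whose two sides have independent widths."""
    if freq <= f_left or freq >= f_right:
        return 0.0
    if freq <= f_center:
        return (freq - f_left) / (f_center - f_left)
    return (f_right - freq) / (f_right - f_center)


# Hypothetical band edges in Hz with unequal spacing, as on a mel scale.
f0, f1, f2 = 0.0, 100.0, 250.0
for freq in (50.0, 100.0, 150.0, 200.0):
    print(freq, symmetric_weight(freq, f0, f1), canonical_weight(freq, f0, f1, f2))
```

With these edges the two functions agree on the left side and at the peak, but the symmetric version already reaches zero at 200 Hz, while the canonical triangle only does so at f2 = 250 Hz.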
For reference, here is the canonical formula for mel filterbank calculation:
Computing the Mel filterbank
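For readers without the linked page at hand, the triangular weight of one filter in its commonly cited form can be written as follows. This is a sketch in the notation of this report (f_0, f_1, f_2 are the left edge, center and right edge of the triangle, k is the frequency at which the weight is evaluated); the symbol w is an assumption, and the block is not a quotation from the linked page.

```latex
w(k) =
\begin{cases}
  \dfrac{k - f_0}{f_1 - f_0}, & f_0 \le k \le f_1, \\[1.5ex]
  \dfrac{f_2 - k}{f_2 - f_1}, & f_1 < k \le f_2, \\[1.5ex]
  0, & \text{otherwise.}
\end{cases}
```

The two sides have slopes 1/(f_1 - f_0) and -1/(f_2 - f_1), which coincide in magnitude only when the edges are equally spaced.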
Expected behaviour
The following C++ code shows how the SpeechBrain formula differs from the correct one:
    #include <cmath>
    #include <vector>

    // Computes Nmels mel-band energies from a spectrum of Nspec bins.
    void mel_spectrum(const float *spec, int Nspec, int sampling_rate, int Nmels, float *melspec)
    {
        // Band edges in Hz: Nmels + 2 points equally spaced on the mel scale.
        // Cached between calls; note the cache is keyed on Nmels only.
        static std::vector<float> hz;
        if ((int)hz.size() != Nmels + 2) {
            hz.resize(Nmels + 2);
            auto hz_to_mel = [](float f) -> float {
                return 2595 * std::log10(1 + f / 700);
            };
            auto mel_to_hz = [](float mel) -> float {
                return 700 * (std::pow(10.0f, mel / 2595) - 1);
            };
            float min_mel = hz_to_mel(0);
            float max_mel = hz_to_mel((float)sampling_rate / 2);
            for (int n = 0; n < Nmels + 2; ++n) {
                hz[n] = mel_to_hz(min_mel + n * (max_mel - min_mel) / (Nmels + 1));
            }
        }
        for (int m = 0; m < Nmels; ++m) {
            // Left edge, center and right edge of triangle m, in fractional bin units.
            float f0 = hz[m + 0] / hz[Nmels + 1] * (Nspec - 1);
            float f1 = hz[m + 1] / hz[Nmels + 1] * (Nspec - 1);
            float f2 = hz[m + 2] / hz[Nmels + 1] * (Nspec - 1);
            melspec[m] = 0;
            // apply left slope (bins up to and including floor(f1))
            for (int k = (int)std::ceil(f0); k <= (int)f1; ++k) {
                melspec[m] += spec[k] * (k - f0) / (f1 - f0);
            }
            // FIXME This is the formula used in SpeechBrain to apply right slope
            /*for (int k = (int)f1 + 1; k <= (int)f2; ++k) {
                if (2 * f1 - f0 - k > 0) melspec[m] += spec[k] * (2 * f1 - f0 - k) / (f1 - f0);
            }*/
            // this is the right formula to apply right slope as per a canonical definition of mel filter bank
            // (starts at floor(f1) + 1 so that an integer-valued f1 is not counted twice)
            for (int k = (int)f1 + 1; k <= (int)f2; ++k) {
                melspec[m] += spec[k] * (f2 - k) / (f2 - f1);
            }
        }
    }
To Reproduce
No response
Versions
No response
Relevant log output
No response
Additional context
No response