
[Bug]: wrong formula of calculation of mel filterbank? #2030

@idruker-cerence

Describe the bug

The following auxiliary function is used in Filterbank to compute the mel-scaled filter bank:

def _triangular_filters(self, all_freqs, f_central, band):
    """Returns fbank matrix using triangular filters."""
    # Computing the slopes of the filters
    slope = (all_freqs - f_central) / band
    left_side = slope + 1.0
    right_side = -slope + 1.0

The slope formula appears to be wrong. In general, the left and right sides of a triangle cannot have slopes of equal magnitude. Suppose one triangle has the frequencies f0, f1 (center) and f2; the next one then has f1, f2 (center) and f3. Because the edges are equally spaced on the mel scale, they are not equally spaced in Hz, so f1 - f0 differs from f2 - f1. To satisfy this layout, the two slopes of the first triangle cannot be equal in magnitude, and neither can those of the second. Instead, we would expect the right slope of the first triangle to equal (up to sign) the left slope of the second triangle, since both span the interval from f1 to f2.
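
To illustrate the argument, here is a minimal Python sketch. The helper names (`hz_to_mel`, `mel_to_hz`, `mel_band_edges`) are hypothetical and not part of SpeechBrain; the conversion is the same 2595 * log10(1 + hz / 700) formula used in the C++ code in this issue, and the choice of 4 filters at 16 kHz is arbitrary.

```python
import math


def hz_to_mel(hz):
    # Same Hz-to-mel conversion as in the C++ snippet of this issue.
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels, sampling_rate):
    """Return n_mels + 2 edge frequencies in Hz, equally spaced on the mel scale."""
    lo = hz_to_mel(0.0)
    hi = hz_to_mel(sampling_rate / 2.0)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + n * step) for n in range(n_mels + 2)]


n_mels = 4  # arbitrary illustrative value
edges = mel_band_edges(n_mels, sampling_rate=16000)

for m in range(n_mels):
    f0, f1, f2 = edges[m], edges[m + 1], edges[m + 2]
    left_slope = 1.0 / (f1 - f0)    # rising side, from f0 up to f1
    right_slope = -1.0 / (f2 - f1)  # falling side, from f1 down to f2
    print(f"filter {m}: left {left_slope:+.6f}, right {right_slope:+.6f}")
```

For every filter the falling side is wider than the rising side, so its slope is smaller in magnitude, and the right slope of filter m matches the left slope of filter m + 1 up to sign.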

For reference, here is the canonical formula for the mel filterbank calculation:
Computing the Mel filterbank
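
In the notation used in this issue (f0, f1 and f2 for the left edge, center and right edge of one triangle), the canonical triangular filter can be restated as follows, with k the frequency or bin position being weighted. The symbol H(k) is introduced here only for illustration:

```latex
H(k) =
\begin{cases}
0 & k < f_0 \\[4pt]
\dfrac{k - f_0}{f_1 - f_0} & f_0 \le k \le f_1 \\[8pt]
\dfrac{f_2 - k}{f_2 - f_1} & f_1 \le k \le f_2 \\[8pt]
0 & k > f_2
\end{cases}
```

By contrast, the snippet quoted in this issue computes both sides from a single width `band`:

```latex
\text{left\_side}(k) = 1 + \frac{k - f_1}{\text{band}}, \qquad
\text{right\_side}(k) = 1 - \frac{k - f_1}{\text{band}}
```

Both sides therefore have slope magnitude 1 / band, whereas the canonical form uses 1 / (f1 - f0) on the left and 1 / (f2 - f1) on the right.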

Expected behaviour

The following C++ code shows how the SpeechBrain formula differs from the correct one:

#include <cmath>
#include <vector>

using std::vector;

void mel_spectrum(const float *spec, int Nspec, int sampling_rate, int Nmels, float *melspec)
{
    static vector<float> hz;

    if(hz.size() != Nmels+2) {

        hz.resize(Nmels+2);

        auto hz_to_mel = [](float hz) -> float {
            return 2595 * log10(1 + hz / 700);
        };

        auto mel_to_hz = [](float mel) -> float {
            return 700 * (pow((float)10, mel / 2595) - 1);
        };

        float min_mel = hz_to_mel(0);
        float max_mel = hz_to_mel((float)sampling_rate/2);

        for(int n=0; n<Nmels+2; ++n) {
            hz[n] = mel_to_hz(min_mel + n * (max_mel-min_mel) / (Nmels+1));
        }
    }

    for(int m=0; m<Nmels; ++m) {
        float f0 = hz[m+0] / hz[Nmels+1] * (Nspec-1);
        float f1 = hz[m+1] / hz[Nmels+1] * (Nspec-1);
        float f2 = hz[m+2] / hz[Nmels+1] * (Nspec-1);

        melspec[m] = 0;

        // apply left slope 
        for(int k=(int)ceil(f0); k<=(int)f1; ++k) {
            melspec[m] += spec[k]*(k-f0)/(f1-f0);
        }

        // FIXME This is the formula used in SpeechBrain to apply right slope
        /*for(int k=(int)ceil(f1); k<=(int)f2; ++k) {
            if(2*f1-f0-k>0) melspec[m] += spec[k]*(2*f1-f0-k)/(f1-f0);
        }*/

        // this is the right formula to apply right slope as per a canonical definition of mel filter bank
        for(int k=(int)ceil(f1); k<=(int)f2; ++k) {
            melspec[m] += spec[k]*(f2-k)/(f2-f1);
        }
    }
}
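
The difference between the two right-slope formulas can also be seen on a single triangle. The sketch below is a Python restatement of the two branches in the C++ code above; the function names and the values f0 = 0, f1 = 2, f2 = 6 are hypothetical, chosen only so the numbers are easy to follow. As in the C++ code, the SpeechBrain-style branch is assumed to reuse the left-side width f1 - f0.

```python
def canonical_weight(k, f0, f1, f2):
    """Canonical triangle: rises on [f0, f1], falls on [f1, f2]."""
    if f0 <= k <= f1:
        return (k - f0) / (f1 - f0)
    if f1 < k <= f2:
        return (f2 - k) / (f2 - f1)
    return 0.0


def symmetric_weight(k, f0, f1, f2):
    """Variant from the commented-out FIXME branch: the right side
    reuses the left-side width f1 - f0 and is clamped at zero."""
    if f0 <= k <= f1:
        return (k - f0) / (f1 - f0)
    if f1 < k <= f2:
        return max(0.0, (2 * f1 - f0 - k) / (f1 - f0))
    return 0.0


# Hypothetical band edges: the right side is twice as wide as the left.
f0, f1, f2 = 0.0, 2.0, 6.0

for k in range(7):
    print(k, canonical_weight(k, f0, f1, f2), symmetric_weight(k, f0, f1, f2))
```

With these values both variants agree on the rising side, but the symmetric variant already reaches zero at k = 2 * f1 - f0 = 4, while the canonical triangle only reaches zero at k = f2 = 6.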

To Reproduce

No response

Versions

No response

Relevant log output

No response

Additional context

No response

Labels: bug (Something isn't working), confirmed (Bug officially confirmed or reproducible; discuss resolution or start writing PR)
