Gammatone learnable filterbanks #2714
Conversation
- fix transpose in gammatoneconv
…matone filter bank
The original (TensorFlow-only) paper implementation is available for reference at https://github.com/epfl-lts2/learnable-filterbanks/
Hi, and thank you for your contribution! Could you please add results to the README? It is important to know what numbers you obtain with this code compared to what already exists. Also, please fix the failing tests as a first step. Feel free to install pre-commit locally to facilitate the exercise.
@TParcollet Hello! I had run pre-commit, but merging the develop branch back on GitHub broke a couple of things; I will fix that ASAP. I have not implemented (yet) the recipes to reproduce our paper, but replacing the LEAF frontend (which includes a more sophisticated learnable pooling and compression) with our Gammatone LFB on the Google Speech Commands dataset yields a test accuracy of 96.5% (vs. 97.4% for LEAF). Should I include this result too?
Force-pushed from f015bc1 to 7b59d71
- data prep scripts update
- iterate over utterances
- without parallel map
- parallel map -> so fast omfg
- gigaspeech data prep done
- speechcolab extra dep if one must download gigaspeech
- create ASR CTC folder
- base yaml + update data prep to better reflect potential different naming for csvs
- update recipe
- update recipe to be compliant with gigaspeech csv
- add transformers dep
- convert opus to wav
- recipe --debug mode works
- typo GRABAGE_UTTERANCE_TAGS -> GARBAGE_UTTERANCE_TAGS
- tmp DL file
- update DL FILE
- add DL file in ASR/CTC
- update extra_requirements.txt
- add support of savedir within Pretrained subclasses
- add wbs requirements
- webdataset
- remove print
- tmp files webdataset
- verbosity + metada.json
- letzo now label_encoder can actually train + the recipe seems to work
- remove wbs
- DL info
- HF DL support
- remove webdataset as it sucks :p
- name
- ngram commands
- whisper baseline
- fix HF
- pre-commit + sentencepiece char
- remove csv
- Add quirks.py, move global PyTorch config and GPU workarounds there
- Add support for SB_DISABLE_QUIRKS environment variable
- Fetch rework: make savedir optional
- bunch of updates to make it run
- no download script
- fix precommit
- fix precommit
- readmes
- readmes
- readmes
- readmes
- doc update
- CI god not happy, make CI god happy
- why you here little encoder
- adding a tranduscer streaming recipe, because why not
- add test for transducer
- works better when me not stupid
- fix yaml
- update req
- add warning for cache dir
- add warning for cache dir
- enable multiprocessing
- Minor cleanups to fetching
- Change default behavior of inference to not create savedir if not specified
- allow data prep without ddp
- fix tests
- smoll readme update
- fix review comments
- fixed concat_start_index check (speechbrain#2717)
- Ensure adapted models save their parameters (speechbrain#2716) (Co-authored-by: Parcollet Titouan <parcollet.titouan@gmail.com>)
- wtf
- update doc
- more documentation on storage
- missing arg
- a bit of logs
- new schedulers
- new schedulers
- Fixes speechbrain#2656: Remove EOS from SoundChoice
- fix my stupidity
- Update non-HF code path for new preprocessing code in GigaSpeech
- Fix CSV path for non-HF Gigaspeech
- Fix formatting
- Kmeans fix (speechbrain#2642): fix kmeans bug, fix final batch, fix chuncksize, fix precommit, fix doxstrin inconsistency, fix doc string (Co-authored-by: Mirco Ravanelli <mirco.ravanelli@gmail.com>)
- add call on start of fit_batch fn
- Update core.py (fix old commit)
- Update core.py
- Fix preprocess_text example
- Fix guess_source docstring with up-to-date info
- Also remove default savedir from Pretrained
- Fix function name for log_applied_quirks
- wip audiomnist+gt
- Revert "fix normalization for LFB" (reverts commit 3fd0330)
- audiomnist classification setup
- fix config
- add missing file
- update dataset load/training
- remove unnecessary params
- remove sort
- remove unnecessary code
- fix paths
- fix loss computation
- add missing flatten
- print summary
- Explain quirks in docs/experiment.md
- ok stupid linter check that hates intentional leading spaces in markdown
- add citing in README
- add code to pad all wavs to the same length
- fix pad call
- fix error computation
- fix error computation
- Make `collect_in` optional for `Pretrainer`, disable it by default
- Change more defaults to `savedir=None` and `fetch_strategy=SYMLINK` (since the SYMLINK strategy falls back to NO_LINK whenever `savedir is None`, it makes sense to switch more things to default to `savedir=None`; should the `savedir` explicitly be set by the user, past behavior is preserved, defaulting to symlinks)
- move flatten in audionet
- Fix GS transducer test prediction decoding?
- fix data prep logic and paths
- Actually fix GS transducer test prediction decoding
- Remove punctuation filtering that is handled elsewhere
- HuggingFance
- fix skip data prep logic
- add original audionet feature extraction
- fix pooling for audionet feature extraction
- fix audionet shape + remove input norm
- try data augmentation
- add missing refs
- rework AudioNet to have optional pooling; use official AudioMNIST train/test/valid splits
- fix typo in url
- update audionet hparams
- update audionet custom hparams
- update audionet custom hparams
- Updated warning for load_collected
- Add results and notices for results for GigaSpeech transducer & wavlm
- english hard
- update audionet custom hparams
- fix doc + pre-commit clean
- fix code examples
- fix consistency tests
- fix pre commit
- remove config
- fix docstring for LFB
- fix docstring for GammatoneConv1D

Co-authored-by: Adel Moumen <adelmoumen.pro@gmail.com>
Co-authored-by: Adel Moumen <88119391+Adel-Moumen@users.noreply.github.com>
Co-authored-by: asu <sdelang@sdelang.fr>
Co-authored-by: TParcollet <parcollet.titouan@gmail.com>
Co-authored-by: Peter Plantinga <plantinga.peter@proton.me>
Co-authored-by: gianfranco <62777451+gfdb@users.noreply.github.com>
Co-authored-by: Peter Plantinga <plantinga.peter@protonmail.com>
Co-authored-by: Titouan Parcollet/Embedded AI /SRUK/Engineer/Samsung Electronics <t.parcollet@sruk-ccn4.eu.corp.samsungelectronics.net>
Co-authored-by: flexthink <flexthink@users.noreply.github.com>
Co-authored-by: Pooneh Mousavi <moosavi.pooneh@gmail.com>
Co-authored-by: Mirco Ravanelli <mirco.ravanelli@gmail.com>
@TParcollet Hi. I added the results for the AudioMNIST classification task, which are consistent with the ones we got in our paper. The tests have also been fixed. Unless you have concerns about those changes, I will work on adding the Google Speech Commands experiment we did.
I have checked the build failure, but it seems related to Hugging Face rather than to this PR, so I am not sure I can fix it...
- update README
- specify encoding for open()
I double-checked on my fork and all builds are passing (after merging develop back).
merge develop changes
Gammatone dev
I wonder if anyone will review this PR... oh well, at least I tried.
Hey @naspert, sorry for the delay! We apologize: we had quite a lot of work to do and weren't able to process this PR on time. @TParcollet, what are your current views on this PR?
@Adel-Moumen thanks for your answer (and I know maintainers are volunteers who do this most necessary work in their spare time)! If anybody has feedback, it would be very welcome :)
What does this PR do?
This PR implements a parametric Gammatone filterbank, as proposed in the paper "Learnable filter-banks for CNN-based audio applications" (Proc. NLDL 2022; initially proposed for ICLR 2020).
It is similar to the LEAF frontend already included in the library, except that LEAF uses complex Gabor filters instead of Gammatones. Gammatone filterbanks are of interest because they are considered good models of the human auditory filters.
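For context, a Gammatone filter's impulse response is a polynomial rise multiplied by an exponential decay and a tone: g(t) = t^(n-1) exp(-2πbt) cos(2πf_c t). Below is a minimal NumPy sketch of this response, not the PR's implementation; the function name, the order n=4, the ERB-based bandwidth choice, and the 25 ms duration are illustrative assumptions. In the learnable-filterbank setting, f_c and b are the parameters that would be trained.

```python
import numpy as np

def gammatone_impulse_response(fc, fs, n=4, b=None, duration=0.025):
    """Impulse response g(t) = t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t).

    fc: center frequency (Hz), fs: sampling rate (Hz), n: filter order,
    b: bandwidth (Hz), duration: length of the response in seconds.
    """
    if b is None:
        # Tie the bandwidth to the ERB scale (Glasberg & Moore),
        # a common default for Gammatone filterbanks.
        erb = 24.7 + fc / 9.265
        b = 1.019 * erb
    t = np.arange(int(duration * fs)) / fs
    g = t ** (n - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))  # peak-normalize

# One filter of a would-be filterbank: 1 kHz center frequency at 16 kHz.
ir = gammatone_impulse_response(fc=1000.0, fs=16000)
```

A bank of such responses (one per learnable `fc`/`b` pair) can be stacked and applied as a 1-D convolution, which is the general idea behind frontends like this one and LEAF.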
Before submitting
PR review
Reviewer checklist