Denoising Diffusion Probabilistic Models for SpeechBrain#1599

Merged
mravanelli merged 179 commits into speechbrain:develop from flexthink:diffusion-direct
Aug 13, 2023


@flexthink commented Oct 10, 2022

This PR contains a basic implementation of Denoising Diffusion Probabilistic Models (DDPM) for SpeechBrain:
https://arxiv.org/pdf/2006.11239.pdf

An example is provided that generates mel spectrograms using the AudioMNIST dataset.
The PR also contains an implementation of DiffWave by @BenoitWang.
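
For readers unfamiliar with DDPM, here is a minimal sketch of the closed-form forward (noising) step from the linked paper, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. It is not the code from this PR: the function names are hypothetical, it uses plain Python lists rather than SpeechBrain or PyTorch tensors, and the linear schedule endpoints (1e-4 to 0.02 over 1000 steps) are the defaults reported in the paper, assumed here for illustration.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (endpoints assumed from the DDPM paper)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form; returns (x_t, eps)."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [signal_scale * x + noise_scale * e for x, e in zip(x0, eps)]
    return x_t, eps


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
x0 = [0.5, -0.25, 1.0]  # hypothetical stand-in for a few spectrogram values
x_t, eps = noise_sample(x0, 999, alpha_bars, rng)
```

During training, a denoising network would be asked to predict `eps` from `x_t` and `t`; at the last step `alpha_bars[-1]` is close to zero, so `x_t` is almost pure Gaussian noise.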

@flexthink changed the title from "Denoting Diffusion Probabilistic Models for SpeechBrain" to "Denoising Diffusion Probabilistic Models for SpeechBrain" on Oct 10, 2022

I just re-ran the FastSpeech training to check the shapes and found that squeeze(-1) is correct. Everything works fine. Thanks @flexthink, good catch!

@mravanelli

I did extensive tests and ran the full diffusion recipes. Everything looks good to me now. The quality of the digits generated with diffusion is pretty high, considering that AudioMNIST is a small dataset. Only the quality of latent diffusion is quite low, but I think we can improve it in another PR. Thank you @flexthink and @BenoitWang for this great work!

@mravanelli merged commit 19afe0f into speechbrain:develop on Aug 13, 2023

Labels

enhancement New feature or request

4 participants