Conversation
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Code Review
This pull request introduces a new band-split neural codec, BSCodec, along with its corresponding recipe. The model implementation is well-structured and reuses existing components effectively. The new band-splitting and quantization logic appears correct. However, I found a critical issue in the recipe's run script that will prevent it from executing successfully due to an incorrect path to the training configuration file. My review includes a suggestion to fix this path.
|
|
||
| model=BSCodec_band_vq_5band | ||
|
|
||
| train_config=conf/tuning/pretrain_gan/${model}.yaml |
There was a problem hiding this comment.
The path to the training configuration file appears to be incorrect. The pretrain_gan directory is not present in conf/tuning/ based on the file structure in this pull request. The configuration files are located directly under conf/tuning/.
| train_config=conf/tuning/pretrain_gan/${model}.yaml | |
| train_config=conf/tuning/${model}.yaml |
|
There are many CI issues. |
|
@ftshijt, can you review this PR? |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6297 +/- ##
===========================================
- Coverage 69.62% 17.04% -52.59%
===========================================
Files 775 771 -4
Lines 71542 71416 -126
===========================================
- Hits 49813 12174 -37639
- Misses 21729 59242 +37513
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@whr-a, it still has a lot of CI errors |
|
OK, I'll check them. |
ftshijt
left a comment
There was a problem hiding this comment.
Please also register the one by egs2/README.md;
| @@ -0,0 +1,63 @@ | |||
| # codec example yaml config | |||
There was a problem hiding this comment.
It could be better to clean the config setup a bit
| return quantized_out | ||
|
|
||
|
|
||
| class BandVectorQuantization(nn.Module): |
There was a problem hiding this comment.
Maybe inheritance from the original class would be easier, considering many shared components and almost same behavior except for the band-related loss
| --stage 6\ | ||
| --stop_stage 6\ |
There was a problem hiding this comment.
Please clean the setup at by here
|
|
||
| It is based on DAC structure and band-split strategy, and was trained using the ESPnet codec training pipeline. | ||
|
|
||
| The model checkpoint is available at https://huggingface.co/anonymous-release/BSCodec/tree/main |
There was a problem hiding this comment.
We can switch to public ones and also add the citation to the paper
|
@whr-a, please reflect the review comments |
|
Ok, I'll fix them soon. |
What did you change?
espnet2/gan_codec/bscodec.egs2/bscodec/codec1.Why did you make this change?
This PR introduces BSCodec, a new band-split neural codec. As shown in the paper and results, this model performs well across multiple domains (speech, sound, music) and achieves better reconstruction performance at lower bitrates compared to previous general-purpose codecs.
Additional Context
arXiv preprint(The citation is in the README)