
Add BSCodec implementation and recipe #6297

Open
whr-a wants to merge 17 commits into espnet:master from whr-a:pr_codec

Conversation

@whr-a (Contributor) commented Nov 13, 2025

What did you change?

  • Added the core model implementation for BSCodec under espnet2/gan_codec/bscodec.
  • Added the corresponding recipe (including training, inference, and evaluation) in egs2/bscodec/codec1.
  • Included a README with model results, guidelines for usage, and the paper citation.

Why did you make this change?

This PR introduces BSCodec, a new band-split neural codec. As shown in the paper and the results, the model performs well across multiple domains (speech, sound, and music) and achieves better reconstruction quality at lower bitrates than previous general-purpose codecs.
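The thread does not show BSCodec's internals, so the following is only a toy illustration of the general band-split quantization idea, not the code in espnet2/gan_codec/bscodec: a feature frame is cut into contiguous bands and each band is quantized against its own codebook by nearest-neighbour search. The band edges, the codebooks, and the helper names `split_bands`, `nearest_code`, and `band_quantize` are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def split_bands(frame: Sequence[float], edges: Sequence[int]) -> List[Vector]:
    """Split a feature frame into contiguous bands at the given edge indices.

    Hypothetical example: edges=[0, 2, 4, 6] gives bands [0:2], [2:4], [4:6].
    """
    return [list(frame[lo:hi]) for lo, hi in zip(edges[:-1], edges[1:])]


def nearest_code(band: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codeword with the smallest squared Euclidean distance."""

    def dist(code: Vector) -> float:
        return sum((b - c) ** 2 for b, c in zip(band, code))

    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def band_quantize(
    frame: Sequence[float],
    edges: Sequence[int],
    codebooks: Sequence[Sequence[Vector]],
) -> Tuple[List[int], Vector]:
    """Quantize each band with its own codebook.

    Returns one code index per band and the concatenated reconstruction.
    """
    bands = split_bands(frame, edges)
    if len(bands) != len(codebooks):
        raise ValueError("need exactly one codebook per band")
    indices: List[int] = []
    recon: Vector = []
    for band, codebook in zip(bands, codebooks):
        idx = nearest_code(band, codebook)
        indices.append(idx)
        recon.extend(codebook[idx])
    return indices, recon
```

For example, `band_quantize([0.9, 1.1, -0.2, 0.1, 3.0, 2.9], [0, 2, 4, 6], codebooks)` with three 2-D codebooks returns one index per band plus the reconstructed frame. Giving each band its own codebook is what separates this from quantizing the whole frame with a single codebook.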

Additional Context

@dosubot (bot) added the size:XXL (this PR changes 1000+ lines, ignoring generated files), Codec, and Recipe labels on Nov 13, 2025
@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request introduces a new band-split neural codec, BSCodec, along with its corresponding recipe. The model implementation is well-structured and reuses existing components effectively. The new band-splitting and quantization logic appears correct. However, I found a critical issue in the recipe's run script that will prevent it from executing successfully due to an incorrect path to the training configuration file. My review includes a suggestion to fix this path.

Review thread on egs2/bscodec/codec1/run.sh (outdated)

model=BSCodec_band_vq_5band

train_config=conf/tuning/pretrain_gan/${model}.yaml
@gemini-code-assist (Contributor), severity: critical

The path to the training configuration file appears to be incorrect. The pretrain_gan directory is not present in conf/tuning/ based on the file structure in this pull request. The configuration files are located directly under conf/tuning/.

Suggested change:
- train_config=conf/tuning/pretrain_gan/${model}.yaml
+ train_config=conf/tuning/${model}.yaml
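Since the bug was a config path that did not exist, one hypothetical way to catch this kind of error early is to resolve the path the same way run.sh builds it (`conf/tuning/${model}.yaml`) and fail before training starts. The helper `resolve_train_config` below is an illustration only and is not part of the recipe.

```python
from pathlib import Path


def resolve_train_config(recipe_dir: str, model: str) -> Path:
    """Build conf/tuning/<model>.yaml under recipe_dir, mirroring run.sh.

    Hypothetical pre-flight check: raises FileNotFoundError if the file is
    missing, instead of letting a later training stage fail on a bad path.
    """
    path = Path(recipe_dir) / "conf" / "tuning" / f"{model}.yaml"
    if not path.is_file():
        raise FileNotFoundError(f"training config not found: {path}")
    return path
```

Called as `resolve_train_config("egs2/bscodec/codec1", "BSCodec_band_vq_5band")`, it would have flagged the old `pretrain_gan/` path immediately.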

@whr-a (Contributor, Author):

Fixed.

@sw005320 requested a review from @ftshijt on November 13, 2025 13:56
@sw005320 added this to the v.202512 milestone on Nov 13, 2025
@sw005320 (Contributor)

There are many CI issues.
https://github.com/espnet/espnet/actions/runs/19319439898/job/55257666585?pr=6297
Please fix them.

@sw005320 (Contributor)

@ftshijt, can you review this PR?

@codecov (bot) commented Nov 13, 2025

Codecov Report

❌ Patch coverage is 0% with 401 lines in your changes missing coverage. Please review.
✅ Project coverage is 17.04%. Comparing base (059c2f3) to head (4481e6c).
⚠️ Report is 29 commits behind head on master.

File | Patch % | Missing lines
espnet2/gan_codec/bscodec/bscodec.py | 0.00% | 226
...spnet2/gan_codec/shared/quantizer/modules/simvq.py | 0.00% | 77
espnet2/gan_codec/shared/quantizer/band_vq.py | 0.00% | 60
...net2/gan_codec/shared/quantizer/modules/core_vq.py | 0.00% | 36
espnet2/gan_codec/bscodec/__init__.py | 0.00% | 1
espnet2/tasks/gan_codec.py | 0.00% | 1

❗ There is a different number of reports uploaded between BASE (059c2f3) and HEAD (4481e6c).

HEAD has 24 fewer uploads than BASE.
Flag | BASE (059c2f3) | HEAD (4481e6c)
test_python_espnet2 | 8 | 0
test_utils | 8 | 0
test_integration_espnet2 | 8 | 0
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #6297       +/-   ##
===========================================
- Coverage   69.62%   17.04%   -52.59%     
===========================================
  Files         775      771        -4     
  Lines       71542    71416      -126     
===========================================
- Hits        49813    12174    -37639     
- Misses      21729    59242    +37513     
Flag | Coverage Δ
test_integration_espnet2 | ?
test_python_espnet2 | ?
test_python_espnet3 | 17.04% <0.00%> (-0.10%) ⬇️
test_utils | ?

Flags with carried forward coverage won't be shown.


@sw005320 (Contributor)

@whr-a, it still has a lot of CI errors.

@whr-a (Contributor, Author) commented Nov 13, 2025

OK, I'll check them.

@ftshijt (Collaborator) left a comment

Please also register this recipe in egs2/README.md.

@@ -0,0 +1,63 @@
# codec example yaml config
@ftshijt (Collaborator):

It would be better to clean up the config setup a bit.

return quantized_out


class BandVectorQuantization(nn.Module):
@ftshijt (Collaborator):

Maybe inheriting from the original class would be easier, since the two share many components and behave almost identically except for the band-related loss.
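A minimal sketch of the suggested refactor, assuming the band variant differs from the original quantizer only in its loss. Plain Python classes stand in for the `nn.Module` versions, and `BaseQuantizer`, `BandQuantizer`, and the per-band weighted loss are hypothetical placeholders, not the classes or loss in this PR: the subclass inherits encoding, decoding, and `forward`, and overrides only `loss`.

```python
from typing import List, Sequence, Tuple


class BaseQuantizer:
    """Hypothetical stand-in for the existing quantizer with shared logic."""

    def __init__(self, codebook: Sequence[Sequence[float]]):
        self.codebook = [list(c) for c in codebook]

    def encode(self, x: Sequence[float]) -> int:
        # Nearest-codeword search (squared Euclidean), shared by subclasses.
        return min(
            range(len(self.codebook)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, self.codebook[i])),
        )

    def decode(self, idx: int) -> List[float]:
        return list(self.codebook[idx])

    def loss(self, x: Sequence[float]) -> float:
        # Default loss: unweighted squared error to the chosen codeword.
        q = self.decode(self.encode(x))
        return sum((a - b) ** 2 for a, b in zip(x, q))

    def forward(self, x: Sequence[float]) -> Tuple[List[float], int, float]:
        idx = self.encode(x)
        # self.loss dispatches to the subclass override when there is one.
        return self.decode(idx), idx, self.loss(x)


class BandQuantizer(BaseQuantizer):
    """Reuses encode/decode/forward; overrides only the loss.

    Placeholder band loss: each element's squared error is scaled by a
    hypothetical per-band weight.
    """

    def __init__(self, codebook: Sequence[Sequence[float]], band_weights: Sequence[float]):
        super().__init__(codebook)
        self.band_weights = list(band_weights)

    def loss(self, x: Sequence[float]) -> float:
        q = self.decode(self.encode(x))
        return sum(w * (a - b) ** 2 for w, a, b in zip(self.band_weights, x, q))
```

With this layout, `BandQuantizer(...).forward(x)` returns the same codes and reconstruction as the base class and differs only in the loss term, so only the band-specific part has to be maintained separately.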

Review thread on egs2/bscodec/codec1/run.sh (outdated)
Comment on lines +34 to +35
--stage 6\
--stop_stage 6\
@ftshijt (Collaborator):

Please clean up the setup here.

Review thread on egs2/bscodec/codec1/README.md (outdated)

It is based on the DAC architecture and a band-split strategy, and was trained using the ESPnet codec training pipeline.

The model checkpoint is available at https://huggingface.co/anonymous-release/BSCodec/tree/main
@ftshijt (Collaborator):

We can switch to publicly released checkpoints and also add the citation to the paper.

@sw005320 (Contributor)

@whr-a, please reflect the review comments

@whr-a (Contributor, Author) commented Feb 19, 2026

Ok, I'll fix them soon.

@Fhrozen modified the milestones: v.202604, v.202607 on Apr 7, 2026

Labels

Codec, ESPnet2, README, Recipe, size:XXL (this PR changes 1000+ lines, ignoring generated files)

Projects

None yet


4 participants