Started to add a diarization recipe for ESPnet3 based on DiariZen #6364
popcornell wants to merge 120 commits into espnet:master from
Conversation
…t into espnet3/logging_utils
- This is to avoid using the egs folder
for more information, see https://pre-commit.ci
…t into espnet3/logging_utils
…t into espnet3/integration_test
…pnet into espnet3/integration_test
…t into espnet3/integration_test
- Previously we asked developers to create a user-defined model, but it is now supported as a default. - Users can set `val_scheduler_criterion` as in espnet2 to use this function.
- Supported train/valid switching for the preprocessor. - Added a new default resolver to load an external config file.
for more information, see https://pre-commit.ci
…pnet into espnet3/integration_test
1. Added the Python version as metadata. 2. Added a flag to generate requirements.txt for experiment-level environment logging. 3. Added a log rotation function for cases where the log file already exists (in espnet2, this was previously handled by a Perl script).
…t into espnet3/integration_test
…t into espnet3/logging_utils
…3/integration_test
…t into espnet3/integration_test
for more information, see https://pre-commit.ci
…pnet into espnet3/recipe/ls_asr100_2
for more information, see https://pre-commit.ci
Code Review
This pull request introduces a significant set of features for ESPnet3, including a new diarization recipe based on DiariZen, extensive documentation, and a new framework for creating and deploying demos with Gradio. The changes also include substantial improvements to the core infrastructure, such as enhanced logging, configuration handling, and a more robust parallel execution mechanism. Overall, the code is well-structured and the new features are a great addition. I've identified a few high-severity issues related to hardcoded paths in configuration files and an unimplemented feature that is set as default, which could cause recipes to fail. Addressing these will improve the robustness and usability of the new recipes.
```shell
debug_configs=train_asr_transformer_debug.yaml

echo "==== [ESPnet3] ASR Demo pack ===="
python -m pip install -e '.[asr]'
```
The script uses `python` directly here, but `${python}` in other places (lines 32, 36). This inconsistency can lead to using the system's default Python interpreter instead of the one from the activated virtual environment, potentially causing the CI job to fail. For consistency, and to ensure the correct interpreter is used, `${python}` should be used here as well.
```diff
- python -m pip install -e '.[asr]'
+ ${python} -m pip install -e '.[asr]'
```
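As a sketch of the convention the comment refers to (the `python3` fallback below is an assumption, not taken from this PR), the script could define the interpreter variable once near the top and reuse it everywhere:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: define the interpreter once so that the activated
# virtualenv's Python is used consistently throughout the script.
python="${python:-python3}"

echo "==== [ESPnet3] ASR Demo pack ===="
"${python}" -c 'import sys; print("interpreter ok:", sys.version_info[0])'
# The real script would then run: "${python}" -m pip install -e '.[asr]'
```

With this pattern, a CI job or a user can override the interpreter by exporting `python` before running the script.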
```yaml
num_nodes: 1

# Path scaffold
recipe_dir: /Users/samco/Projects/ESPnet3/espnet/egs3/ami_diar/diar
```
The `recipe_dir` is hardcoded to a local user path. This will cause the recipe to fail for any other user or in any other environment (e.g., CI). This path should be made relative or determined at runtime, similar to how it is handled in other configuration files, where a comment notes that it is set automatically by `run.py`.
```yaml
recipe_dir: .
```
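One way a `run.py` could derive the path at runtime (purely illustrative; the helper name and approach here are hypothetical, not this PR's actual implementation) is to anchor it to the script's own location:

```python
from pathlib import Path


def resolve_recipe_dir(script_path: str) -> str:
    """Return the absolute directory containing the given script.

    Hypothetical helper: a recipe's run.py could use this to fill in
    recipe_dir at runtime instead of relying on a hardcoded user path.
    """
    return str(Path(script_path).resolve().parent)


# When called from a recipe's run.py, __file__ points inside the recipe dir.
print(resolve_recipe_dir(__file__))
```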
```python
    labels = clustering.fit_predict(embeddings)
    return labels

def _cluster_vbx(
    self,
    embeddings: np.ndarray,
    num_speakers: int,
) -> np.ndarray:
    """Variational Bayes clustering (VBx).

    Note: This is a placeholder. For full VBx implementation,
    integrate with VBDiarization library or similar.

    Args:
        embeddings: (num_speakers, embedding_dim)
        num_speakers: Target number of clusters
```
The `_cluster_vbx` method raises a `NotImplementedError`, but the default configuration in `egs3/ami_diar/diar/conf/inference.yaml` and `egs3/ami_diar/diar/conf/tuning/train_xeus_conformer_powerset.yaml` sets `clustering_backend: vbx`. This will cause inference to fail with the default settings. The default in the config files should be changed to a supported backend like `ahc`, or this method should be implemented.
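For reference, an `ahc`-style backend boils down to agglomerative clustering over cosine distances between speaker embeddings. The sketch below is a naive pure-NumPy illustration with average linkage (not the PR's `_cluster_ahc`; in practice a library implementation would be used):

```python
import numpy as np


def cluster_ahc(embeddings: np.ndarray, num_speakers: int) -> np.ndarray:
    """Naive average-linkage AHC on cosine distance (illustrative only)."""
    n = len(embeddings)
    # Start with every embedding in its own cluster.
    clusters = [[i] for i in range(n)]
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    dist = 1.0 - normed @ normed.T  # pairwise cosine distances
    while len(clusters) > num_speakers:
        # Merge the pair of clusters with the smallest average distance.
        best, best_d = (0, 1), np.inf
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if d < best_d:
                    best_d, best = d, (a, b)
        a, b = best
        clusters[a] += clusters.pop(b)
    labels = np.empty(n, dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels
```

Swapping the default to such a backend keeps inference runnable until VBx is actually implemented.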
```yaml
exp_dir: ${recipe_dir}/exp/${exp_tag}
stats_dir: ${recipe_dir}/exp/stats
decode_dir: ${exp_dir}/decode
dataset_dir: /path/to/LibriSpeech
```
The `dataset_dir` is hardcoded to `/path/to/LibriSpeech`. This will cause the recipe to fail on any machine where the dataset is not at this exact location. It is better to use a placeholder or an environment variable so that users can easily configure the path. For example, you could use an OmegaConf interpolation like `${oc.env:LIBRISPEECH,/path/to/LibriSpeech}` to read an environment variable with a fallback.
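Concretely, the interpolation suggested in the comment would look like this in the config (assuming OmegaConf's built-in `oc.env` resolver, which supports a default value after the comma):

```yaml
# Reads $LIBRISPEECH if set, otherwise falls back to the placeholder path.
dataset_dir: ${oc.env:LIBRISPEECH,/path/to/LibriSpeech}
```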
```yaml
dataset_dir: /path/to/your/LibriSpeech  # Or better, use an environment variable
```

```python
"""Inference output helpers for ASR recipes."""


def output_fn(*, data, model_output, idx):
    """Build a dict of outputs for SCP writing."""
    uttid = data.get("uttid", str(idx))
    hyp = model_output[0][0]
    ref = data.get("text", "")
    return {"uttid": uttid, "hyp": hyp, "ref": ref}


def output_fn_transducer(*, data, model_output, idx):
    """Build a dict of outputs for transducer models."""
    uttid = data.get("uttid", str(idx))
    hyp = model_output[0]
    ref = data.get("text", "")
    return {"uttid": uttid, "hyp": hyp, "ref": ref}
```
This pull request is now in conflict :(
@Masao-Someki I will change the base branch once you merge
This PR adds a recipe for DiariZen-style diarization, built with ESPnet legacy components:
The architecture just follows DiariZen [1].
Basically it is EEND-VC/Pyannote style:
[1] Han, Jiangyu, et al. "Leveraging self-supervised learning for speaker diarization." ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025.