Started to add a diarization recipe for ESPnet3 based on DiariZen by popcornell · Pull Request #6364 · espnet/espnet

popcornell · 2026-02-12T12:56:30Z

@Masao-Someki I will change the base branch once you merge

This PR adds a recipe for DiariZen-style diarization, built from ESPnet legacy components:

- WavLM or XEUS as the front-end
- Any ESPnet2 speaker embedding model

The architecture follows DiariZen [1].

It is essentially EEND-VC/Pyannote style:

- We split the recording into fixed-length chunks (e.g. 20 seconds) and run powerset EEND within each chunk.
- We use the predicted speaker activities to extract one embedding per speaker per chunk, then cluster those embeddings to reassign global speaker IDs.
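The two steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in this PR: the function names, the 4-speaker / 2-overlap powerset setting, the similarity threshold, and the greedy centroid assignment (a simple stand-in for a real clustering backend such as AHC) are all hypothetical.

```python
import math
from itertools import combinations


def powerset_classes(max_speakers, max_overlap):
    """List the speaker subsets a powerset head would classify per frame."""
    classes = []
    for k in range(max_overlap + 1):
        classes.extend(combinations(range(max_speakers), k))
    return classes


def chunk_bounds(duration, chunk_len=20.0, hop=20.0):
    """Return (start, end) windows in seconds that cover the recording."""
    bounds = []
    start = 0.0
    while start < duration:
        bounds.append((start, min(start + chunk_len, duration)))
        start += hop
    return bounds


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_global_ids(embeddings, threshold=0.7):
    """Greedy stand-in for clustering (NOT AHC or VBx).

    Each per-chunk speaker embedding joins the most similar running
    centroid, or opens a new global speaker if none is similar enough.
    """
    centroids = []  # one running sum of embeddings per global speaker
    labels = []
    for emb in embeddings:
        scores = [cosine(emb, c) for c in centroids]
        if scores and max(scores) >= threshold:
            best = scores.index(max(scores))
            centroids[best] = [c + e for c, e in zip(centroids[best], emb)]
        else:
            best = len(centroids)
            centroids.append(list(emb))
        labels.append(best)
    return labels
```

With at most 4 speakers and at most 2 overlapping, the powerset has 1 + 4 + 6 = 11 classes, and a 50-second recording with 20-second non-overlapping chunks yields three windows, the last one shorter.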

[1] Han, Jiangyu, et al. "Leveraging self-supervised learning for speaker diarization." ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025.

1. Added the Python version as metadata. 2. Added a flag to generate requirements.txt for experiment-level environment logging. 3. Added a log rotation function for cases where the log file already exists (in espnet2, this was previously handled by a Perl script).
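For illustration, the log rotation in item 3 might look roughly like the sketch below. The function name, the `<stem>.<n><suffix>` naming scheme, and the `max_backups` limit are assumptions made for this example and are not taken from the PR.

```python
from pathlib import Path


def rotate_log(log_path, max_backups=5):
    """Move an existing log aside so a new run can start a fresh file.

    train.log becomes train.1.log, train.1.log becomes train.2.log, and so
    on; the oldest backup beyond max_backups is dropped.
    Returns the path of the newest backup, or None if nothing was rotated.
    """
    log_path = Path(log_path)
    if not log_path.exists():
        return None

    def backup(i):
        return log_path.with_name(f"{log_path.stem}.{i}{log_path.suffix}")

    oldest = backup(max_backups)
    if oldest.exists():
        oldest.unlink()
    # Shift from the highest index down so no rename overwrites a file.
    for i in range(max_backups - 1, 0, -1):
        if backup(i).exists():
            backup(i).rename(backup(i + 1))
    log_path.rename(backup(1))
    return backup(1)
```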


gemini-code-assist

Code Review

This pull request introduces a significant set of features for ESPnet3, including a new diarization recipe based on DiariZen, extensive documentation, and a new framework for creating and deploying demos with Gradio. The changes also include substantial improvements to the core infrastructure, such as enhanced logging, configuration handling, and a more robust parallel execution mechanism. Overall, the code is well-structured and the new features are a great addition. I've identified a few high-severity issues related to hardcoded paths in configuration files and an unimplemented feature that is set as default, which could cause recipes to fail. Addressing these will improve the robustness and usability of the new recipes.

gemini-code-assist · 2026-02-12T13:01:39Z

+debug_configs=train_asr_transformer_debug.yaml
+
+echo "==== [ESPnet3] ASR Demo pack ===="
+python -m pip install -e '.[asr]'


The script uses python directly here, but ${python} in other places (lines 32, 36). This inconsistency can lead to using the system's default Python interpreter instead of the one from the activated virtual environment, potentially causing the CI job to fail. For consistency and to ensure the correct interpreter is used, ${python} should be used here as well.

Suggested change

-python -m pip install -e '.[asr]'
+${python} -m pip install -e '.[asr]'

gemini-code-assist · 2026-02-12T13:01:40Z

+num_nodes: 1
+
+# Path scaffold
+recipe_dir: /Users/samco/Projects/ESPnet3/espnet/egs3/ami_diar/diar


The recipe_dir is hardcoded to a local user path. This will cause the recipe to fail for any other user or in any other environment (e.g., CI). This path should be made relative or be determined at runtime, similar to how it's handled in other configuration files where it's commented as being set automatically by run.py.

recipe_dir: .
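One way to determine the path at runtime is sketched below. It assumes the entry script sits in the recipe directory; the helper name `resolve_recipe_dir` is hypothetical and not part of run.py.

```python
from pathlib import Path


def resolve_recipe_dir(configured, entry_script):
    """Return an absolute recipe directory.

    Relative values (including ".") are resolved against the directory that
    holds the entry script; absolute values are returned unchanged.
    """
    base = Path(entry_script).resolve().parent
    configured = Path(configured)
    if configured.is_absolute():
        return configured
    return (base / configured).resolve()
```

A caller could pass `__file__` as `entry_script`, so that `recipe_dir: .` works on any machine regardless of the current working directory.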

gemini-code-assist · 2026-02-12T13:01:40Z

+
+        labels = clustering.fit_predict(embeddings)
+        return labels
+
+    def _cluster_vbx(
+        self,
+        embeddings: np.ndarray,
+        num_speakers: int,
+    ) -> np.ndarray:
+        """Variational Bayes clustering (VBx).
+
+        Note: This is a placeholder. For full VBx implementation,
+        integrate with VBDiarization library or similar.
+
+        Args:
+            embeddings: (num_speakers, embedding_dim)
+            num_speakers: Target number of clusters


The _cluster_vbx method raises a NotImplementedError, but the default configuration in egs3/ami_diar/diar/conf/inference.yaml and egs3/ami_diar/diar/conf/tuning/train_xeus_conformer_powerset.yaml sets clustering_backend: vbx. This will cause inference to fail with the default settings. The default in the config files should be changed to a supported backend like ahc, or this method should be implemented.
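Until VBx is implemented, a fail-fast check at config-load time would surface the problem before any audio is processed. The sketch below is an assumption-laden illustration: the set of implemented backends and the function name are hypothetical and should be aligned with what the recipe actually supports.

```python
# Assumption: only AHC is implemented; extend this set to match the recipe.
SUPPORTED_BACKENDS = {"ahc"}
# Backends that exist as stubs and raise NotImplementedError when called.
PLACEHOLDER_BACKENDS = {"vbx"}


def check_clustering_backend(name):
    """Validate clustering_backend when the config is loaded."""
    if name in SUPPORTED_BACKENDS:
        return name
    if name in PLACEHOLDER_BACKENDS:
        raise ValueError(
            f"clustering_backend={name!r} is a placeholder; "
            f"choose one of {sorted(SUPPORTED_BACKENDS)}"
        )
    raise ValueError(f"unknown clustering_backend={name!r}")
```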

gemini-code-assist · 2026-02-12T13:01:40Z

+exp_dir: ${recipe_dir}/exp/${exp_tag}
+stats_dir: ${recipe_dir}/exp/stats
+decode_dir: ${exp_dir}/decode
+dataset_dir: /path/to/LibriSpeech


The dataset_dir is hardcoded to /path/to/LibriSpeech. This will cause the recipe to fail on any machine where the dataset is not at this exact location. It's better to use a placeholder or an environment variable so that users can easily configure the path. For example, you could use an OmegaConf interpolation like ${oc.env:LIBRISPEECH,/path/to/LibriSpeech} to use an environment variable with a fallback.

dataset_dir: /path/to/your/LibriSpeech # Or better, use an environment variable
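The same fallback logic as the `oc.env` interpolation can be sketched in plain Python, with an early check that the directory exists. The helper name and the `LIBRISPEECH` variable follow the example in this comment and are otherwise assumptions.

```python
import os
from pathlib import Path


def resolve_dataset_dir(env_var="LIBRISPEECH",
                        default="/path/to/LibriSpeech",
                        environ=None):
    """Read the dataset path from an environment variable with a fallback.

    Raises FileNotFoundError early so a placeholder path is caught before
    training starts rather than inside the data loader.
    """
    environ = os.environ if environ is None else environ
    path = Path(environ.get(env_var, default))
    if not path.is_dir():
        raise FileNotFoundError(
            f"dataset_dir {str(path)!r} does not exist; set ${env_var}"
        )
    return path
```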

gemini-code-assist · 2026-02-12T13:01:40Z

+"""Inference output helpers for ASR recipes."""
+
+
+def output_fn(*, data, model_output, idx):
+    """Build a dict of outputs for SCP writing."""
+    uttid = data.get("uttid", str(idx))
+    hyp = model_output[0][0]
+    ref = data.get("text", "")
+    return {"uttid": uttid, "hyp": hyp, "ref": ref}
+
+
+def output_fn_transducer(*, data, model_output, idx):
+    """Build a dict of outputs for transducer models."""
+    uttid = data.get("uttid", str(idx))
+    hyp = model_output[0]
+    ref = data.get("text", "")
+    return {"uttid": uttid, "hyp": hyp, "ref": ref}


This file appears to be a duplicate of egs3/mini_an4/asr/src/infer.py. Having duplicate code increases maintenance overhead and can lead to inconsistencies. It would be better to consolidate them into a single file and have all dependent configurations point to it.
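If the files are consolidated into one shared module, the two helpers in the hunk above could also share a single code path. The sketch below is one hypothetical way to do that; `make_output_fn` is a name introduced here for illustration only.

```python
def make_output_fn(extract_hyp):
    """Build an output_fn that differs only in how the hypothesis is read."""

    def output_fn(*, data, model_output, idx):
        """Build a dict of outputs for SCP writing."""
        uttid = data.get("uttid", str(idx))
        hyp = extract_hyp(model_output)
        ref = data.get("text", "")
        return {"uttid": uttid, "hyp": hyp, "ref": ref}

    return output_fn


# Same behavior as the two functions in the hunk above.
output_fn = make_output_fn(lambda out: out[0][0])
output_fn_transducer = make_output_fn(lambda out: out[0])
```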

mergify · 2026-02-17T20:58:16Z

This pull request is now in conflict :(

Masao-Someki and others added 30 commits December 26, 2025 11:51

Merge branch 'espnet3/package_files' of github.com:Masao-Someki/espne…

c051aea

…t into espnet3/logging_utils

reverted unrequired change

e974e77

Update docstrings

dc7c418

Fixed CI issue

e8c044f

Add integration test

67f9fc8

Added jiwer for scoring stage

374c10d

Moved mini-an4 dataset from egs to egs2

b205eb6

- This is to avoid using egs folder

[pre-commit.ci] auto fixes from pre-commit.com hooks

ee656bb

for more information, see https://pre-commit.ci

Merge branch 'espnet3/package_files' of github.com:Masao-Someki/espne…

fea08cf

…t into espnet3/logging_utils

Merge branch 'espnet3/package_files' of github.com:Masao-Someki/espne…

945d072

…t into espnet3/integration_test

Merge branch 'espnet3/integration_test' of github.com:Masao-Someki/es…

734b6fe

…pnet into espnet3/integration_test

Fixed CI issue

086b26e

Merge branch 'espnet3/logging_utils' of github.com:Masao-Someki/espne…

9079975

…t into espnet3/integration_test

Bug fix

d42994a

- Assume hypothesis to be "" when hypothesis is blank

Add validation-based lr scheduler such as ReduceOnPlateau

e649581

- Previously we asked developers to create a user-defined model, but this is now supported by default.
- Users can set `val_scheduler_criterion` as in espnet2 to use this function.

Skip building tokenizer when it exists

7fae474

Some bug fix for config loading

eb8762d

- Supported train/valid switching for preprocessor
- Add new default resolver to load external config file

Add transducer system, task, and inference runner

4d12f16

Add configs for integration test

e8d0d97

Add running script for transducer with Transducer system

a4eb530

Add more integration tests for ASR

173561e

Removed debug line

11def0b

[pre-commit.ci] auto fixes from pre-commit.com hooks

094af9f

for more information, see https://pre-commit.ci

Format and fixed CI

dcbd1de

Merge branch 'espnet3/integration_test' of github.com:Masao-Someki/es…

b58bc4e

…pnet into espnet3/integration_test

Applied formatting

ab59843

Merge branch 'espnet3/logging_utils' of github.com:Masao-Someki/espne…

4105061

…t into espnet3/integration_test

Separate logs for each stages

e88e051

Added pack-model, upload-model stage

933d63b

Masao-Someki and others added 19 commits January 30, 2026 04:38

Fix based on the modification to inference stage

4307834

Add setup_uv.sh

2cda050

Fixed merge mistake

0336d7a

Merge branch 'espnet3/package_files' of github.com:Masao-Someki/espne…

cc68b55

…t into espnet3/logging_utils

rename logging.py to logging_utils.py

96d8fdd

Merge branch 'master' into espnet3/logging_utils

b98b58d

Merge branch 'master' of https://github.com/espnet/espnet into espnet…

76f2352

…3/integration_test

Merge branch 'espnet3/logging_utils' of github.com:Masao-Someki/espne…

f7ba0fa

…t into espnet3/integration_test

Fixed issues

ce03215

Fixed namings of a mini-an4 recipe

1e54bee

Fixed config entries for mini-an4

7ce67dc

Fixed ci script

d843218

Fixed integration config

b9bcf14

measure -> metric

2d54bb6

[pre-commit.ci] auto fixes from pre-commit.com hooks

bba32ed

for more information, see https://pre-commit.ci

Merge branch 'espnet3/integration_test' of github.com:Masao-Someki/es…

fe7a1c4

…pnet into espnet3/recipe/ls_asr100_2

Fixed config names

8ee3156

Fixed merge mistake

a140734

added first implementation

34a0368

mergify Bot added the ESPnet2, Documentation, CI (Travis, Circle CI, etc), and Installation labels Feb 12, 2026

[pre-commit.ci] auto fixes from pre-commit.com hooks

ab05f8a

for more information, see https://pre-commit.ci

gemini-code-assist Bot reviewed Feb 12, 2026


mergify Bot added the conflicts label Feb 17, 2026

Fhrozen added this to the v.202601 milestone Feb 22, 2026

Fhrozen modified the milestones: v.202604, v.202607 Apr 7, 2026

popcornell wants to merge 120 commits into espnet:master from popcornell:espnet3/diarization


3 participants
