[espnet3-14] Add integration test#6331
Conversation
- This is to avoid using egs folder
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Code Review
This pull request introduces a significant amount of new code for espnet3 integration tests, including a new mini_an4 example and a TEMPLATE for recipes. The changes are extensive and set up a new structure for experiments. My review focuses on the new espnet3 components and recipes. I've found a few high-severity issues: a bug in the TEMPLATE run script, a bug in the mini_an4 dataset creation script where a configured path is ignored, and a design issue in the base inference system that hardcodes a dependency on ASR components. These issues should be addressed to ensure the new espnet3 framework is robust and extensible.
| main( | ||
| args=args, | ||
| system_cls=ASRSystem, | ||
| stages=DEFAULT_STAGES, |
There was a problem hiding this comment.
The stages_to_run variable, which holds the resolved list of stages to execute, is calculated but then ignored. Instead, DEFAULT_STAGES is passed to the main function. This is a bug in the template that can lead to incorrect behavior if a recipe defines custom stages. The resolved stages_to_run should be passed to main.
| stages=DEFAULT_STAGES, | |
| stages=stages_to_run, |
| f.write(f"{e.utt_id}\t{e.wav_path}\t{e.text}\n") | ||
|
|
||
|
|
||
| def create_dataset(dataset_dir: Path, *, archive_path: Path | None = None) -> None: |
There was a problem hiding this comment.
The archive_path parameter is ignored. The function uses a hardcoded path to the archive, which makes it less flexible and ignores the configuration provided in train.yaml. The archive_path passed to the function should be used.
| def create_dataset(dataset_dir: Path, *, archive_path: Path | None = None) -> None: | |
| archive = Path(archive_path) if archive_path is not None else Path("../../../egs2/mini_an4/asr1/downloads.tar.gz") |
There was a problem hiding this comment.
Please reflect on this comment.
| from omegaconf import DictConfig | ||
|
|
||
| from espnet3.parallel.parallel import set_parallel | ||
| from espnet3.systems.asr.inference import InferenceProvider, InferenceRunner |
There was a problem hiding this comment.
The inference function in this base module has a hardcoded dependency on the ASR-specific InferenceProvider and InferenceRunner. This breaks the abstraction of a base system and prevents it from being used with other tasks (e.g., TTS, SLU) without modification. The provider and runner classes should be made configurable, for example by instantiating them from the configuration file.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #6331 +/- ##
==========================================
+ Coverage 70.02% 70.05% +0.03%
==========================================
Files 787 788 +1
Lines 73075 73430 +355
==========================================
+ Hits 51171 51444 +273
- Misses 21904 21986 +82
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…t into espnet3/integration_test
…pnet into espnet3/integration_test
…t into espnet3/integration_test
- Previously we asked developer to create a user-defined modle, but I supported as a default. - Userd can set `val_scheduler_criterion` as espnet2 to use this function.
- supported train/valid switching for preprocessor - Add new default resolver to load external config file
for more information, see https://pre-commit.ci
…pnet into espnet3/integration_test
|
I can review this PR. Thanks @Masao-Someki. |
…pnet into espnet3/integration_test
…asure_config -> metrics_config
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| def _resolve_test_sets(measure_config: DictConfig) -> list[str]: |
There was a problem hiding this comment.
This function is to allow users to drop dataset config from metrics.yaml.
| _LOG_STAGE.reset(token) | ||
|
|
||
|
|
||
| def log_stage_metadata( |
There was a problem hiding this comment.
This function is moved from run.py for refactoring.
| return [s for s in stages if s in requested_set] | ||
|
|
||
|
|
||
| def parse_cli_and_stage_args( |
There was a problem hiding this comment.
This function is moved from run.py for refactoring.
| @@ -0,0 +1,80 @@ | |||
| from pathlib import Path | |||
There was a problem hiding this comment.
This file tests functions in the run.py, and not running the integration test.
for more information, see https://pre-commit.ci
…pnet into espnet3/integration_test
|
I think this PR becomes too large. I will split this into 2 PRs. |
|
We want to keep the history of the interactions here. |
|
@sw005320 Thank you. I will keep this PR as it is and create a new PR with the limited changes! |
|
This pull request is now in conflict :( |
What did you change?
generation code.
Why did you make this change?
Is your PR small enough?
no
(more than 20 files changed, because it includes TEMPLATE directory.)
Additional Context
This PR should be merged after #6329.