Improve `TrainingDatasetMetadata` and `get_shuffle_engine` for incomplete projects by deruyter92 · Pull Request #3313 · DeepLabCut/DeepLabCut

deruyter92 · 2026-05-06T16:47:25Z

Motivation
See #3312: currently, get_shuffle_engine is broken for projects whose shuffles are not stored in the metadata under training-datasets. While making this metadata file the single source of truth can be a valid choice, it currently breaks backwards compatibility, since in older DLC versions a model folder alone was sufficient for inference.

Proposed changes:
The current PR aims to implement small changes that

- allow get_shuffle_engine to fall back to inferring the engine from the model folder. (This fixes the currently broken analyze_videos for projects that only have a model folder, but no training dataset.)
- improve error messages, so users can diagnose whether their project is incomplete or whether they simply specified a non-existent shuffle.
- don't write an empty metadata file when the metadata was not found in the first place (this left the project in an even more corrupt state than before).

Note that more refactors could be implemented to guarantee long-term maintainability of this code (e.g. separate responsibilities and clear contracts - again see #3312) but the scope of the current PR is just some easy wins.
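The fallback described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the function names, the plain-dict metadata, and the engine-to-folder mapping (`dlc-models` for TensorFlow, `dlc-models-pytorch` for PyTorch) are assumptions made for the example.

```python
import re
import warnings
from pathlib import Path

# Assumed mapping from engine name to model folder; illustrative only.
MODEL_FOLDERS = {"tensorflow": "dlc-models", "pytorch": "dlc-models-pytorch"}


def engines_with_shuffle(project: Path, shuffle: int) -> set:
    """Return the engines whose model folder holds a directory for `shuffle`."""
    # Anchoring on the end of the name keeps shuffle1 from matching shuffle11.
    pattern = re.compile(rf"shuffle{shuffle}$")
    found = set()
    for engine, folder in MODEL_FOLDERS.items():
        root = project / folder
        if root.is_dir() and any(
            p.is_dir() and pattern.search(p.name) for p in root.rglob("*")
        ):
            found.add(engine)
    return found


def resolve_engine(metadata: dict, project: Path, shuffle: int) -> str:
    """Prefer the metadata entry; otherwise infer the engine from model folders.

    `metadata` is a hypothetical stand-in mapping shuffle index -> engine name.
    """
    if shuffle in metadata:
        return metadata[shuffle]

    warnings.warn(
        f"Shuffle {shuffle} not found in metadata; "
        "falling back to model-folder detection."
    )
    engines = engines_with_shuffle(project, shuffle)
    if len(engines) == 1:
        # Sets are not indexable, so take the single item via an iterator.
        return next(iter(engines))
    if not engines:
        raise ValueError(
            f"Shuffle {shuffle} is not in the metadata and no model folder "
            f"for it exists under {project}."
        )
    raise ValueError(
        f"Shuffle {shuffle} exists for several engines ({sorted(engines)}); "
        "cannot infer which one to use."
    )
```

Raising on the ambiguous case rather than picking one engine silently is a design choice of this sketch; the PR itself may resolve it differently.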

improve error messages in TrainingDatasetMetadata
implement fallback for get_shuffle_engine based on folder structure
add tests

This commit addresses two issues:
- trainset_index can be out of bounds for TrainingFraction: that is an IndexError, but it surfaces as a confusing message about fractions.
- The shuffle is not in the metadata, but you have no idea whether that is because the metadata is empty or because the index is simply wrong.

More informative messages make the problem easier to diagnose.
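The two failure modes can be told apart with checks along these lines. This is a hedged sketch: `lookup_shuffle`, its arguments, and the dict keyed by `(train_fraction, shuffle)` are hypothetical and do not reflect the actual `TrainingDatasetMetadata` interface.

```python
def lookup_shuffle(training_fractions, shuffles, trainset_index, shuffle):
    """Return the entry for a shuffle, with errors that name the actual cause.

    `shuffles` is a hypothetical dict mapping (train_fraction, shuffle) -> entry.
    """
    # Case 1: the index itself is invalid. Report it as such, instead of
    # letting it surface later as a confusing message about fractions.
    if not 0 <= trainset_index < len(training_fractions):
        raise IndexError(
            f"trainset_index={trainset_index} is out of bounds: "
            f"TrainingFraction has {len(training_fractions)} entries "
            f"({training_fractions})."
        )

    key = (training_fractions[trainset_index], shuffle)
    if key in shuffles:
        return shuffles[key]

    # Case 2a: there is no metadata at all, so the project is likely incomplete.
    if not shuffles:
        raise ValueError(
            "The metadata contains no shuffles; the project may be incomplete."
        )

    # Case 2b: metadata exists, but this particular shuffle is not in it.
    raise ValueError(
        f"Shuffle {shuffle} with train fraction {key[0]} is not in the "
        f"metadata. Known shuffles: {sorted(shuffles)}."
    )
```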

- Don't save empty TrainingDatasetMetadata if not found
- Fallback: search in model folders for same shuffle index if metadata not found

Copilot

Pull request overview

This PR improves robustness of training-dataset metadata handling so inference can work on “model-folder-only” / incomplete DeepLabCut projects, and provides clearer diagnostics when metadata or shuffle selection is invalid.

Changes:

- Improve TrainingDatasetMetadata.get() error reporting (out-of-bounds trainset index, and listing known shuffles).
- Avoid writing an empty metadata.yaml when no shuffles can be discovered.
- Update get_shuffle_engine() to fall back to inferring engine from model folder structure when metadata is missing/incomplete.
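The second point, not persisting an empty metadata file, might look like the sketch below. The names `load_or_discover` and `discover` are hypothetical, and JSON stands in for the YAML serialization only to keep the example free of third-party dependencies.

```python
import json
from pathlib import Path


def load_or_discover(path: Path, discover) -> dict:
    """Load metadata from `path`, or rebuild it with `discover()`.

    `discover` is a hypothetical callable that scans the project and returns
    a dict of shuffles. The file is written only when something was found,
    so a failed discovery never leaves an empty metadata file behind.
    """
    if path.exists():
        return json.loads(path.read_text())

    shuffles = discover()
    if shuffles:
        path.write_text(json.dumps(shuffles))
    return shuffles
```

With this guard, a later call on the same incomplete project still sees "no metadata file" rather than "a metadata file that claims there are no shuffles".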



C-Achard

Very nice, I like the fact that this prevents writing partial metadata.

deruyter92 added 2 commits May 6, 2026 18:02

update get_shuffle_engine

d26fe37

- Don't save empty TrainingDatasetMetadata if not found
- Fallback: search in model folders for same shuffle index if metadata not found

deruyter92 mentioned this pull request May 11, 2026

Docs audit [April 2026] Small updates to demo notebooks #3324

Open

C-Achard added pytorch config Related to config.yaml, ruamel, YAML parsing, ... labels May 11, 2026

C-Achard assigned deruyter92 May 11, 2026

deruyter92 requested a review from Copilot May 12, 2026 06:53

Copilot started reviewing on behalf of deruyter92 May 12, 2026 06:54

Copilot AI reviewed May 12, 2026


deruyter92 added 3 commits May 12, 2026 11:50

Improve logging: surface a warning when shuffle not found in metadata…

9986552

…, before falling back to model-folder detection

fix item retrieval from set of engines

4b6c95c

add tests for TrainsetMetadata

16987eb

deruyter92 marked this pull request as ready for review May 12, 2026 10:37

deruyter92 requested a review from C-Achard May 12, 2026 10:37

C-Achard approved these changes May 12, 2026

deruyter92 wants to merge 5 commits into main from jaap/improve_training_dataset_metadata

3 participants
