Improve TrainingDatasetMetadata and get_shuffle_engine for incomplete projects #3313
Open
deruyter92 wants to merge 5 commits into
This commit addresses two issues:
- `trainset_index` can be out of bounds for `TrainingFraction`. That is really an IndexError, but it surfaces as a confusing message about fractions.
- The shuffle is not in the metadata, but you cannot tell whether that is because the metadata is empty or because the index is simply wrong.
More informative error messages make it easier to diagnose the problem.
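A minimal sketch of the two checks described above, assuming a hypothetical in-memory `ShuffleEntry` record and a hypothetical `get_shuffle` helper (neither is the actual DeepLabCut API). The trainset index is validated before it is used, and an empty metadata file is reported separately from an unknown shuffle, which is then listed against the known ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShuffleEntry:
    # Hypothetical stand-in for one shuffle record in the metadata file.
    index: int
    train_fraction: float
    engine: str


def get_shuffle(
    shuffles: list[ShuffleEntry],
    training_fractions: list[float],
    shuffle: int,
    trainset_index: int = 0,
) -> ShuffleEntry:
    """Look up a shuffle, raising errors that name the actual problem."""
    # Issue 1: report an out-of-range trainset_index as such, instead of
    # letting it surface later as a confusing message about fractions.
    # (The explicit >= 0 check also rejects Python's negative indexing.)
    if not 0 <= trainset_index < len(training_fractions):
        raise IndexError(
            f"trainset_index={trainset_index} is out of bounds for "
            f"TrainingFraction={training_fractions}"
        )
    fraction = training_fractions[trainset_index]

    # Issue 2: distinguish "the metadata is empty" from "this shuffle is unknown".
    if not shuffles:
        raise ValueError("The training-dataset metadata contains no shuffles.")
    for entry in shuffles:
        if entry.index == shuffle and entry.train_fraction == fraction:
            return entry
    known = sorted((e.index, e.train_fraction) for e in shuffles)
    raise ValueError(
        f"Shuffle {shuffle} with train fraction {fraction} not found. "
        f"Known (shuffle, fraction) pairs: {known}"
    )
```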
- Don't save an empty `TrainingDatasetMetadata` if no shuffles are found.
- Fallback: if the shuffle is not found in the metadata, search the model folders for the same shuffle index.
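The first point amounts to a guard that refuses to write when nothing was discovered. This is a sketch only: `write_metadata`, the `{shuffle_index: engine}` mapping, and the YAML layout it writes are hypothetical and do not reproduce the file format DeepLabCut actually uses.

```python
from pathlib import Path


def write_metadata(shuffle_engines: dict[int, str], path: Path) -> bool:
    """Hypothetical writer: persist shuffle metadata only if any was found.

    Returns True if the file was written, False if writing was skipped.
    """
    if not shuffle_engines:
        # Nothing was discovered: write no file at all, so an empty file can
        # never be mistaken for a complete list of the project's shuffles.
        return False
    lines = ["shuffles:"]
    for index in sorted(shuffle_engines):
        # Illustrative YAML flow-mapping layout, one line per shuffle.
        lines.append(f"  shuffle{index}: {{engine: {shuffle_engines[index]}}}")
    path.write_text("\n".join(lines) + "\n")
    return True
```

Returning a boolean lets the caller decide whether a missing file should trigger the model-folder fallback, rather than silently reading back an empty file.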
Pull request overview
This PR improves robustness of training-dataset metadata handling so inference can work on “model-folder-only” / incomplete DeepLabCut projects, and provides clearer diagnostics when metadata or shuffle selection is invalid.
Changes:
- Improve `TrainingDatasetMetadata.get()` error reporting (out-of-bounds trainset index, and listing known shuffles).
- Avoid writing an empty `metadata.yaml` when no shuffles can be discovered.
- Update `get_shuffle_engine()` to fall back to inferring the engine from the model folder structure when metadata is missing/incomplete.
…, before falling back to model-folder detection
C-Achard approved these changes on May 12, 2026.
C-Achard (Collaborator) left a comment:
Very nice, I like the fact that this prevents writing partial metadata.
Motivation
See #3312: `get_shuffle_engine` is currently broken for projects whose shuffles are not stored in the metadata under `training-datasets`. Making this metadata file the single source of truth can be a valid choice, but it currently breaks backwards compatibility: with older DLC versions, a model folder alone was sufficient for inference.

Proposed changes:
The current PR implements a few small changes, among them making `get_shuffle_engine` fall back to inferring the engine from the model folder. (This fixes the currently broken `analyze_videos` for projects that only have a model folder but no training dataset.)

Note that further refactors could improve the long-term maintainability of this code (e.g. separating responsibilities and defining clear contracts; again, see #3312), but the scope of the current PR is limited to a few easy wins.
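The fallback order (metadata first, then model-folder detection) can be sketched as follows. The helper name `infer_shuffle_engine`, the `dlc-models-pytorch` / `dlc-models` folder mapping, and the `iteration-*/*shuffle<N>` naming pattern are assumptions for illustration; the PR's actual implementation may differ.

```python
from pathlib import Path

# Assumption: each engine keeps its models under its own top-level folder.
ENGINE_MODEL_DIRS = {
    "pytorch": "dlc-models-pytorch",
    "tensorflow": "dlc-models",
}


def infer_shuffle_engine(
    project: Path, shuffle: int, metadata_engines: dict[int, str]
) -> str:
    """Hypothetical helper: return the engine used to train `shuffle`."""
    # 1) The metadata stays the first source whenever it knows this shuffle.
    if shuffle in metadata_engines:
        return metadata_engines[shuffle]

    # 2) Fallback: look for a model folder whose name ends in "shuffle<N>"
    #    under each engine's model directory (any iteration).
    found = []
    for engine, dirname in ENGINE_MODEL_DIRS.items():
        root = project / dirname
        if not root.is_dir():
            continue
        pattern = f"iteration-*/*shuffle{shuffle}"
        if any(p.is_dir() for p in root.glob(pattern)):
            found.append(engine)

    if len(found) == 1:
        return found[0]
    if not found:
        raise ValueError(
            f"Shuffle {shuffle} is not in the metadata and no model folder "
            f"was found for it under {sorted(ENGINE_MODEL_DIRS.values())}."
        )
    # Folders for several engines: refuse to guess.
    raise ValueError(
        f"Shuffle {shuffle} is ambiguous: model folders exist for {found}."
    )
```

Raising on the ambiguous case, instead of picking an engine arbitrarily, keeps the fallback from silently loading the wrong model when a project contains both folder layouts.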
`get_shuffle_engine` based on folder structure