Fix matlab file not dropping likelihood column if present from machine labels #3323
Open
C-Achard wants to merge 3 commits into
Fix format_training_data to handle MultiIndex columns with likelihoods: detect coord level, keep only x/y columns (logging when likelihoods are dropped), and raise if x/y are missing. Use the row values to ensure an even number of coordinates after dropping non-coord columns (error if odd), reshape into (N,2), filter NaNs, and clip out-of-image joints. Also skip images without labels.
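The steps in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `row_to_joints` is a hypothetical helper, the `coords` level name and the `(channels, height, width)` image shape are assumptions, and "clip" is read here as discarding joints that fall outside the image. The logging of dropped likelihood columns is omitted.

```python
import numpy as np
import pandas as pd


def row_to_joints(df, row_idx, img_shape):
    """Hypothetical helper: return an (M, 3) int array of [bodypart_id, x, y]."""
    # Find the column level that holds "x"/"y". The level name "coords" is an
    # assumption; fall back to the last level if it is absent.
    if isinstance(df.columns, pd.MultiIndex):
        names = list(df.columns.names)
        level = names.index("coords") if "coords" in names else df.columns.nlevels - 1
        coords = df.columns.get_level_values(level)
    else:
        coords = df.columns
    if not {"x", "y"}.issubset(set(coords)):
        raise ValueError("Expected both 'x' and 'y' columns.")

    # Keep only x/y columns; this is where "likelihood" is dropped.
    keep = coords.isin(["x", "y"])
    values = df.loc[:, keep].iloc[row_idx].to_numpy(dtype=float)
    if values.size % 2:
        raise ValueError("Odd number of coordinate values in row.")

    # Reshape to (N, 2) pairs and prepend a bodypart id column.
    xy = values.reshape(-1, 2)
    joints = np.c_[np.arange(len(xy)), xy]

    # Drop unlabeled joints (NaN in x or y).
    joints = joints[~np.isnan(joints).any(axis=1)]

    # Drop joints outside the image (assumed shape: channels, height, width).
    _, height, width = img_shape
    inside = (
        (joints[:, 1] > 0) & (joints[:, 1] < width)
        & (joints[:, 2] > 0) & (joints[:, 2] < height)
    )
    return joints[inside].astype(int)
```

An image whose returned array is empty has no usable labels and would be skipped by the caller.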
Add a unit test to tests/test_trainingsetmanipulation.py that verifies format_training_data ignores 'likelihood' columns when formatting training data. The test monkeypatches read_image_shape_fast, constructs a DataFrame with inserted likelihood columns after each y coordinate, and compares the formatted outputs (image, size, joints) against a baseline produced from the original x/y-only DataFrame to ensure identical results.
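The DataFrame construction described for this test could look something like the sketch below. `add_likelihood_columns` is a hypothetical helper and the level names are assumptions; the call to `format_training_data` and the monkeypatching of `read_image_shape_fast` are left out so the snippet runs without DeepLabCut installed.

```python
import numpy as np
import pandas as pd


def add_likelihood_columns(df_xy, value=1.0):
    """Hypothetical helper: copy df_xy, adding a 'likelihood' column after each 'y'."""
    columns, arrays = [], []
    for col in df_xy.columns:
        columns.append(col)
        arrays.append(df_xy[col].to_numpy())
        if col[-1] == "y":
            # Same scorer/bodypart prefix, with "likelihood" as the coord label.
            columns.append(col[:-1] + ("likelihood",))
            arrays.append(np.full(len(df_xy), value))
    new_cols = pd.MultiIndex.from_tuples(columns, names=df_xy.columns.names)
    return pd.DataFrame(np.column_stack(arrays), index=df_xy.index, columns=new_cols)


# Baseline frame with x/y only (level names are assumed).
cols = pd.MultiIndex.from_product(
    [["scorer"], ["nose", "tail"], ["x", "y"]],
    names=["scorer", "bodyparts", "coords"],
)
baseline = pd.DataFrame([[10.0, 20.0, 30.0, 40.0]], columns=cols)
with_lik = add_likelihood_columns(baseline)

# Selecting x/y from the augmented frame should reproduce the baseline values.
coords = with_lik.columns.get_level_values("coords")
restored = with_lik.loc[:, coords.isin(["x", "y"])]
```

In the actual test, both `baseline` and `with_lik` would be passed through `format_training_data` and the resulting image, size, and joints entries compared.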
Note: after discussion with @deruyter92, we decided not to retain likelihood columns in CollectedData, which matches the previous behavior. See DeepLabCut/napari-deeplabcut#204.
deruyter92 approved these changes on May 11, 2026
For future reference we concluded yes for the following reasons:
Issue
The latest napari-deeplabcut version retains likelihood columns when refining machine annotations (they are added to the CollectedData h5 if present).
The matlab file creation function always assumed only x and y are present, and did not filter out likelihood, leading to dataset creation failure.
Note: this raises in terminal but could cause the GUI to hang instead, as reported in #3319.
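To illustrate why extra likelihood values break pair-wise formatting, here is a small NumPy sketch. It assumes the formatter reshapes each row into (x, y) pairs, which the description above suggests; the values are made up and this is not the traceback from the report.

```python
import numpy as np

# One row for two bodyparts, stored as (x, y, likelihood) triplets.
row = np.array([10.0, 20.0, 0.9, 30.0, 40.0, 0.8])

# Reshaping into pairs succeeds but yields 3 misaligned rows instead of 2:
# the second "pair" mixes a likelihood with the next bodypart's x.
pairs = row.reshape(-1, 2)

# With an odd number of bodyparts the reshape itself fails.
odd_row = np.array([10.0, 20.0, 0.9])
reshape_error = None
try:
    odd_row.reshape(-1, 2)
except ValueError as err:
    reshape_error = str(err)
```

Either way the number of rows no longer matches the number of bodyparts, so any later step that pairs rows with bodypart indices cannot work.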
Related
Fix
Improves the robustness of the training data formatting by ensuring that any "likelihood" columns present in the input DataFrame are dropped before .mat formatting, and adds a corresponding test to verify this behavior.
Data formatting fixes:
Updated format_training_data in trainingsetmanipulation.py to automatically detect and remove "likelihood" columns from the DataFrame before processing, ensuring only "x" and "y" coordinates are used. Added validation to require both "x" and "y" columns and to check that the number of coordinate values per row is even.
Testing:
Added test_format_training_data_ignores_likelihood_columns in test_trainingsetmanipulation.py to verify that the presence of "likelihood" columns does not affect the output of format_training_data.