Fix matlab file not dropping likelihood column if present from machine labels by C-Achard · Pull Request #3323 · DeepLabCut/DeepLabCut

C-Achard · 2026-05-11T08:49:42Z

Issue

The latest napari-deeplabcut version retains the likelihood column when refining machine annotations (it is added to the CollectedData h5 if present).
The matlab file creation function always assumed that only x and y are present and did not filter out likelihood, leading to dataset creation failure.
Note: this raises an error in the terminal but could cause the GUI to hang instead, as reported in #3319.
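A minimal sketch of why an extra likelihood value per bodypart breaks the formatting step. It assumes the pre-fix logic reshaped each row into consecutive (x, y) pairs and stacked a bodypart index next to them; `pair_up` is a hypothetical stand-in, not the actual DeepLabCut function.

```python
import numpy as np

n_bodyparts = 2
# One labeled frame with (x, y) per bodypart, the layout the formatter expected.
row_xy = np.array([10.0, 20.0, 30.0, 40.0])
# The same frame with a likelihood value after each (x, y) pair.
row_xyl = np.array([10.0, 20.0, 0.9, 30.0, 40.0, 0.8])


def pair_up(row, n_bodyparts):
    """Assumed pre-fix logic: read the row as consecutive (x, y) pairs."""
    coords = row.reshape(-1, 2)
    # Prepend the bodypart index; the row counts must match for this to work.
    return np.c_[np.arange(n_bodyparts), coords]


print(pair_up(row_xy, n_bodyparts))  # 2 rows of (index, x, y)

try:
    pair_up(row_xyl, n_bodyparts)  # 6 values become 3 "pairs" for 2 bodyparts
except ValueError as err:
    print("formatting fails:", err)
```

With three values per bodypart the reshape yields more rows than there are bodyparts, so the stacking step cannot line them up.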

Related

- Should napari-deeplabcut always discard likelihood when saving to CollectedData? -> Yes
  - Fixed plugin-side in "Prevent likelihood columns from being saved in CollectedData" (napari-deeplabcut#204)
- Do we want to systematically filter at the training set h5/csv creation step? -> Not needed, due to the point above
- Aims to close "Shuffle creation failure" (#3319)

Fix

Improves the robustness of the training data formatting by ensuring that any "likelihood" columns present in the input DataFrame are dropped before .mat formatting, and adds a corresponding test to verify this behavior.

Data formatting fixes:

Updated format_training_data in trainingsetmanipulation.py to automatically detect and remove "likelihood" columns from the DataFrame before processing, ensuring only "x" and "y" coordinates are used. Added validation to require both "x" and "y" columns and to check that the number of coordinate values per row is even.
Improved error handling for malformed data, raising clear exceptions when required coordinate columns are missing or when the data shape is unexpected.
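The column filtering described above could look roughly like the sketch below. `keep_xy_columns` is a hypothetical helper, and the assumption that the coordinate level is named "coords" (with a fallback to the last level) is mine; the actual change in trainingsetmanipulation.py may detect the level differently.

```python
import pandas as pd


def keep_xy_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep only the x/y columns of a DLC-style DataFrame."""
    cols = df.columns
    if isinstance(cols, pd.MultiIndex):
        # Assumption: the coordinate level is called "coords", else use the last level.
        level = "coords" if "coords" in cols.names else cols.nlevels - 1
        coord_labels = cols.get_level_values(level)
    else:
        coord_labels = cols
    if not {"x", "y"} <= set(coord_labels):
        raise ValueError("Expected both 'x' and 'y' coordinate columns.")
    # Drops "likelihood" and any other non-coordinate column.
    xy = df.loc[:, coord_labels.isin(["x", "y"])]
    if xy.shape[1] % 2:
        raise ValueError("Odd number of coordinate values per row.")
    return xy


columns = pd.MultiIndex.from_product(
    [["scorer"], ["nose", "tail"], ["x", "y", "likelihood"]],
    names=["scorer", "bodyparts", "coords"],
)
df = pd.DataFrame([[10.0, 20.0, 0.9, 30.0, 40.0, 0.8]], columns=columns)
xy = keep_xy_columns(df)
print(list(xy.columns.get_level_values("coords")))
```

Selecting by an allow-list of "x" and "y" rather than removing "likelihood" by name also guards against other unexpected columns.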

Testing:

Added a new test test_format_training_data_ignores_likelihood_columns in test_trainingsetmanipulation.py to verify that the presence of "likelihood" columns does not affect the output of format_training_data.

Fix format_training_data to handle MultiIndex columns with likelihoods: detect coord level, keep only x/y columns (logging when likelihoods are dropped), and raise if x/y are missing. Use the row values to ensure an even number of coordinates after dropping non-coord columns (error if odd), reshape into (N,2), filter NaNs, and clip out-of-image joints. Also skip images without labels.
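The per-row steps in this commit message can be sketched as follows. `row_to_joints` is a hypothetical helper, the image shape is assumed to be (channels, height, width), and "clip" is read here as discarding joints that fall outside the image; if the real code clamps coordinates instead, the mask would be replaced by a clamping step.

```python
import numpy as np


def row_to_joints(row_values, image_shape):
    """Hypothetical helper: turn one row of x/y values into (index, x, y) joints."""
    values = np.asarray(row_values, dtype=float)
    if values.size % 2:
        raise ValueError("Expected an even number of coordinate values.")
    coords = values.reshape(-1, 2)
    joints = np.c_[np.arange(len(coords)), coords]
    # Drop bodyparts without a label (NaN x or y).
    joints = joints[~np.isnan(joints).any(axis=1)].astype(int)
    # Keep only joints that lie inside the image (assumed reading of "clip").
    _, height, width = image_shape
    inside = (
        (joints[:, 1] > 0) & (joints[:, 1] < width)
        & (joints[:, 2] > 0) & (joints[:, 2] < height)
    )
    return joints[inside]


image_shape = (3, 100, 200)
joints = row_to_joints([10, 20, np.nan, np.nan, 500, 5], image_shape)
print(joints)

# A frame with no labels yields an empty array and would be skipped.
unlabeled = row_to_joints([np.nan, np.nan], image_shape)
print(unlabeled.size == 0)
```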

Add a unit test to tests/test_trainingsetmanipulation.py that verifies format_training_data ignores 'likelihood' columns when formatting training data. The test monkeypatches read_image_shape_fast, constructs a DataFrame with inserted likelihood columns after each y coordinate, and compares the formatted outputs (image, size, joints) against a baseline produced from the original x/y-only DataFrame to ensure identical results.
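A sketch of how the test fixture described above might be built, under stated assumptions: `add_likelihood_columns` is a hypothetical helper, the column level names follow the usual scorer/bodyparts/coords layout, and the comparison here only checks that removing the inserted columns restores the baseline. The real test compares the outputs of format_training_data and monkeypatches read_image_shape_fast, which is not reproduced here.

```python
import numpy as np
import pandas as pd


def add_likelihood_columns(df_xy, value=0.9):
    """Hypothetical fixture helper: insert a likelihood column after each y column."""
    arrays, labels = [], []
    for col in df_xy.columns:
        arrays.append(df_xy[col].to_numpy())
        labels.append(col)
        if col[-1] == "y":
            arrays.append(np.full(len(df_xy), value))
            labels.append(col[:-1] + ("likelihood",))
    return pd.DataFrame(
        np.column_stack(arrays),
        index=df_xy.index,
        columns=pd.MultiIndex.from_tuples(labels, names=df_xy.columns.names),
    )


columns = pd.MultiIndex.from_product(
    [["scorer"], ["nose", "tail"], ["x", "y"]],
    names=["scorer", "bodyparts", "coords"],
)
index = pd.MultiIndex.from_tuples([("labeled-data", "video", "img0.png")])
df_xy = pd.DataFrame([[10.0, 20.0, 30.0, 40.0]], index=index, columns=columns)
df_with_likelihood = add_likelihood_columns(df_xy)

# Baseline check: dropping the inserted columns recovers the x/y-only data.
coords = df_with_likelihood.columns.get_level_values("coords")
restored = df_with_likelihood.loc[:, coords != "likelihood"]
print(np.array_equal(restored.to_numpy(), df_xy.to_numpy()))
```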

C-Achard · 2026-05-11T12:01:56Z

Note: after discussion with @deruyter92, it has been decided not to retain likelihood columns in CollectedData, as was previously the case. See DeepLabCut/napari-deeplabcut#204.

deruyter92 · 2026-05-12T09:37:33Z

Should napari-deeplabcut always discard likelihood when saving to CollectedData?

For future reference, we concluded yes, for the following reasons:

- Anything reviewed through human labeling should be considered a ground-truth label.
- Keeping likelihood in the CollectedData gives the wrong impression that the data is uncertain or unreliable, while these scores only apply to the NN-based judgements, not to the human labels.
- Since the likelihood scores for machine labels are also stored separately, there is no need to keep them here as well, and we can safely remove them from the CollectedData.

C-Achard added 2 commits May 11, 2026 10:39

C-Achard requested a review from deruyter92 May 11, 2026 08:49

C-Achard self-assigned this May 11, 2026

C-Achard added the "bug fix!" label (fix for a real buggy one...) May 11, 2026

C-Achard mentioned this pull request May 11, 2026

Prevent likelihood columns from being saved in CollectedData DeepLabCut/napari-deeplabcut#204

Open

deruyter92 approved these changes May 11, 2026

Merge branch 'main' into cy/fix-matlab-column-num

c733347

C-Achard wants to merge 3 commits into main from cy/fix-matlab-column-num


2 participants
