Is there an existing issue for this?
Operating System
Windows 11
DeepLabCut version
3.0.0
What engine are you using?
tensorflow
DeepLabCut mode
single animal
Device type
RTX A4000
Bug description 🐛
When the DeepLabCut project is stored on a network drive (vs a local drive), the project config file can be automatically edited into an unusable state.
Steps To Reproduce
Reproduction
- Mount a remote filesystem via SMB
- Move or create a DLC project on that remote file system that uses the legacy Tensorflow engine
- Load the project configuration via, e.g., deeplabcut.analyze_videos() or deeplabcut.utils.auxiliaryfunctions.read_config()
Result:
FileNotFoundError: [Errno 2] No such file or directory: '\\\\?\\Volume{80D426CE-0000-0000-691B-0D2800000040}\\DLC\\mini2p\\MotherDLC\\dlc-models\\iteration-0\\OPDec7-trainset95shuffle1\\test\\pose_cfg.yaml'
Let us suppose that we have mounted the remote SMB directory \\storage.domain\university\faculty\institute\group\ as Z:/
Let us further suppose that we store a variety of models under Z:/shared/DLC/, and we are trying to use the model labelled MotherDLC, which uses Tensorflow.
We can open the project config and see the contents (abbreviated!)
$ cat z:/shared/dlc/MotherDLC/config.yaml
...
project_path: Z:\shared\DLC\mini2p\MotherDLC
...
We can further verify that this path exists
>>> import pathlib
>>> p = pathlib.Path("z:/shared/dlc/MotherDLC/config.yaml")
>>> p.exists()
True
We can read this file with the auxiliary function provided by DeepLabCut:
>>> import deeplabcut.utils.auxiliaryfunctions as dlc_u_af
>>> dlc_u_af.read_config(p)
{
'Task': 'OP',
'date': 'Dec7',
'project_path': '\\\\?\\Volume{80D426CE-0000-0000-691B-0D2800000040}\\DLC\\mini2p\\MotherDLC',
'engine': 'tensorflow',
...
}
Note that compared to the OS-level file read above, the value for project_path has now changed to a volume UUID instead of a Windows path. If we then attempt to use this model to analyse videos, we get the following error:
>>> deeplabcut.analyze_videos(
    config=dlc_cfg_filepath,
    videos=video_filepaths,
    shuffle=dlc_model['dlc_shuffle'],
    trainingsetindex=dlc_model['dlc_trainingsetindex'],
    dynamic=(True, 0.5, 80),
    destfolder=output_dir,
    save_as_csv=True,
)
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<lots of traceback excluded for brevity, full traceback below>
FileNotFoundError: [Errno 2] No such file or directory: '\\\\?\\Volume{80D426CE-0000-0000-691B-0D2800000040}\\DLC\\mini2p\\MotherDLC\\dlc-models\\iteration-0\\OPDec7-trainset95shuffle1\\test\\pose_cfg.yaml'
The above exception was the direct cause of the following exception:
<lots of traceback excluded for brevity, full traceback below>
FileNotFoundError: It seems the model for iteration 0 and shuffle 1 and trainFraction 0.95 does not exist.
Analysis
The proximal cause seems to be found in deeplabcut.utils.auxiliaryfunctions.read_config(). In any case, this issue only seems to apply with our old Tensorflow based models; a new Torch model doesn't exhibit the same problem. https://github.com/DeepLabCut/DeepLabCut/blob/main/deeplabcut/utils/auxiliaryfunctions.py#L203
def read_config(configname):
    """Reads structured config file defining a project."""
    ruamelFile = YAML()
    path = Path(configname)
    if os.path.exists(path):
        try:
            with open(path) as f:
                cfg = ruamelFile.load(f)
                curr_dir = str(Path(configname).parent.resolve())
                ...
                if cfg["project_path"] != curr_dir:
                    cfg["project_path"] = curr_dir
                    write_config(configname, cfg)
        except ...
Having been passed a valid path to a config file, read_config() calls Path(configname).parent.resolve(), which replaces the Windows drive-letter path with a volume UUID path, and then writes that value back to the config file without checking whether the resolved value is usable. Not checking is understandable (after all, the config had just been loaded from that exact location), but I'm unsure why the .resolve() call is needed in the first place. When DLC subsequently goes hunting for the model via that resolved path, it fails, because the string is no longer a valid path for pathlib.
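To illustrate the kind of check that is currently missing, here is a minimal sketch of a guard that keeps the caller's path when resolving it yields a volume UUID form. The names is_volume_guid_path and choose_project_path are hypothetical helpers written for this issue, not part of DeepLabCut, and the regular expression only covers the prefix format seen in the error above.

```python
import re

# Matches the prefix seen in the error message, e.g.
# \\?\Volume{80D426CE-0000-0000-691B-0D2800000040}
_VOLUME_GUID_RE = re.compile(r"^\\\\\?\\Volume\{[0-9A-Fa-f-]{36}\}")


def is_volume_guid_path(path) -> bool:
    """Return True if the path string starts with a volume GUID prefix."""
    return bool(_VOLUME_GUID_RE.match(str(path)))


def choose_project_path(original: str, resolved: str) -> str:
    """Hypothetical guard: prefer the original path if resolving it
    turned a drive-letter path into a volume GUID path."""
    if is_volume_guid_path(resolved) and not is_volume_guid_path(original):
        return original
    return resolved


if __name__ == "__main__":
    drive = r"Z:\shared\DLC\mini2p\MotherDLC"
    guid = r"\\?\Volume{80D426CE-0000-0000-691B-0D2800000040}\DLC\mini2p\MotherDLC"
    print(choose_project_path(drive, guid))
```

A check like this would only paper over the symptom; it is meant to show that the overwrite could be made conditional, not to propose the final fix.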
I observe different behaviour between the Tensorflow and PyTorch models. I believe that this is a consequence of the exact way we invoke models on remote storage, and the different ways that the two models are implemented in DeepLabCut:
- We store a "config template" for a model, and generate a config file on the fly, including the full, valid project path for however the remote filesystem containing the DLC models happens to be mounted on that host
- The config file is regenerated each time a batch of videos is processed, so on each run the config file starts out correct.
- When auxiliaryfunctions.read_config() is run, the file on disk is broken, while the Path object in memory is functional but fragile. The project_path value ends up in a weird state: it is a valid, functional Path object that really points to a real location on disk, but its representation has taken the volume UUID form. It can no longer be round-tripped between its str and Path representations.
- For Tensorflow:
- For PyTorch:
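For anyone trying to reproduce the "functional but fragile" state described above, a small diagnostic along these lines may help. It is only a sketch: describe_round_trip is a hypothetical helper, and I have not confirmed which of the fields differ on an affected Windows 11 + SMB setup.

```python
import os
from pathlib import Path


def describe_round_trip(p: Path) -> dict:
    """Compare a Path with the Path rebuilt from its string form."""
    rebuilt = Path(str(p))
    return {
        # Does the in-memory Path object point at something real?
        "original_exists": p.exists(),
        # Does it still do so after going through str() and back?
        "rebuilt_exists": rebuilt.exists(),
        # Can plain string-based APIs (as used by open()) find it?
        "str_exists": os.path.exists(str(p)),
        # Do the two Path objects compare equal?
        "equal": rebuilt == p,
    }


if __name__ == "__main__":
    # On a healthy path every field is expected to be True.
    print(describe_round_trip(Path.cwd()))
```

Running this on Path(configname).parent.resolve() for a config on the network drive should show where the str and Path views of the same location diverge.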
Conclusion
Ultimately, this is a bug in pathlib rather than in DeepLabCut. Because of how pathlib interacts with the Windows APIs (apparently only on Win11), it can resolve a path to a volume UUID, but it cannot then use that volume UUID as part of a valid path, and that is a failing in pathlib.
I do not have a Win10 system handy to test against, but I do not recall seeing this issue with pathlib.Path.resolve() before moving to Win11.
It is not a problem on Ubuntu 24.04, which is our production environment. Whatever APIs pathlib talks to in the Linux kernel do not exhibit this pathology.
This bug was introduced between 2.3.11 and 3.0.0. In 2.3.11, the config file still underwent the weird read->overwrite process, but did not invoke .resolve(), so the overwritten file was not broken.
There are a handful of ways to address this issue. In decreasing order of preference as I see it:
- Remove the .resolve() call. That prevents the Path object being sent into the fragile Volume UUID representation state in the first place. I have not found any explanation for why the .resolve() call was added in the first place.
- Remove the various Path -> str castings scattered throughout deeplabcut.pose_estimation_tensorflow.predict_videos.analyze_videos. I have not investigated further whether the libraries these calls are being sent to can cope with Path objects.
- Demand that analyze_videos() receive a dictionary-based config rather than a string or path, thus forcing the end-users to address whatever their own special system weirdness is themselves.
- Remove Windows 11 from the list of supported operating systems until Pathlib fixes its bug
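As a rough sketch of the first option, the project directory could be derived without .resolve(), for example via os.path.abspath. The function name project_dir_without_resolve is hypothetical, and this assumes that os.path.abspath does not go through the same final-path lookup that produces the volume UUID; I have not verified that on the affected Windows 11 + SMB setup.

```python
import os
from pathlib import Path


def project_dir_without_resolve(configname) -> str:
    """Return the directory containing the config file as an absolute
    path string, without calling Path.resolve()."""
    # abspath makes the path absolute and normalises it, but does not
    # follow symlinks (assumption: nor does it rewrite drive letters).
    return os.path.abspath(Path(configname).parent)


if __name__ == "__main__":
    print(project_dir_without_resolve("some_project/config.yaml"))
```

In read_config(), this value would take the place of str(Path(configname).parent.resolve()) when comparing against cfg["project_path"]. The trade-off is that symlinked project directories would no longer be canonicalised, which may or may not be what the .resolve() call was originally added for.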
Relevant log output
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
File ~\AppData\Local\miniconda3\envs\imaging\lib\site-packages\deeplabcut\pose_estimation_tensorflow\predict_videos.py:512, in analyze_videos(config, videos, video_extensions, shuffle, trainingsetindex, gputouse, save_as_csv, in_random_order, destfolder, batchsize, cropping, TFGPUinference, dynamic, modelprefix, robust_nframes, allow_growth, use_shelve, auto_track, n_tracks, animal_names, calibrate, identity_only, use_openvino)
511 try:
--> 512 dlc_cfg = load_config(str(path_test_config))
513 except FileNotFoundError as e:
File ~\AppData\Local\miniconda3\envs\imaging\lib\site-packages\deeplabcut\pose_estimation_tensorflow\config.py:68, in load_config(filename)
67 def load_config(filename="pose_cfg.yaml"):
---> 68 return cfg_from_file(filename)
File ~\AppData\Local\miniconda3\envs\imaging\lib\site-packages\deeplabcut\pose_estimation_tensorflow\config.py:45, in cfg_from_file(filename)
44 """Load a config from file filename and merge it into the default options."""
---> 45 with open(filename) as f:
46 yaml_cfg = yaml.load(f, Loader=yaml.SafeLoader)
FileNotFoundError: [Errno 2] No such file or directory: '\\\\?\\Volume{80D426CE-0000-0000-691B-0D2800000040}\\DLC\\mini2p\\MotherDLC\\dlc-models\\iteration-0\\OPDec7-trainset95shuffle1\\test\\pose_cfg.yaml'
The above exception was the direct cause of the following exception:
FileNotFoundError Traceback (most recent call last)
Cell In[2], line 1
----> 1 imaging.DLCPrediction.populate({"recording_name":"1f61bd412b897737"}, limit=1) # tf
File ~\AppData\Local\miniconda3\envs\imaging\lib\site-packages\datajoint\autopopulate.py:321, in AutoPopulate.populate(self, keys, suppress_errors, return_exception_objects, reserve_jobs, order, limit, max_calls, display_progress, processes, make_kwargs, *restrictions)
315 if processes == 1:
316 for key in (
317 tqdm(keys, desc=self.__class__.__name__)
318 if display_progress
319 else keys
320 ):
--> 321 status = self._populate1(key, jobs, **populate_kwargs)
322 if status is True:
323 success_list.append(1)
File ~\AppData\Local\miniconda3\envs\imaging\lib\site-packages\datajoint\autopopulate.py:399, in AutoPopulate._populate1(self, key, jobs, suppress_errors, return_exception_objects, make_kwargs)
397 try:
398 if not is_generator:
--> 399 make(dict(key), **(make_kwargs or {}))
400 else:
401 # tripartite make - transaction is delayed until the final stage
402 gen = make(dict(key), **(make_kwargs or {}))
File ~\Documents\_work\Github\dj-imaging\imaging\jobs_s2p_dlc_sc.py:369, in DLCPrediction.make(self, key)
368 def make(self, key):
--> 369 output_summary = self.run(key)
370 self.insert1(output_summary, ignore_extra_fields=True)
371 self.enter_dlc_dataset(key)
File ~\Documents\_work\Github\dj-imaging\imaging\jobs_s2p_dlc_sc.py:455, in DLCPrediction.run(cls, key, verbose, write_results_local, dlc_model)
452 print(f"{(now - t0).total_seconds()} s: Running DLC")
454 # ---- Trigger DLC prediction job ----
--> 455 status, stats = do_DLC_prediction(
456 video_filepaths = locally_cached_video_files,
457 model_name = dlc_model['dlc_model'],
458 output_dir = local_results_directory,
459 timestamp = timestamp,
460 )
461 t1 = datetime.now()
462 if not status:
File ~\Documents\_work\Github\dj-imaging\imaging\jobs_s2p_dlc_sc.py:633, in do_DLC_prediction(video_filepaths, model_name, output_dir, timestamp)
627 yaml_writer.dump(dlc_cfg, f)
630 # ---- Trigger DLC prediction job ----
631 # Update May 2022: Dynamic cropping is enabled by default with rather arbitrary
632 # pixel margins (80)
--> 633 deeplabcut.analyze_videos(
634 config=dlc_cfg_filepath,
635 videos=video_filepaths,
636 shuffle=dlc_model['dlc_shuffle'],
637 trainingsetindex=dlc_model['dlc_trainingsetindex'],
638 dynamic=(True, 0.5, 80),
639 destfolder=output_dir,
640 save_as_csv=True,
641 )
643 # Handle the pickle file for statistics
644 # Testing with DLC3 generates two pickle files, `*_meta.pkl` and `*_full.pkl`. This may be a configuration issue.
645 # We only want to read the _meta ones
646 # In addition, the keywords inside the dictionary may be different.
647 pickle_files = natsorted(
648 [
649 f for f in output_dir.glob("*.pickle")
650 if "_meta" in f.name
651 ]
652 )
File ~\AppData\Local\miniconda3\envs\imaging\lib\site-packages\deeplabcut\utils\deprecation.py:198, in renamed_parameter.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
196 warnings.warn(message, DLCDeprecationWarning, stacklevel=2)
197 kwargs[new] = kwargs.pop(old)
--> 198 return fn(*args, **kwargs)
File ~\AppData\Local\miniconda3\envs\imaging\lib\site-packages\deeplabcut\utils\deprecation.py:198, in renamed_parameter.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
196 warnings.warn(message, DLCDeprecationWarning, stacklevel=2)
197 kwargs[new] = kwargs.pop(old)
--> 198 return fn(*args, **kwargs)
File ~\AppData\Local\miniconda3\envs\imaging\lib\site-packages\deeplabcut\compat.py:933, in analyze_videos(config, videos, video_extensions, shuffle, trainingsetindex, gputouse, save_as_csv, in_random_order, destfolder, batch_size, cropping, TFGPUinference, dynamic, modelprefix, robust_nframes, allow_growth, use_shelve, auto_track, n_tracks, animal_names, calibrate, identity_only, use_openvino, engine, **torch_kwargs)
930 if use_openvino is not None: # otherwise default comes from tensorflow API
931 kwargs["use_openvino"] = use_openvino
--> 933 return analyze_videos(
934 config,
935 videos,
936 video_extensions=video_extensions,
937 shuffle=shuffle,
938 trainingsetindex=trainingsetindex,
939 gputouse=gputouse,
940 save_as_csv=save_as_csv,
941 in_random_order=in_random_order,
942 destfolder=destfolder,
943 batchsize=batch_size,
944 cropping=cropping,
945 TFGPUinference=TFGPUinference,
946 dynamic=dynamic,
947 modelprefix=modelprefix,
948 robust_nframes=robust_nframes,
949 allow_growth=allow_growth,
950 use_shelve=use_shelve,
951 auto_track=auto_track,
952 n_tracks=n_tracks,
953 animal_names=animal_names,
954 calibrate=calibrate,
955 identity_only=identity_only,
956 **kwargs,
957 )
958 elif engine == Engine.PYTORCH:
959 from deeplabcut.pose_estimation_pytorch.apis import analyze_videos
File ~\AppData\Local\miniconda3\envs\imaging\lib\site-packages\deeplabcut\utils\deprecation.py:198, in renamed_parameter.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
196 warnings.warn(message, DLCDeprecationWarning, stacklevel=2)
197 kwargs[new] = kwargs.pop(old)
--> 198 return fn(*args, **kwargs)
File ~\AppData\Local\miniconda3\envs\imaging\lib\site-packages\deeplabcut\pose_estimation_tensorflow\predict_videos.py:514, in analyze_videos(config, videos, video_extensions, shuffle, trainingsetindex, gputouse, save_as_csv, in_random_order, destfolder, batchsize, cropping, TFGPUinference, dynamic, modelprefix, robust_nframes, allow_growth, use_shelve, auto_track, n_tracks, animal_names, calibrate, identity_only, use_openvino)
512 dlc_cfg = load_config(str(path_test_config))
513 except FileNotFoundError as e:
--> 514 raise FileNotFoundError(
515 f"It seems the model for iteration {iteration} and shuffle "
516 f"{shuffle} and trainFraction {trainFraction} does not exist."
517 ) from e
519 Snapshots = auxiliaryfunctions.get_snapshots_from_folder(
520 train_folder=Path(modelfolder) / "train",
521 )
523 if cfg["snapshotindex"] == "all":
FileNotFoundError: It seems the model for iteration 0 and shuffle 1 and trainFraction 0.95 does not exist.
Anything else?
No response
Code of Conduct