read_config() project path resolution can break path #3348

@simon-ball

Is there an existing issue for this?

  • I have searched the existing issues

Operating System

Windows 11

DeepLabCut version

3.0.0

What engine are you using?

tensorflow

DeepLabCut mode

single animal

Device type

RTX A4000

Bug description 🐛

When the DeepLabCut project is stored on a network drive (vs a local drive), the project config file can be automatically edited into an unusable state.

Steps To Reproduce

Reproduction

  1. Mount a remote filesystem via SMB
  2. Move or create a DLC project on that remote file system that uses the legacy Tensorflow engine
  3. Load the project configuration via, e.g., deeplabcut.analyze_videos() or deeplabcut.utils.auxiliaryfunctions.read_config()
    Result:
    FileNotFoundError: [Errno 2] No such file or directory: '\\\\?\\Volume{80D426CE-0000-0000-691B-0D2800000040}\\DLC\\mini2p\\MotherDLC\\dlc-models\\iteration-0\\OPDec7-trainset95shuffle1\\test\\pose_cfg.yaml'

Let us suppose that we have mounted the remote SMB directory \\storage.domain\university\faculty\institute\group\ as Z:/

Let us further suppose that we store a variety of models under Z:/shared/DLC/, and we are trying to use the model labelled MotherDLC, which uses Tensorflow.

We can open the project config and see the contents (abbreviated!)

$ cat z:/shared/dlc/MotherDLC/config.yaml
...
project_path: Z:\shared\DLC\mini2p\MotherDLC
...

We can further verify that this path exists

>>> import pathlib
>>> p = pathlib.Path("z:/shared/dlc/MotherDLC/config.yaml")
>>> p.exists()
True

We can read this file with the auxiliary function provided by DeepLabCut:

>>> import deeplabcut.utils.auxiliaryfunctions as dlc_u_af
>>> dlc_u_af.read_config(p / "config.yaml")
{ 
'Task': 'OP', 
'date': 'Dec7', 
'project_path': '\\\\?\\Volume{80D426CE-0000-0000-691B-0D2800000040}\\DLC\\mini2p\\MotherDLC', 
'engine': 'tensorflow', 
...
}

Note that, compared to the OS-level file read above, the value of project_path has changed from a drive-letter Windows path to a volume UUID path. If we then attempt to use this model to analyse videos, we get the following error:
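To check whether a given project_path value has already been rewritten in this way, a minimal sketch along the following lines may help. The helper name is_volume_guid_path is hypothetical and not part of DeepLabCut; the pattern assumes the prefix always has the `\\?\Volume{...}` form shown in the output above.

```python
import re

# Matches a Windows volume-GUID prefix at the start of a path string,
# i.e. two backslashes, "?", a backslash, then "Volume{8-4-4-4-12 hex digits}".
_VOLUME_GUID_PREFIX = re.compile(
    r"^\\\\\?\\Volume\{[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}\}"
)


def is_volume_guid_path(path) -> bool:
    """Return True if the path string starts with a volume-GUID prefix.

    Hypothetical helper for illustration; it only inspects the string and
    never touches the filesystem.
    """
    return _VOLUME_GUID_PREFIX.match(str(path)) is not None


# The rewritten value from the config above is detected ...
rewritten = r"\\?\Volume{80D426CE-0000-0000-691B-0D2800000040}\DLC\mini2p\MotherDLC"
print(is_volume_guid_path(rewritten))
# ... while the original drive-letter value is not.
print(is_volume_guid_path(r"Z:\shared\DLC\mini2p\MotherDLC"))
```

Because the check is purely string-based, it can be run against a project_path value on any operating system.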

>>> deeplabcut.analyze_videos(
        config=dlc_cfg_filepath,
        videos=video_filepaths,
        shuffle=dlc_model['dlc_shuffle'],
        trainingsetindex=dlc_model['dlc_trainingsetindex'],
        dynamic=(True, 0.5, 80),
        destfolder=output_dir,
        save_as_csv=True,
    )

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)

<lots of traceback excluded for brevity, full traceback below>

FileNotFoundError: [Errno 2] No such file or directory: '\\\\?\\Volume{80D426CE-0000-0000-691B-0D2800000040}\\DLC\\mini2p\\MotherDLC\\dlc-models\\iteration-0\\OPDec7-trainset95shuffle1\\test\\pose_cfg.yaml'

The above exception was the direct cause of the following exception:

<lots of traceback excluded for brevity, full traceback below>

FileNotFoundError: It seems the model for iteration 0 and shuffle 1 and trainFraction 0.95 does not exist.

Analysis

The proximal cause seems to be in deeplabcut.utils.auxiliaryfunctions.read_config(): https://github.com/DeepLabCut/DeepLabCut/blob/main/deeplabcut/utils/auxiliaryfunctions.py#L203. This issue only seems to affect our old Tensorflow-based models; a new Torch model doesn't exhibit the same problem.

def read_config(configname):
    """Reads structured config file defining a project."""
    ruamelFile = YAML()
    path = Path(configname)
    if os.path.exists(path):
        try:
            with open(path) as f:
                cfg = ruamelFile.load(f)
                curr_dir = str(Path(configname).parent.resolve())
                ...
                if cfg["project_path"] != curr_dir:
                    cfg["project_path"] = curr_dir
                    write_config(configname, cfg)
        except ...

Having been passed a valid path to a config file, read_config() calls Path(configname).parent.resolve(), which replaces the Windows path with a volume UUID path, and then writes that updated value back to the config file without checking whether the resolved value is usable. Not checking is understandable (after all, the config had just been loaded from that exact path), but I'm unsure why the .resolve() call is needed in the first place. When DLC subsequently goes hunting for the model via that resolved path, it can't find it, because the resolved string is no longer a path that can be opened.
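One way to add the missing check is to accept the resolved directory only if the config file can still be reached through it. The following is a minimal sketch under that idea, not DeepLabCut code: usable_project_dir is a hypothetical helper, and it assumes that is_file() fails on the volume UUID path in the same way that open() does in the traceback.

```python
from pathlib import Path


def usable_project_dir(configname) -> str:
    """Return a project directory string that still leads back to the config file.

    Hypothetical helper for illustration; not part of DeepLabCut.
    """
    config_path = Path(configname)
    resolved_dir = config_path.parent.resolve()

    # Trust the resolved form only if the config file is reachable through it.
    if (resolved_dir / config_path.name).is_file():
        return str(resolved_dir)

    # Otherwise keep the directory as the caller spelled it, made absolute
    # without resolving drive mappings or symlinks.
    return str(config_path.parent.absolute())
```

With a guard like this, a resolved path that cannot be opened would never be written back to config.yaml; the un-resolved path, which was just used successfully to load the file, would be kept instead.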

I observe different behaviour between the Tensorflow and PyTorch models. I believe that this is a consequence of the exact way we invoke models on remote storage, and the different ways that the two models are implemented in DeepLabCut:

Conclusion

Ultimately, this is a bug in pathlib rather than DeepLabCut, arising from how pathlib interacts with Windows APIs, apparently only on Win11: pathlib can resolve a path to a volume UUID, but it can't then use that volume UUID as part of a valid path, and that's a failing in pathlib.

I do not have a Win10 system handy to test against, but I do not recall seeing this issue with pathlib.Path.resolve() before moving to Win11.

It is not a problem on Ubuntu 24.04, which is our production environment. Whatever APIs pathlib talks to in the Linux kernel do not exhibit this pathology.

This bug was introduced between 2.3.11 and 3.0.0. In 2.3.11, the config file still underwent the weird read->overwrite process, but did not invoke .resolve(), so the overwritten file was not broken.

There are a handful of ways to address this issue. In decreasing order of preference as I see it:

  • Remove the .resolve() call. That prevents the Path object from ever being converted into the fragile volume UUID representation. I have not found any explanation for why the .resolve() call was added.
  • Remove the various Path -> str castings scattered throughout deeplabcut.pose_estimation_tensorflow.predict_videos.analyze_videos. I have not investigated further whether the libraries these calls are being sent to can cope with Path objects.
  • Demand that analyze_videos() receive a dictionary-based config rather than a string or path, thus forcing end-users to deal with their own system-specific weirdness themselves.
  • Remove Windows 11 from the list of supported operating systems until pathlib fixes its bug.
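As a rough illustration of the first option, the directory could be derived with Path.absolute(), which makes a path absolute without resolving symlinks or drive mappings. This is only a sketch: both function names below are hypothetical, and the case-insensitive comparison is an added assumption intended to avoid rewriting the config when only the spelling differs (for example z:/shared/dlc versus Z:\shared\DLC on Windows).

```python
import os
from pathlib import Path


def config_dir_without_resolve(configname) -> str:
    """Directory containing the config file, made absolute but not resolved.

    Hypothetical helper for illustration; not part of DeepLabCut.
    """
    return str(Path(configname).parent.absolute())


def needs_project_path_update(stored_path, configname) -> bool:
    """Return True if the stored project_path points somewhere else.

    normpath unifies separators and normcase lowercases on Windows (it is a
    no-op on POSIX), so spelling-only differences do not trigger a rewrite.
    """
    current = config_dir_without_resolve(configname)
    normalise = lambda p: os.path.normcase(os.path.normpath(str(p)))
    return normalise(stored_path) != normalise(current)
```

The trade-off is that absolute() does not collapse `..` components or follow symlinks, so two different spellings of the same directory can still compare as different.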

Relevant log output

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
File ~\AppData\Local\miniconda3\envs\imaging\lib\site-packages\deeplabcut\pose_estimation_tensorflow\predict_videos.py:512, in analyze_videos(config, videos, video_extensions, shuffle, trainingsetindex, gputouse, save_as_csv, in_random_order, destfolder, batchsize, cropping, TFGPUinference, dynamic, modelprefix, robust_nframes, allow_growth, use_shelve, auto_track, n_tracks, animal_names, calibrate, identity_only, use_openvino)
    511 try:
--> 512     dlc_cfg = load_config(str(path_test_config))
    513 except FileNotFoundError as e:

File ~\AppData\Local\miniconda3\envs\imaging\lib\site-packages\deeplabcut\pose_estimation_tensorflow\config.py:68, in load_config(filename)
     67 def load_config(filename="pose_cfg.yaml"):
---> 68     return cfg_from_file(filename)

File ~\AppData\Local\miniconda3\envs\imaging\lib\site-packages\deeplabcut\pose_estimation_tensorflow\config.py:45, in cfg_from_file(filename)
     44 """Load a config from file filename and merge it into the default options."""
---> 45 with open(filename) as f:
     46     yaml_cfg = yaml.load(f, Loader=yaml.SafeLoader)

FileNotFoundError: [Errno 2] No such file or directory: '\\\\?\\Volume{80D426CE-0000-0000-691B-0D2800000040}\\DLC\\mini2p\\MotherDLC\\dlc-models\\iteration-0\\OPDec7-trainset95shuffle1\\test\\pose_cfg.yaml'

The above exception was the direct cause of the following exception:

FileNotFoundError                         Traceback (most recent call last)
Cell In[2], line 1
----> 1 imaging.DLCPrediction.populate({"recording_name":"1f61bd412b897737"}, limit=1) # tf

File ~\AppData\Local\miniconda3\envs\imaging\lib\site-packages\datajoint\autopopulate.py:321, in AutoPopulate.populate(self, keys, suppress_errors, return_exception_objects, reserve_jobs, order, limit, max_calls, display_progress, processes, make_kwargs, *restrictions)
    315 if processes == 1:
    316     for key in (
    317         tqdm(keys, desc=self.__class__.__name__)
    318         if display_progress
    319         else keys
    320     ):
--> 321         status = self._populate1(key, jobs, **populate_kwargs)
    322         if status is True:
    323             success_list.append(1)

File ~\AppData\Local\miniconda3\envs\imaging\lib\site-packages\datajoint\autopopulate.py:399, in AutoPopulate._populate1(self, key, jobs, suppress_errors, return_exception_objects, make_kwargs)
    397 try:
    398     if not is_generator:
--> 399         make(dict(key), **(make_kwargs or {}))
    400     else:
    401         # tripartite make - transaction is delayed until the final stage
    402         gen = make(dict(key), **(make_kwargs or {}))

File ~\Documents\_work\Github\dj-imaging\imaging\jobs_s2p_dlc_sc.py:369, in DLCPrediction.make(self, key)
    368 def make(self, key):
--> 369     output_summary = self.run(key)
    370     self.insert1(output_summary, ignore_extra_fields=True)
    371     self.enter_dlc_dataset(key)

File ~\Documents\_work\Github\dj-imaging\imaging\jobs_s2p_dlc_sc.py:455, in DLCPrediction.run(cls, key, verbose, write_results_local, dlc_model)
    452     print(f"{(now - t0).total_seconds()} s: Running DLC")
    454 # ---- Trigger DLC prediction job ----
--> 455 status, stats = do_DLC_prediction(
    456     video_filepaths = locally_cached_video_files,
    457     model_name = dlc_model['dlc_model'],
    458     output_dir = local_results_directory,
    459     timestamp = timestamp,
    460 )
    461 t1 = datetime.now()
    462 if not status:

File ~\Documents\_work\Github\dj-imaging\imaging\jobs_s2p_dlc_sc.py:633, in do_DLC_prediction(video_filepaths, model_name, output_dir, timestamp)
    627     yaml_writer.dump(dlc_cfg, f)
    630 # ---- Trigger DLC prediction job ----
    631 # Update May 2022: Dynamic cropping is enabled by default with rather arbitrary
    632 # pixel margins (80)
--> 633 deeplabcut.analyze_videos(
    634     config=dlc_cfg_filepath,
    635     videos=video_filepaths,
    636     shuffle=dlc_model['dlc_shuffle'],
    637     trainingsetindex=dlc_model['dlc_trainingsetindex'],
    638     dynamic=(True, 0.5, 80),
    639     destfolder=output_dir,
    640     save_as_csv=True,
    641 )
    643 # Handle the pickle file for statistics
    644 # Testing with DLC3 generates two pickle files, `*_meta.pkl` and `*_full.pkl`. This may be a configuration issue.
    645 # We only want to read the _meta ones
    646 # In addition, the keywords inside the dictionary may be different.
    647 pickle_files = natsorted(
    648     [
    649         f for f in output_dir.glob("*.pickle")
    650         if "_meta" in f.name
    651     ]
    652 )

File ~\AppData\Local\miniconda3\envs\imaging\lib\site-packages\deeplabcut\utils\deprecation.py:198, in renamed_parameter.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
    196     warnings.warn(message, DLCDeprecationWarning, stacklevel=2)
    197     kwargs[new] = kwargs.pop(old)
--> 198 return fn(*args, **kwargs)

File ~\AppData\Local\miniconda3\envs\imaging\lib\site-packages\deeplabcut\utils\deprecation.py:198, in renamed_parameter.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
    196     warnings.warn(message, DLCDeprecationWarning, stacklevel=2)
    197     kwargs[new] = kwargs.pop(old)
--> 198 return fn(*args, **kwargs)

File ~\AppData\Local\miniconda3\envs\imaging\lib\site-packages\deeplabcut\compat.py:933, in analyze_videos(config, videos, video_extensions, shuffle, trainingsetindex, gputouse, save_as_csv, in_random_order, destfolder, batch_size, cropping, TFGPUinference, dynamic, modelprefix, robust_nframes, allow_growth, use_shelve, auto_track, n_tracks, animal_names, calibrate, identity_only, use_openvino, engine, **torch_kwargs)
    930     if use_openvino is not None:  # otherwise default comes from tensorflow API
    931         kwargs["use_openvino"] = use_openvino
--> 933     return analyze_videos(
    934         config,
    935         videos,
    936         video_extensions=video_extensions,
    937         shuffle=shuffle,
    938         trainingsetindex=trainingsetindex,
    939         gputouse=gputouse,
    940         save_as_csv=save_as_csv,
    941         in_random_order=in_random_order,
    942         destfolder=destfolder,
    943         batchsize=batch_size,
    944         cropping=cropping,
    945         TFGPUinference=TFGPUinference,
    946         dynamic=dynamic,
    947         modelprefix=modelprefix,
    948         robust_nframes=robust_nframes,
    949         allow_growth=allow_growth,
    950         use_shelve=use_shelve,
    951         auto_track=auto_track,
    952         n_tracks=n_tracks,
    953         animal_names=animal_names,
    954         calibrate=calibrate,
    955         identity_only=identity_only,
    956         **kwargs,
    957     )
    958 elif engine == Engine.PYTORCH:
    959     from deeplabcut.pose_estimation_pytorch.apis import analyze_videos

File ~\AppData\Local\miniconda3\envs\imaging\lib\site-packages\deeplabcut\utils\deprecation.py:198, in renamed_parameter.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
    196     warnings.warn(message, DLCDeprecationWarning, stacklevel=2)
    197     kwargs[new] = kwargs.pop(old)
--> 198 return fn(*args, **kwargs)

File ~\AppData\Local\miniconda3\envs\imaging\lib\site-packages\deeplabcut\pose_estimation_tensorflow\predict_videos.py:514, in analyze_videos(config, videos, video_extensions, shuffle, trainingsetindex, gputouse, save_as_csv, in_random_order, destfolder, batchsize, cropping, TFGPUinference, dynamic, modelprefix, robust_nframes, allow_growth, use_shelve, auto_track, n_tracks, animal_names, calibrate, identity_only, use_openvino)
    512     dlc_cfg = load_config(str(path_test_config))
    513 except FileNotFoundError as e:
--> 514     raise FileNotFoundError(
    515         f"It seems the model for iteration {iteration} and shuffle "
    516         f"{shuffle} and trainFraction {trainFraction} does not exist."
    517     ) from e
    519 Snapshots = auxiliaryfunctions.get_snapshots_from_folder(
    520     train_folder=Path(modelfolder) / "train",
    521 )
    523 if cfg["snapshotindex"] == "all":

FileNotFoundError: It seems the model for iteration 0 and shuffle 1 and trainFraction 0.95 does not exist.

Anything else?

No response

Labels

bug (Something isn't working), config (Related to config.yaml, ruamel, YAML parsing, ...)
