About dataloader_num_workers in train_text_to_image_lora.py #7646

@Hellcat1005

Describe the bug

I can run train_text_to_image_lora.py with dataloader_num_workers=0, but it fails with any dataloader_num_workers>0.

Reproduction

I set dataloader_num_workers=4. Here is the output:

The following values were not passed to accelerate launch and had defaults used instead:
--num_processes was set to a value of 1
--num_machines was set to a value of 1
--dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
04/12/2024 10:38:20 - INFO - __main__ - Distributed environment: DistributedType.NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: fp16

{'prediction_type', 'timestep_spacing', 'rescale_betas_zero_snr', 'dynamic_thresholding_ratio', 'clip_sample_range', 'variance_type', 'thresholding', 'sample_max_value'} was not found in config. Values will be initialized to default values.
{'force_upcast', 'scaling_factor', 'latents_mean', 'latents_std'} was not found in config. Values will be initialized to default values.
{'only_cross_attention', 'num_attention_heads', 'encoder_hid_dim', 'dropout', 'time_cond_proj_dim', 'time_embedding_dim', 'encoder_hid_dim_type', 'attention_type', 'dual_cross_attention', 'resnet_out_scale_factor', 'projection_class_embeddings_input_dim', 'num_class_embeds', 'cross_attention_norm', 'addition_embed_type', 'time_embedding_type', 'conv_out_kernel', 'conv_in_kernel', 'transformer_layers_per_block', 'mid_block_only_cross_attention', 'use_linear_projection', 'mid_block_type', 'timestep_post_act', 'upcast_attention', 'class_embeddings_concat', 'addition_time_embed_dim', 'class_embed_type', 'resnet_skip_time_act', 'reverse_transformer_layers_per_block', 'addition_embed_type_num_heads', 'time_embedding_act_fn', 'resnet_time_scale_shift'} was not found in config. Values will be initialized to default values.
Resolving data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 21/21 [00:00<?, ?it/s]
04/12/2024 10:38:24 - WARNING - datasets.builder - Found cached dataset imagefolder (C:/Users/HP/.cache/huggingface/datasets/imagefolder/default-f890b3e0a49a7f2c/0.0.0/37fbb85cc714a338bea574ac6c7d0b5be5aff46c1862c1989b20e0771199e93f)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 503.46it/s]
04/12/2024 10:38:25 - INFO - __main__ - ***** Running training *****
04/12/2024 10:38:25 - INFO - __main__ - Num examples = 20
04/12/2024 10:38:25 - INFO - __main__ - Num Epochs = 100
04/12/2024 10:38:25 - INFO - __main__ - Instantaneous batch size per device = 1
04/12/2024 10:38:25 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 4
04/12/2024 10:38:25 - INFO - __main__ - Gradient Accumulation steps = 4
04/12/2024 10:38:25 - INFO - __main__ - Total optimization steps = 500
Steps: 0%| | 0/500 [00:00<?, ?it/s]
Traceback (most recent call last):
File "D:\work\projects\diffusers\examples\text_to_image\train_text_to_image_lora.py", line 1014, in <module>
main()
File "D:\work\projects\diffusers\examples\text_to_image\train_text_to_image_lora.py", line 763, in main
for step, batch in enumerate(train_dataloader):
File "D:\anaconda3\envs\py312\Lib\site-packages\accelerate\data_loader.py", line 449, in __iter__
dataloader_iter = super().__iter__()
^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\envs\py312\Lib\site-packages\torch\utils\data\dataloader.py", line 439, in __iter__
return self._get_iterator()
^^^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\envs\py312\Lib\site-packages\torch\utils\data\dataloader.py", line 387, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\envs\py312\Lib\site-packages\torch\utils\data\dataloader.py", line 1040, in __init__
w.start()
File "D:\anaconda3\envs\py312\Lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
^^^^^^^^^^^^^^^^^
File "D:\anaconda3\envs\py312\Lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\envs\py312\Lib\multiprocessing\context.py", line 337, in _Popen
return Popen(process_obj)
^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\envs\py312\Lib\multiprocessing\popen_spawn_win32.py", line 95, in __init__
reduction.dump(process_obj, to_child)
File "D:\anaconda3\envs\py312\Lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'main.<locals>.preprocess_train'
Steps: 0%| | 0/500 [00:00<?, ?it/s]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "D:\anaconda3\envs\py312\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "D:\anaconda3\envs\py312\Lib\site-packages\accelerate\commands\accelerate_cli.py", line 46, in main
args.func(args)
File "D:\anaconda3\envs\py312\Lib\site-packages\accelerate\commands\launch.py", line 1057, in launch_command
simple_launcher(args)
File "D:\anaconda3\envs\py312\Lib\site-packages\accelerate\commands\launch.py", line 673, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\anaconda3\envs\py312\python.exe', 'train_text_to_image_lora.py', '--dataloader_num_workers=4']' returned non-zero exit status 1.

(py312) D:\work\projects\diffusers\examples\text_to_image>
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "D:\anaconda3\envs\py312\Lib\multiprocessing\spawn.py", line 122, in spawn_main
exitcode = _main(fd, parent_sentinel)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\envs\py312\Lib\multiprocessing\spawn.py", line 132, in _main
self = reduction.pickle.load(from_parent)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
EOFError: Ran out of input
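
My reading of the traceback, which may be incomplete: with dataloader_num_workers>0 on Windows, the worker processes are started through multiprocessing's spawn path (popen_spawn_win32.py above), which pickles the objects handed to each worker. preprocess_train is reported as a local object of main(), and pickle cannot serialize a function nested inside another function. Below is a minimal standard-library sketch of that failure mode. The names make_local_transform, preprocess_train_global and can_pickle are hypothetical and are not taken from the training script.

```python
import functools
import pickle


def make_local_transform(scale):
    # Same shape as the failing code: a function defined inside another
    # function is a "local object", which pickle refuses to serialize.
    def preprocess_train(examples):
        return [x * scale for x in examples]

    return preprocess_train


def preprocess_train_global(examples, scale):
    # A module-level function is pickled by its qualified name, so a
    # freshly spawned process can look it up again by import.
    return [x * scale for x in examples]


def can_pickle(obj):
    # The exception type for local objects differs between Python
    # versions, so both candidates are caught here.
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False


local_fn = make_local_transform(2)
# functools.partial binds the extra argument without creating a closure.
global_fn = functools.partial(preprocess_train_global, scale=2)

print(can_pickle(local_fn))   # False
print(can_pickle(global_fn))  # True
```

If this is indeed the cause, a possible workaround (an assumption on my side, not a confirmed fix) would be to move the transform to module level and bind its extra arguments with functools.partial, or to keep dataloader_num_workers=0 on Windows.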

Logs

No response

System Info

  • diffusers version: 0.28.0.dev0
  • Platform: Windows-10-10.0.19045-SP0
  • Python version: 3.12.2
  • PyTorch version (GPU?): 2.2.1+cu121 (True)
  • Huggingface_hub version: 0.21.4
  • Transformers version: 4.39.1
  • Accelerate version: 0.28.0
  • xFormers version: not installed
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: yes

Who can help?

No response

Labels

bug, stale
