About dataloader_num_workers in train_text_to_image_lora.py

### Describe the bug

`train_text_to_image_lora.py` runs with `dataloader_num_workers=0`, but crashes when `dataloader_num_workers` is greater than 0.

### Reproduction

I set `dataloader_num_workers=4`. Here is the output:

The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_processes` was set to a value of `1`                                       
        `--num_machines` was set to a value of `1`                                        
        `--dynamo_backend` was set to a value of `'no'`                                   
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
04/12/2024 10:38:20 - INFO - __main__ - Distributed environment: DistributedType.NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: fp16

{'prediction_type', 'timestep_spacing', 'rescale_betas_zero_snr', 'dynamic_thresholding_ratio', 'clip_sample_range', 'variance_type', 'thresholding', 'sample_max_value'} was not found in config. Values will be initialized to default values.
{'force_upcast', 'scaling_factor', 'latents_mean', 'latents_std'} was not found in config. Values will be initialized to default values.
{'only_cross_attention', 'num_attention_heads', 'encoder_hid_dim', 'dropout', 'time_cond_proj_dim', 'time_embedding_dim', 'encoder_hid_dim_type', 'attention_type', 'dual_cross_attention', 'resnet_out_scale_factor', 'projection_class_embeddings_input_dim', 'num_class_embeds', 'cross_attention_norm', 'addition_embed_type', 'time_embedding_type', 'conv_out_kernel', 'conv_in_kernel', 'transformer_layers_per_block', 'mid_block_only_cross_attention', 'use_linear_projection', 'mid_block_type', 'timestep_post_act', 'upcast_attention', 'class_embeddings_concat', 'addition_time_embed_dim', 'class_embed_type', 'resnet_skip_time_act', 'reverse_transformer_layers_per_block', 'addition_embed_type_num_heads', 'time_embedding_act_fn', 'resnet_time_scale_shift'} was not found in config. Values will be initialized to default values.
Resolving data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 21/21 [00:00<?, ?it/s]
04/12/2024 10:38:24 - WARNING - datasets.builder - Found cached dataset imagefolder (C:/Users/HP/.cache/huggingface/datasets/imagefolder/default-f890b3e0a49a7f2c/0.0.0/37fbb85cc714a338bea574ac6c7d0b5be5aff46c1862c1989b20e0771199e93f)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 503.46it/s] 
04/12/2024 10:38:25 - INFO - __main__ - ***** Running training *****
04/12/2024 10:38:25 - INFO - __main__ -   Num examples = 20
04/12/2024 10:38:25 - INFO - __main__ -   Num Epochs = 100
04/12/2024 10:38:25 - INFO - __main__ -   Instantaneous batch size per device = 1
04/12/2024 10:38:25 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 4
04/12/2024 10:38:25 - INFO - __main__ -   Gradient Accumulation steps = 4
04/12/2024 10:38:25 - INFO - __main__ -   Total optimization steps = 500
Steps:   0%|                                                                                                                                                                                                   | 0/500 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "D:\work\projects\diffusers\examples\text_to_image\train_text_to_image_lora.py", line 1014, in <module>
    main()
  File "D:\work\projects\diffusers\examples\text_to_image\train_text_to_image_lora.py", line 763, in main
    for step, batch in enumerate(train_dataloader):
  File "D:\anaconda3\envs\py312\Lib\site-packages\accelerate\data_loader.py", line 449, in __iter__
    dataloader_iter = super().__iter__()
                      ^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\envs\py312\Lib\site-packages\torch\utils\data\dataloader.py", line 439, in __iter__
    return self._get_iterator()
           ^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\envs\py312\Lib\site-packages\torch\utils\data\dataloader.py", line 387, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\envs\py312\Lib\site-packages\torch\utils\data\dataloader.py", line 1040, in __init__
    w.start()
  File "D:\anaconda3\envs\py312\Lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\envs\py312\Lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\envs\py312\Lib\multiprocessing\context.py", line 337, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\envs\py312\Lib\multiprocessing\popen_spawn_win32.py", line 95, in __init__
    reduction.dump(process_obj, to_child)
  File "D:\anaconda3\envs\py312\Lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'main.<locals>.preprocess_train'
Steps:   0%|                                                                                                                                                                                                   | 0/500 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "D:\anaconda3\envs\py312\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "D:\anaconda3\envs\py312\Lib\site-packages\accelerate\commands\accelerate_cli.py", line 46, in main
    args.func(args)
  File "D:\anaconda3\envs\py312\Lib\site-packages\accelerate\commands\launch.py", line 1057, in launch_command
    simple_launcher(args)
  File "D:\anaconda3\envs\py312\Lib\site-packages\accelerate\commands\launch.py", line 673, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\\anaconda3\\envs\\py312\\python.exe', 'train_text_to_image_lora.py', '--dataloader_num_workers=4']' returned non-zero exit status 1.

(py312) D:\work\projects\diffusers\examples\text_to_image>Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\anaconda3\envs\py312\Lib\multiprocessing\spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\envs\py312\Lib\multiprocessing\spawn.py", line 132, in _main
    self = reduction.pickle.load(from_parent)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
EOFError: Ran out of input
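
The `AttributeError` above comes from `pickle`: on Windows, DataLoader workers are started with the `spawn` method, so whatever is handed to a worker has to be pickled, and a function defined inside another function (here `main.<locals>.preprocess_train`) cannot be. The sketch below is a standalone illustration of that behavior and is not code from the training script; `make_local_transform`, `preprocess_train_top_level`, and `can_pickle` are hypothetical names used only for this example.

```python
import pickle


def make_local_transform():
    # Same shape as the failing case: a function defined inside another function.
    def preprocess_train(examples):
        return examples

    return preprocess_train


def preprocess_train_top_level(examples):
    # Module-level functions are pickled by reference to their qualified name,
    # so a spawned worker can import them again.
    return examples


def can_pickle(obj):
    # Return True if obj can be serialized with pickle, False otherwise.
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False


# The nested function fails to pickle; the module-level one is expected to succeed
# when this file is run as a normal script.
local_ok = can_pickle(make_local_transform())
top_level_ok = can_pickle(preprocess_train_top_level)
print("local function picklable:", local_ok)
print("top-level function picklable:", top_level_ok)
```

If this is indeed the cause, a possible workaround (an assumption, not verified against the script) would be to move `preprocess_train` out of `main()` to module level, or to keep `dataloader_num_workers=0` on Windows.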

### Logs

_No response_

### System Info

- `diffusers` version: 0.28.0.dev0
- Platform: Windows-10-10.0.19045-SP0
- Python version: 3.12.2
- PyTorch version (GPU?): 2.2.1+cu121 (True)
- Huggingface_hub version: 0.21.4
- Transformers version: 4.39.1
- Accelerate version: 0.28.0
- xFormers version: not installed
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: yes


### Who can help?

_No response_
