DDP recipe tests and fixes#2130

Merged
mravanelli merged 6 commits into develop from fixDDP
Sep 21, 2023

Conversation

@mravanelli
Collaborator

@mravanelli mravanelli commented Aug 14, 2023

We want our recipes to work with different devices. Ideally, they should work with:

  • Single GPU (cuda:0)
  • Single GPU (cuda:1) => this tests whether any recipes have cuda:0 hard-coded
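
One rough way to spot recipes with a hard-coded device before running them on cuda:1 is a simple text scan. The sketch below is only an illustration: `find_hard_coded_devices` and its regular expression are hypothetical helpers, not part of SpeechBrain or of this PR, and the pattern only catches explicit forms such as `"cuda:0"` or `.cuda(0)`.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: explicit device strings ("cuda:0") and indexed .cuda(0) calls.
HARD_CODED_CUDA = re.compile(r"""["']cuda:\d+["']|\.cuda\(\s*\d+\s*\)""")


def find_hard_coded_devices(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, stripped_line) pairs that appear to pin a CUDA device."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Crude comment stripping: everything after the first '#' is ignored.
        code = line.split("#", 1)[0]
        if HARD_CODED_CUDA.search(code):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    sample = '''
x = x.to("cuda:0")
y = y.to(self.device)
z = torch.zeros(3).cuda(0)
'''
    for number, line in find_hard_coded_devices(sample):
        print(f"line {number}: {line}")
```

A scan like this can only flag candidates. Running the recipe tests on cuda:1, as described above, remains the actual check.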

Future work (for other PRs)

  • Single CPU
  • Multiple GPUs on the same node
  • Multiple CPUs on the same node
  • Multiple GPUs on different nodes
  • Multiple CPUs on different nodes

The goal of this PR is to allow recipe tests to run on multiple GPUs and to fix the issues that cause those tests to fail.

@mravanelli mravanelli added the enhancement (New feature or request) and work in progress (Not ready for merge) labels Aug 14, 2023
@mravanelli mravanelli requested a review from Adel-Moumen August 14, 2023 20:13
@mravanelli mravanelli self-assigned this Aug 14, 2023
@mravanelli mravanelli changed the title [WIP] DDP recipe tests and fixes DDP recipe tests and fixes Sep 21, 2023
@mravanelli
Collaborator Author

I limited this PR to fixing the issues that emerge when using a non-default CUDA device (e.g., cuda:1). I will follow up with other PRs to extend the tests to multi-GPU training.

@mravanelli mravanelli merged commit e1a3107 into develop Sep 21, 2023
@mravanelli mravanelli deleted the fixDDP branch September 21, 2023 17:29