
Change DS-Chat script flags for deployment type#291

Merged
tjruwase merged 4 commits into master from mrwyattii/update-chat-script on Apr 13, 2023

@mrwyattii mrwyattii commented Apr 13, 2023

Currently, the deployment type (i.e., single GPU, single node, or multi-node) is controlled by the --num-gpus flag, with the options restricted to 1, 8, and 64. However, this is misleading, as the number of GPUs on a user's node may not be 8, and they may not be running on 8 nodes.

This PR resolves the confusion by changing the flag to --deployment-type, with the options single_gpu, single_node, and multi_node.
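
For illustration, here is a minimal sketch of how a flag like this could be declared with argparse. It is not the actual code from train.py: the helper names (`build_parser`, `parse_deployment_type`), the `single_gpu` default, and the help text are assumptions; only the flag name and its three options come from the description above.

```python
import argparse

# The three options named in this PR description.
DEPLOYMENT_TYPES = ("single_gpu", "single_node", "multi_node")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a --deployment-type flag (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Sketch of a launcher that selects a deployment type"
    )
    parser.add_argument(
        "--deployment-type",
        type=str,
        default="single_gpu",  # assumed default, not confirmed by the PR text
        choices=DEPLOYMENT_TYPES,
        help="Deployment scenario to run: one GPU, one node, or several nodes",
    )
    return parser


def parse_deployment_type(argv):
    """Return the selected deployment type for a list of CLI arguments."""
    # argparse exposes --deployment-type as the attribute `deployment_type`.
    return build_parser().parse_args(argv).deployment_type


if __name__ == "__main__":
    print(parse_deployment_type(["--deployment-type", "single_node"]))
```

With `choices` set, argparse rejects any other value (for example the old 1, 8, or 64) with a usage error instead of silently accepting it.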

resolves #288

Need to merge deepspeedai/DeepSpeed#3219 after this is merged

Comment thread applications/DeepSpeed-Chat/train.py
@tjruwase tjruwase merged commit d570b2c into master Apr 13, 2023
hwchen2017 pushed a commit that referenced this pull request Jun 8, 2025
* refactor num-gpus flag to deployment-type

* update docs

* improve error message

---------

Co-authored-by: Zhewei Yao <zheweiy@berkeley.edu>

Development

Successfully merging this pull request may close these issues.

can i train rlhf with 4 gpus ?

3 participants