
feat: add support for distributed custom training #66

Merged
sasha-gitg merged 5 commits into googleapis:dev from sasha-gitg:distributed_training
Nov 17, 2020

@sasha-gitg
Member

Adds support for chief-worker distributed training to custom training. If replica_count > 1, one replica is provisioned as the chief and the remainder are provisioned as workers.
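
For readers skimming the thread, the chief-worker split described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `chief_worker_pool_specs` and the dict layout are hypothetical and are not taken from the PR's diff.

```python
from typing import Any, Dict, List


def chief_worker_pool_specs(
    replica_count: int, machine_type: str
) -> List[Dict[str, Any]]:
    """Hypothetical helper: split replica_count into 1 chief + N-1 workers.

    The dict layout here is an assumption made for illustration only.
    """
    if replica_count < 1:
        # Nothing to provision.
        return []

    # The first pool always holds exactly one replica: the chief.
    pools = [
        {"machine_spec": {"machine_type": machine_type}, "replica_count": 1}
    ]

    # Any remaining replicas go into a second pool as workers.
    if replica_count > 1:
        pools.append(
            {
                "machine_spec": {"machine_type": machine_type},
                "replica_count": replica_count - 1,
            }
        )
    return pools
```

With `replica_count=1` the sketch yields only the chief pool; with `replica_count=4` it yields a chief pool of 1 and a worker pool of 3.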

The _DistributedTrainingSpec can also support more custom provisioning. We will open a follow-up PR once we decide how to expose more custom provisioning on the API surface.

Note: If the library's minimum supported Python version moves to 3.7, we should switch the Spec classes to dataclasses.

Fixes b/172369809 🦕

google-cla Bot added the "cla: yes" label (This human has signed the Contributor License Agreement.) on Nov 13, 2020

Contributor

@vinnysenthil vinnysenthil left a comment

LGTM + tests pass, added a note about the default machine_type

"""

replica_count: int = 0
machine_type: str = "n1-standard-2"
Contributor

n1-standard-2 is supported in Deploy but not for Training.
n1-standard-4 is currently the lowest common denominator.

Member Author

Thanks! That's good to know. I was working off the machine_resource.MachineSpec proto comments.

Member Author

Done.

@sasha-gitg sasha-gitg merged commit a90f412 into googleapis:dev Nov 17, 2020
dizcology pushed a commit to dizcology/python-aiplatform that referenced this pull request Nov 30, 2020
dizcology pushed a commit to dizcology/python-aiplatform that referenced this pull request Dec 22, 2020