
[SPMD] Mesh to support custom device order. #4162

Merged
yeounoh merged 1 commit into master from spmd_device_mesh on Nov 8, 2022

Conversation

Contributor

@yeounoh yeounoh commented Nov 7, 2022

This implements the Mesh class from #3871, to support custom device order in the logical XLA device mesh topology.
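As a rough sketch of the idea (mine, not the PR's exact implementation; class and method names are illustrative), a mesh along these lines stores a raveled list of device IDs and reshapes it into the logical topology, so the caller controls which physical device lands at each logical coordinate:

```python
import numpy as np

class Mesh:
    """Illustrative sketch: a logical device mesh with a custom device order."""

    def __init__(self, device_ids, mesh_shape, axis_names=None):
        self.device_ids = np.asarray(device_ids)
        self.mesh_shape = mesh_shape
        self.axis_names = axis_names
        assert self.device_ids.size == np.prod(mesh_shape), \
            "device_ids must cover the full mesh"

    def get_logical_mesh(self):
        # Fill the mesh in C-like (row-major) index order.
        return self.device_ids.reshape(self.mesh_shape)

# Custom order: device 2 sits next to device 0 on the first row.
mesh = Mesh([0, 2, 1, 3], (2, 2))
print(mesh.get_logical_mesh())  # [[0 2] [1 3]]
```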

@yeounoh yeounoh added the distributed (SPMD and other distributed things) label on Nov 7, 2022
@yeounoh yeounoh self-assigned this Nov 7, 2022
Comment thread test/test_xla_sharding.py
```python
), "PyTorch/XLA SPMD requires PJRT_DEVICE={CPU, TPU}, GPU is currently not supported."
)
@unittest.skipIf(not using_pjrt() or xm.get_xla_supported_devices("GPU"),
                 f"Requires PJRT_DEVICE set to `TPU` or `CPU`.")
```
Collaborator
@will-cromar I think PJRT-GPU single core is ready now?

Contributor Author

It's blocked on our SPMD side; once we support TPU, the transition to GPU should be easier. Maybe sometime next year, once we are done with the basic/core SPMD features?

Comment thread torch_xla/csrc/tensor_util.cpp
Comment thread torch_xla/experimental/xla_sharding.py Outdated

```python
Args:
    device_ids (Union[np.ndarray, List]): A raveled list of devices (IDs) in a custom order.
        The list is reshaped to a `mesh_shape` array, filling the elements using C-like
        index order. For example,
```
Collaborator

where is the example lol?

Collaborator

oh ok it is below, you might want to change the wording here.

Contributor Author

Done
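For reference, the "C-like index order" in the docstring above is NumPy's default `reshape` behavior. The example below (mine, not taken from the PR) contrasts it with Fortran order, so it is clear which layout the raveled device list produces:

```python
import numpy as np

device_ids = [0, 1, 2, 3, 4, 5]

# C-like (row-major) order, NumPy's default: rows fill first.
c_mesh = np.array(device_ids).reshape((2, 3), order="C")
print(c_mesh)  # [[0 1 2] [3 4 5]]

# Fortran (column-major) order, for contrast: columns fill first.
f_mesh = np.array(device_ids).reshape((2, 3), order="F")
print(f_mesh)  # [[0 2 4] [1 3 5]]
```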

Collaborator

@JackCaoG JackCaoG left a comment

Thanks!

Collaborator

@jonb377 jonb377 left a comment

LGTM, thanks!

Comment thread torch_xla/experimental/xla_sharding.py Outdated
Comment on lines 78 to 80
```python
mesh_shape (Tuple[Union[int, None]]): An int tuple describing the logical topology
    of the device mesh, and each element describes the number of devices in
    the corresponding axis.
```
Collaborator

Looks like mesh_shape can be removed here

Contributor Author

Good catch :)

Comment thread test/test_xla_sharding.py

```python
def test_custom_tile_assignment(self):
    xt = torch.randn(10, 20).to(device=xm.xla_device())
    mesh_shape = (1, self.n_devices)
```
Collaborator

I see the tests have all devices mapped to a single axis - is there anything stopping us from using e.g. mesh_shape = (2, self.n_devices / 2)?

Contributor Author

No, but for unit testing a flat mesh is easier to work with, since we don't know how many devices we will have (e.g., for CPU, we will have 1).
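To illustrate the trade-off being discussed (my example, assuming 4 devices): a flat mesh works for any device count, while a 2D mesh requires the count to factor evenly across both axes:

```python
import numpy as np

n_devices = 4  # assumed count, for illustration
device_ids = np.arange(n_devices)

# Flat mesh: valid for any n_devices, including 1 (the CPU case).
flat = device_ids.reshape((1, n_devices))

# 2D mesh: only valid when n_devices is divisible by 2.
two_d = device_ids.reshape((2, n_devices // 2))

print(flat)   # [[0 1 2 3]]
print(two_d)  # [[0 1] [2 3]]
```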

```python
def __init__(self,
             device_ids: Union[np.ndarray, List],
             mesh_shape: Tuple[int, ...],
             axis_names: Tuple[str, ...] = None):
```
Collaborator

Just curious - how will axis_names be used long-term? Is it just for annotating the mesh?

Contributor Author

Good question. Mesh axis annotation is useful since it makes the annotation logic more readable. We can also build a partitioning rule based on the axis name, instead of int indices.
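A hedged sketch of what name-based axis lookup could look like (the `axis_size` helper and the `"data"`/`"model"` names below are hypothetical, not part of this PR):

```python
import numpy as np

class Mesh:
    """Illustrative sketch: resolve mesh axes by name instead of int index."""

    def __init__(self, device_ids, mesh_shape, axis_names=None):
        self.device_ids = np.asarray(device_ids).reshape(mesh_shape)
        # Fall back to positional int "names" when none are given.
        self.axis_names = axis_names or tuple(range(len(mesh_shape)))

    def axis_size(self, name):
        # Look up the axis index by its name, then return that axis's size.
        return self.device_ids.shape[self.axis_names.index(name)]

mesh = Mesh(range(8), (2, 4), axis_names=("data", "model"))
print(mesh.axis_size("data"), mesh.axis_size("model"))  # 2 4
```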

Contributor Author

yeounoh commented Nov 8, 2022

narrow_copy_dense has been renamed to narrow_copy_dense_symint upstream; rebasing to fix the build issue.

@yeounoh yeounoh force-pushed the spmd_device_mesh branch 3 times, most recently from 3d6bc93 to eea8e9c on November 8, 2022 at 19:33
Collaborator

@steventk-g steventk-g left a comment

lgtm!

@yeounoh yeounoh merged commit b096c5c into master Nov 8, 2022

Labels

distributed SPMD and other distributed things.


4 participants