"Managed Service for Apache Spark" is the new name for the product formerly known as "Dataproc on Compute Engine" (cluster deployment) and "Google Cloud Serverless for Apache Spark" (serverless deployment).

Supported machine types

Managed Service for Apache Spark clusters are built on Compute Engine instances. Machine types define the virtualized hardware resources available to an instance. Compute Engine offers both predefined machine types and custom machine types. Managed Service for Apache Spark clusters can use both predefined and custom types for both master and worker nodes.

Managed Service for Apache Spark clusters support the following Compute Engine predefined machine types (machine type availability varies by region):

General purpose machine types, which include N1, N2, N2D, E2, C3, C4, N4 and N4D machine types (Managed Service for Apache Spark also supports N1, N2, N2D, E2, N4 and N4D custom machine types).
Limitations:
- The n1-standard-1 machine type is not supported for 2.0+ images (the n1-standard-1 machine type is not recommended for pre-2.0 images—instead, use a machine type with higher memory).
- Shared-core machine types are not supported, which include the following unsupported machine types:
  - E2: e2-micro, e2-small, and e2-medium shared-core machine types, and
  - N1: f1-micro and g1-small shared-core machine types.
- Managed Service for Apache Spark selects hyperdisk-balanced as the boot-disk type if the machine type is C4, N4, or N4D.
Compute-optimized machine types, which include C2 and C2D machine types.
Memory-optimized machine types, which include M1 and M2 machine types.
Arm machine types, which include C4A machine types.
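The limitations listed above can be expressed as a small pre-flight check. The following Python sketch is illustrative only: the function names and note strings are hypothetical, it assumes image versions begin with "major.minor" (for example, 2.0 or 2.1-ubuntu20-arm), and it covers only the limitations named in this section.

```python
# Shared-core machine types listed as unsupported in this section.
SHARED_CORE = frozenset({"e2-micro", "e2-small", "e2-medium", "f1-micro", "g1-small"})
# Series for which hyperdisk-balanced is selected as the boot-disk type.
HYPERDISK_SERIES = frozenset({"c4", "n4", "n4d"})


def image_major_minor(image_version):
    """Parse '2.1-ubuntu20-arm' or '2.0.45' into (2, 1) or (2, 0)."""
    numeric = image_version.split("-")[0]
    major, minor = numeric.split(".")[:2]
    return int(major), int(minor)


def machine_type_notes(machine_type, image_version):
    """Return notes for the limitations in this section (hypothetical helper)."""
    notes = []
    if machine_type in SHARED_CORE:
        notes.append("unsupported: shared-core machine type")
    if machine_type == "n1-standard-1":
        if image_major_minor(image_version) >= (2, 0):
            notes.append("unsupported: n1-standard-1 on 2.0+ images")
        else:
            notes.append("not recommended: n1-standard-1 on pre-2.0 images")
    if machine_type.split("-")[0] in HYPERDISK_SERIES:
        notes.append("boot disk type: hyperdisk-balanced")
    return notes


print(machine_type_notes("e2-medium", "2.2"))
print(machine_type_notes("n4-standard-4", "2.2"))
```

An empty list means that none of the limitations listed in this section apply. It does not confirm that the machine type is available in a given region.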

Custom machine types

Managed Service for Apache Spark supports N1, N2, N2D, E2, N4 and N4D series custom machine types.

Custom machine types are ideal for the following workloads:

Workloads that are not a good fit for the predefined machine types.
Workloads that require more processing power or more memory, but don't need all of the upgrades that are provided by the next machine type level.

For example, suppose you have a workload that needs more processing power than an n1-standard-4 instance provides, but the next step up, an n1-standard-8 instance, provides too much capacity. With custom machine types, you can create Managed Service for Apache Spark clusters with master and/or worker nodes in the middle range, with 6 virtual CPUs and 25 GB of memory.

Specify a custom machine type

Custom machine types use a special machine type specification and are subject to limitations. For example, the custom machine type specification for a custom VM with 6 virtual CPUs and 22.5 GB of memory is custom-6-23040.

The numbers in the machine type specification correspond to the number of virtual CPUs (vCPUs) in the machine (6) and the amount of memory in MB (23040). The amount of memory is calculated by multiplying the amount of memory in gigabytes by 1024 (see Expressing memory in GB or MB). In this example, 22.5 (GB) is multiplied by 1024: 22.5 * 1024 = 23040.
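The calculation above can be sketched in Python as follows. The function name is hypothetical, and the sketch only reproduces the custom-VCPUS-MEMORY form shown in this section. It doesn't check the other limitations that apply to custom machine types.

```python
def custom_machine_type(vcpus, memory_gb):
    """Return a specification such as 'custom-6-23040' (hypothetical helper)."""
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    memory_mb = memory_gb * 1024  # GB to MB, as described in the text
    if memory_mb != int(memory_mb):
        raise ValueError("memory must be a whole number of MB")
    return f"custom-{vcpus}-{int(memory_mb)}"


print(custom_machine_type(6, 22.5))  # custom-6-23040
```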

You specify the custom machine type when you create a cluster. You can set the machine type for master nodes, worker nodes, or both. If you set both, the master node can use a custom machine type that is different from the custom machine type used by workers. The machine type used by any secondary workers follows the settings for primary workers and cannot be set separately (see Secondary workers - preemptible and non-preemptible VMs).

Custom machine type pricing

Custom machine type pricing is based on the resources used in a custom machine. Managed Service for Apache Spark pricing is added to the cost of compute resources, and is based on the total number of virtual CPUs (vCPUs) used in a cluster.

Create a Managed Service for Apache Spark cluster with a specified machine type

Console

From the Configure nodes panel of the Managed Service for Apache Spark Create a cluster page in the Google Cloud console, select the machine family, series, and type for the cluster's master and worker nodes.

gcloud command

Run the gcloud dataproc clusters create command with the following flags to create a Managed Service for Apache Spark cluster with master and/or worker machine types:

The --master-machine-type machine-type flag lets you set the predefined or custom machine type used by the master VM instance in your cluster (or master instances if you create an HA cluster).
The --worker-machine-type machine-type flag lets you set the predefined or custom machine type used by the worker VM instances in your cluster.

Example:

gcloud dataproc clusters create test-cluster \
    --master-machine-type custom-6-23040 \
    --worker-machine-type custom-6-23040 \
    other args
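If you generate the command from a script, the "and/or" choice between the two flags can be sketched in Python as follows. The helper name and its parameters are hypothetical. It only assembles the argument list for the command and flags shown above, and it doesn't run gcloud.

```python
import shlex


def build_create_command(cluster_name, master_type=None, worker_type=None, extra_args=()):
    """Assemble a 'gcloud dataproc clusters create' argument list (hypothetical helper)."""
    argv = ["gcloud", "dataproc", "clusters", "create", cluster_name]
    if master_type:  # add the flag only when a master machine type is given
        argv += ["--master-machine-type", master_type]
    if worker_type:  # add the flag only when a worker machine type is given
        argv += ["--worker-machine-type", worker_type]
    argv += list(extra_args)  # any other flags the caller wants to pass through
    return argv


command = build_create_command("test-cluster", "custom-6-23040", "custom-6-23040")
print(shlex.join(command))
```

Keeping the command as a list avoids shell-quoting issues if you later pass it to a process launcher. shlex.join is used here only for display.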

An easy way to examine and construct a gcloud cluster create command is to open the Managed Service for Apache Spark Create a cluster page in the Google Cloud console, fill in the applicable fields on the page, then click Equivalent command line at the bottom of the left pane of the Create a cluster page to view, copy, and paste the completed gcloud command.

Once the Managed Service for Apache Spark cluster starts, cluster details are displayed in the terminal window. The following is a partial sample listing of cluster properties displayed in the terminal window:

...
properties:
  distcp:mapreduce.map.java.opts: -Xmx1638m
  distcp:mapreduce.map.memory.mb: '2048'
  distcp:mapreduce.reduce.java.opts: -Xmx4915m
  distcp:mapreduce.reduce.memory.mb: '6144'
  mapred:mapreduce.map.cpu.vcores: '1'
  mapred:mapreduce.map.java.opts: -Xmx1638m
...

API

To create a cluster with custom machine types, set the machineTypeUri in the masterConfig and/or workerConfig InstanceGroupConfig in the cluster.create API request.

Example:

POST /v1/projects/my-project-id/regions/us-central1/clusters/
{
  "projectId": "my-project-id",
  "clusterName": "test-cluster",
  "config": {
    "configBucket": "",
    "gceClusterConfig": {
      "subnetworkUri": "default",
      "zoneUri": "us-central1-a"
    },
    "masterConfig": {
      "numInstances": 1,
      "machineTypeUri": "n1-highmem-4",
      "diskConfig": {
        "bootDiskSizeGb": 500,
        "numLocalSsds": 0
      }
    },
    "workerConfig": {
      "numInstances": 2,
      "machineTypeUri": "n1-highmem-4",
      "diskConfig": {
        "bootDiskSizeGb": 500,
        "numLocalSsds": 0
      }
    }
  }
}
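A request body like the one above can also be built programmatically. The following Python sketch uses hypothetical helper names and mirrors the field names and disk values from the example. It builds and prints the JSON only and doesn't send the request.

```python
import json


def instance_group(num_instances, machine_type, boot_disk_gb=500):
    """Build an InstanceGroupConfig-shaped dict like those in the example request."""
    return {
        "numInstances": num_instances,
        "machineTypeUri": machine_type,
        "diskConfig": {"bootDiskSizeGb": boot_disk_gb, "numLocalSsds": 0},
    }


def build_cluster_body(project_id, cluster_name, zone, master_type, worker_type, num_workers=2):
    """Build a clusters.create request body (hypothetical helper)."""
    return {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            "gceClusterConfig": {"subnetworkUri": "default", "zoneUri": zone},
            "masterConfig": instance_group(1, master_type),
            "workerConfig": instance_group(num_workers, worker_type),
        },
    }


body = build_cluster_body(
    "my-project-id", "test-cluster", "us-central1-a",
    master_type="custom-6-23040", worker_type="custom-6-23040",
)
print(json.dumps(body, indent=2))
```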

Create a Managed Service for Apache Spark cluster with custom machine type with extended memory

Managed Service for Apache Spark supports custom machine types with extended memory beyond the 6.5 GB per vCPU limit (see Extended Memory Pricing).

Console

Click Extend memory when customizing Machine type memory in the Master node and/or Worker nodes section from the Configure nodes panel on the Managed Service for Apache Spark Create a cluster page in the Google Cloud console.

gcloud Command

To create a cluster from the gcloud command line with custom CPUs with extended memory, add a -ext suffix to the machine type that you pass to the --master-machine-type and/or --worker-machine-type flags.

Example

The following gcloud command-line sample creates a Managed Service for Apache Spark cluster with 1 CPU and 50 GB memory (50 * 1024 = 51200) in each node:

gcloud dataproc clusters create test-cluster \
    --master-machine-type custom-1-51200-ext \
    --worker-machine-type custom-1-51200-ext \
    other args
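As a rough sketch, the -ext suffix rule can be combined with the memory calculation in Python. The function name is hypothetical. The sketch assumes that the suffix is needed only when memory exceeds the 6.5 GB per vCPU limit quoted in this section.

```python
EXTENDED_MEMORY_LIMIT_GB_PER_VCPU = 6.5  # limit quoted in this section


def custom_machine_type(vcpus, memory_gb):
    """Return a custom machine type, adding -ext above the per-vCPU limit (hypothetical helper)."""
    memory_mb = memory_gb * 1024  # GB to MB
    if memory_mb != int(memory_mb):
        raise ValueError("memory must be a whole number of MB")
    spec = f"custom-{vcpus}-{int(memory_mb)}"
    if memory_gb / vcpus > EXTENDED_MEMORY_LIMIT_GB_PER_VCPU:
        spec += "-ext"  # extended memory: more than 6.5 GB per vCPU
    return spec


print(custom_machine_type(1, 50))    # custom-1-51200-ext
print(custom_machine_type(6, 22.5))  # custom-6-23040
```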

API

The following sample JSON snippet from a Managed Service for Apache Spark REST API clusters.create request specifies 1 CPU and 50 GB memory (50 * 1024 = 51200) in each node:

...
    "masterConfig": {
      "numInstances": 1,
      "machineTypeUri": "custom-1-51200-ext",
    ...
    },
    "workerConfig": {
      "numInstances": 2,
      "machineTypeUri": "custom-1-51200-ext",
     ...
...

Arm machine types

Managed Service for Apache Spark supports creating a cluster with nodes that use Arm machine types, such as the C4A machine type.

Requirements and limitations:

The Managed Service for Apache Spark image must be compatible with the Arm chipset. The Managed Service for Apache Spark 2.1-ubuntu20-arm, 2.2-ubuntu22-arm, and 2.3-ubuntu22-arm images (and later images with the -arm suffix) are compatible with the Arm chipset. Arm-compatible images don't support many optional and initialization-action components, as noted on the image release version pages.
Since one image must be specified for a cluster, the master, worker, and secondary-worker nodes must use an Arm machine type that is compatible with the selected Managed Service for Apache Spark Arm image.
Managed Service for Apache Spark features that are not compatible with Arm machine types aren't available (for example, local SSDs aren't supported by C4A machine types).
Arm images only support pre-installed components and a limited set of optional components. Other optional components and all initialization actions are unsupported.
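The first two requirements above can be sketched as a pre-flight check in Python. The helper name is hypothetical. The sketch assumes that Arm images are identified by the -arm suffix and that C4A is the only Arm series to check, because it is the only one named in this section.

```python
ARM_SERIES_PREFIXES = ("c4a-",)  # assumption: only the C4A series named in this section


def arm_cluster_problems(image_version, master_type, worker_type):
    """Return a list of mismatches between an Arm image and node machine types."""
    problems = []
    if not image_version.endswith("-arm"):
        problems.append(f"image {image_version!r} does not have the -arm suffix")
    for role, machine_type in (("master", master_type), ("worker", worker_type)):
        if not machine_type.startswith(ARM_SERIES_PREFIXES):
            problems.append(f"{role} machine type {machine_type!r} is not an Arm machine type")
    return problems


print(arm_cluster_problems("2.1-ubuntu20-arm", "c4a-standard-4", "c4a-standard-4"))  # []
```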

Create a Managed Service for Apache Spark cluster with an Arm machine type

Console

To create a Managed Service for Apache Spark cluster that uses an Arm machine type, do the following:

In the Google Cloud console, go to the Managed Service for Apache Spark Create a Dataproc cluster on Compute Engine page.

In the Versioning section, click Change to select an Arm chipset image.
Select the Configure nodes panel.
Select the Arm series (such as C4A) and the Arm machine type for each cluster node.
Confirm or specify other cluster details, then click Create.

gcloud

To create a Managed Service for Apache Spark cluster that uses an Arm machine type, run the following gcloud command locally in a terminal window or in Cloud Shell. This example specifies the 2.1-ubuntu20-arm image and the c4a-standard-4 Arm machine type.

gcloud dataproc clusters create cluster-name \
    --region=REGION \
    --image-version=2.1-ubuntu20-arm \
    --master-machine-type=c4a-standard-4 \
    --worker-machine-type=c4a-standard-4

Notes:

REGION: The region where the cluster will be located.
See the gcloud dataproc clusters create reference documentation for information on additional command-line flags you can use to customize your cluster.

API

The following sample Managed Service for Apache Spark REST API clusters.create request creates a cluster that uses the c4a-standard-4 Arm machine type.

POST /v1/projects/my-project-id/regions/us-central1/clusters/
{
  "projectId": "my-project-id",
  "clusterName": "sample-cluster",
  "config": {
    "configBucket": "",
    "gceClusterConfig": {
      "subnetworkUri": "default",
      "zoneUri": "us-central1-a"
    },
    "masterConfig": {
      "numInstances": 1,
      "machineTypeUri": "c4a-standard-4",
      "diskConfig": {
        "bootDiskSizeGb": 500
      }
    },
    "workerConfig": {
      "numInstances": 2,
      "machineTypeUri": "c4a-standard-4",
      "diskConfig": {
        "bootDiskSizeGb": 500,
        "numLocalSsds": 0
      }
    },
    "softwareConfig": {
      "imageVersion": "2.1-ubuntu20-arm"
    }
  }
}

What's next

Learn more about Arm VMs on Compute.

Learn how to create a VM with a custom machine type.

Learn how to create and start a Compute Engine instance.
