This initialization action installs a binary release of Docker on a Google Cloud
Dataproc cluster. After installation, it adds the yarn user to the special docker
group so that YARN-executed applications can access Docker.
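One way to confirm the group change on a node is to inspect the groups of the yarn user. The following is a minimal sketch, not part of the initialization action itself: `has_group` is a hypothetical helper introduced here for illustration, and the check is skipped on machines that have no yarn user.

```shell
# has_group is a hypothetical helper (not part of docker.sh).
# Returns 0 if the space-separated group list in $1 contains the group named $2.
has_group() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# `id -nG yarn` prints the names of all groups the yarn user belongs to.
# The outer guard skips the check where the yarn user does not exist.
if id yarn >/dev/null 2>&1; then
  if has_group "$(id -nG yarn)" docker; then
    echo "yarn is in the docker group"
  else
    echo "yarn is NOT in the docker group"
  fi
fi
```

Matching on whole words avoids a false positive from a similarly named group such as dockerroot.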
- Use the gcloud command to create a new cluster with this initialization action:

      REGION=<region>
      CLUSTER_NAME=<cluster_name>
      gcloud dataproc clusters create ${CLUSTER_NAME} \
          --region ${REGION} \
          --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/docker/docker.sh
- Docker is installed and configured on all nodes of the cluster (both master and workers). You can log into the master node and run a test command to see that it works:

      sudo docker run hello-world

  Or, to run as the yarn user would:

      sudo su yarn
      docker run hello-world
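To repeat the test on the workers as well as the master, the commands can be generated per node. This is a rough sketch under stated assumptions: it assumes Dataproc's default node naming for a single-master cluster (`<cluster>-m` and `<cluster>-w-N`), and `node_names`, `ZONE`, `NUM_WORKERS`, and the default values are hypothetical placeholders. It only prints the `gcloud compute ssh` commands so they can be reviewed before running.

```shell
# node_names is a hypothetical helper. It prints the assumed default Dataproc
# node names: <cluster>-m for the master, <cluster>-w-N for each primary worker.
node_names() {
  cluster="$1"
  workers="$2"
  echo "${cluster}-m"
  i=0
  while [ "$i" -lt "$workers" ]; do
    echo "${cluster}-w-${i}"
    i=$((i + 1))
  done
}

# Placeholder defaults; replace with your own cluster name, zone, and worker count.
CLUSTER_NAME="${CLUSTER_NAME:-my-cluster}"
ZONE="${ZONE:-us-central1-a}"
NUM_WORKERS="${NUM_WORKERS:-2}"

# Print (do not execute) one test command per node.
for node in $(node_names "$CLUSTER_NAME" "$NUM_WORKERS"); do
  echo "gcloud compute ssh ${node} --zone ${ZONE} --command 'sudo docker run hello-world'"
done
```

Printing instead of executing keeps the sketch safe to run anywhere; once the node names and zone look right, the printed lines can be run one at a time.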