This initialization action installs H2O Sparkling Water on all nodes of a Google Cloud Dataproc cluster. It works with Dataproc image version 1.3 and newer.
You can use this initialization action to create a new Dataproc cluster with H2O Sparkling Water installed as follows:
- Use the gcloud command to create a new cluster with this initialization action.

  To create a cluster with Dataproc image version 1.3, use the conda initialization action:

      REGION=<region>
      CLUSTER_NAME=<cluster_name>
      gcloud dataproc clusters create ${CLUSTER_NAME} \
          --image-version 1.3 \
          --metadata 'H2O_SPARKLING_WATER_VERSION=3.28.0.1-1' \
          --scopes "cloud-platform" \
          --initialization-actions "gs://goog-dataproc-initialization-actions-${REGION}/conda/bootstrap-conda.sh,gs://goog-dataproc-initialization-actions-${REGION}/h2o/h2o.sh"

  To create a cluster with Dataproc image version 1.4 or newer, use the ANACONDA optional component:

      REGION=<region>
      CLUSTER_NAME=<cluster_name>
      gcloud dataproc clusters create ${CLUSTER_NAME} \
          --image-version 1.4 \
          --optional-components ANACONDA \
          --metadata 'H2O_SPARKLING_WATER_VERSION=3.28.0.3-1' \
          --scopes "cloud-platform" \
          --initialization-actions "gs://goog-dataproc-initialization-actions-${REGION}/h2o/h2o.sh"
- Submit the sample job:

      REGION=<region>
      CLUSTER_NAME=<cluster_name>
      gcloud dataproc jobs submit pyspark \
          --cluster ${CLUSTER_NAME} \
          "gs://goog-dataproc-initialization-actions-${REGION}/h2o/sample-script.py"
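
The two cluster-creation variants above differ only in how Anaconda is provided. As a rough sketch, that choice could be wrapped in a small helper that prints the matching command for a given image version. The function name `h2o_create_cmd` and its argument order are hypothetical and not part of this repository; the flags and bucket paths are the ones used in the commands above. The helper only prints the command and does not create anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): prints the
# cluster-creation command for a given Dataproc image version.
# Dry run only; it never calls gcloud itself.
# Usage: h2o_create_cmd <region> <cluster_name> <image_version> <h2o_version>

h2o_create_cmd() {
  local region="$1" cluster="$2" image_version="$3" h2o_version="$4"
  local bucket="gs://goog-dataproc-initialization-actions-${region}"
  local actions="${bucket}/h2o/h2o.sh"
  local components=""

  case "${image_version}" in
    1.3|1.3[.-]*)
      # Image 1.3: Anaconda comes from the conda initialization action,
      # which has to run before h2o.sh.
      actions="${bucket}/conda/bootstrap-conda.sh,${actions}"
      ;;
    *)
      # Image 1.4 and newer: Anaconda is an optional component.
      components=" --optional-components ANACONDA"
      ;;
  esac

  echo "gcloud dataproc clusters create ${cluster}" \
    "--image-version ${image_version}${components}" \
    "--metadata H2O_SPARKLING_WATER_VERSION=${h2o_version}" \
    "--scopes cloud-platform" \
    "--initialization-actions ${actions}"
}
```

For example, `h2o_create_cmd us-central1 my-cluster 1.4 3.28.0.3-1` prints a single-line command that uses `--optional-components ANACONDA` and lists only `h2o/h2o.sh` as an initialization action. Review the printed command before running it.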