
H2O Sparkling Water Initialization Action

This initialization action installs H2O Sparkling Water on all nodes of a Google Cloud Dataproc cluster. It works with Dataproc image version 1.3 and newer.

Using this initialization action

⚠️ NOTICE: See best practices for using initialization actions in production.

You can use this initialization action to create a new Dataproc cluster with H2O Sparkling Water installed:

  1. Use the gcloud command to create a new cluster with this initialization action.

    To create a Dataproc 1.3 cluster, use the conda initialization action:

    REGION=<region>
    CLUSTER_NAME=<cluster_name>
    gcloud dataproc clusters create ${CLUSTER_NAME} \
        --region ${REGION} \
        --image-version 1.3 \
        --metadata 'H2O_SPARKLING_WATER_VERSION=3.28.0.1-1' \
        --scopes "cloud-platform" \
        --initialization-actions "gs://goog-dataproc-initialization-actions-${REGION}/conda/bootstrap-conda.sh,gs://goog-dataproc-initialization-actions-${REGION}/h2o/h2o.sh"

    To create a Dataproc 1.4 or newer cluster, use the ANACONDA optional component:

    REGION=<region>
    CLUSTER_NAME=<cluster_name>
    gcloud dataproc clusters create ${CLUSTER_NAME} \
        --region ${REGION} \
        --image-version 1.4 \
        --optional-components ANACONDA \
        --metadata 'H2O_SPARKLING_WATER_VERSION=3.28.0.3-1' \
        --scopes "cloud-platform" \
        --initialization-actions "gs://goog-dataproc-initialization-actions-${REGION}/h2o/h2o.sh"
  2. Submit a sample job.

    REGION=<region>
    CLUSTER_NAME=<cluster_name>
    gcloud dataproc jobs submit pyspark \
        --cluster ${CLUSTER_NAME} \
        --region ${REGION} \
        "gs://goog-dataproc-initialization-actions-${REGION}/h2o/sample-script.py"