docker-spark

This repository contains Dockerfile to build spark standalone cluster mode using docker,and has start_spark.sh script to configure and run spark standalone cluster mode.

1. The distributed file system

Dockerfile script using nfs as distributed file system, so you should install nfs first, suppose you have four computers.

192.168.6.168 master
192.168.6.167 slave1
192.168.6.172 slave2
192.168.6.173 slave3

1) install nfs server on master

yum -y install nfs-utils rpcbind

then start nfs server by

systemctl start nfs

configure /etc/exports as below, /home/nfs is nfs server mount destination folder, you can configure it according to you computer. if /home/nfs not exist, you should mkdir it first by mkdir -p /home/nfs

/home/nfs 192.168.6.0/24(rw,no_root_squash,no_all_squash,sync)

you should run exportfs -r to take it effect after you save your configure.

2) install nfs client on slaves

yum -y install nfs-utils

then start nfs, and lookup master's nfs file system as below

showmount -e 192.168.6.168

and run mkdir -p /nfs/mnt to mount master's nfs dictionary

mount -t nfs 192.168.6.168:/home/nfs /nfs/mnt -o proto=tcp -o nolock

You should mount /nfs/mnt nfs dictionary on master and slaves. Now master and slaves have nfs dictionary(/nfs/mnt), this will used by spark.

2.JDK1.8 Spark-2.2.0 and Zookeeper3.4.13

Dockerfile using downloaded file to copy to docker image, so you should download JDK1.8,Spark-2.2.0,Zookeeper3.4.13 in the same directory with Dockerfile. Also, you can download other version spark、zookeeper、jdk, just rewrite Dockerfile.

3.start_spark.sh

This script has configuration as below.

SPARK_ZOOKEEPER_URL=192.168.6.168:2181,192.168.6.167:2181,192.168.6.172:2181 ##Zookeeper address
ZOO_IP_IDS=192.168.6.168:1,192.168.6.167:2,192.168.6.172:3 ## Zookeeper id configure
CURRENT_IP=192.168.6.167  ## current computes's ip address
MASTER_ADDR=192.168.6.168:7077,192.168.6.167:7077,192.168.6.172:7077 ## spark standalone master address, using zookeeper to guarantee HA
SPARK_LOGS_MNT_PATH=/home/docker/spark_logs ## spark logs, persistent stores
SPARK_WORK_HOME=/home/docker/spark_work ## spark work dir, every application will copy jar and other file to work dir, mount host dir to container to avoid no space in docker container error
SPARK_CHECK_INTERVAL=600 ## spark app clean check interval
SPARK_CLEAN_TIMEOUT=3600 ## spark app clean timeout
MASTER_OR_SLAVE=master ## this node is master or slave
NFS_MNT=/nfs/mnt ## nfs file system dir

you can configure time zone to replace TZ=Asia/Shanghai to make sure the time in docker is the same as host.

4.configure 、build and start

On every node, configure start_spark.sh(mainly CURRENT_IP,MASTER_OR_SLAVE, others configure are the same). On master, build images by

docker build -t link2map/spark:1.0 .

then save the image as tar and transfer to other node

docker save -o spark.tar link2map/spark:1.0
scp spark.tar user@slave1:/home/docker/spark

on slaves

docker load < spark.tar

then, on all node run start_spark.sh to start spark standalone cluster

sh start_spark.sh

now spark web-ui will start on MASTER:13300, zookeeper will select a random node as master(in this conf, it could be one of the three 192.168.6.168,192.168.6.167,192.168.6.172), at last you can submit application by spark restful api

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
Dockerfile		Dockerfile
README.md		README.md
entrypoint.sh		entrypoint.sh
start_spark.sh		start_spark.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

docker-spark

1. The distributed file system

1) install nfs server on master

2) install nfs client on slaves

2.JDK1.8 Spark-2.2.0 and Zookeeper3.4.13

3.start_spark.sh

4.configure 、build and start

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

docker-spark

1. The distributed file system

1) install nfs server on master

2) install nfs client on slaves

2.JDK1.8 Spark-2.2.0 and Zookeeper3.4.13

3.start_spark.sh

4.configure 、build and start

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages