Quickly Run Spark with Docker on Mac OS

For the time being, I am just listing a brief guide.

1. Install boot2docker, a lightweight Linux VM. Docker needs a Linux kernel, which Mac OS does not provide.

2. Get the Spark Docker image from https://github.com/sequenceiq/docker-spark (pulling the image is enough). The image also includes Hadoop YARN.

3. On Mac OS, start boot2docker first, then note and export the $DOCKER_HOST environment variable it reports. On Mac OS, every docker client command needs this variable to connect to the boot2docker host.

4. After exporting the docker environment on Mac OS, inspect the docker containers with `docker ps`, adding -a (all) or -l (latest), or with `docker inspect CONTAINER_HASH`.

5. Now, on Mac OS, it is safe to run:

docker run -p 4040:4040 -p 8030:8030 -p 49707:49707 -p 50020:50020 -p 8042:8042 -p 50070:50070 -p 8033:8033 -p 8032:8032 -p 50075:50075 -p 22:22 -p 8031:8031 -p 8040:8040 -p 50010:50010 -p 50090:50090 -p 8088:8088 -i -t -h sandbox sequenceiq/spark:1.2.0 /etc/bootstrap.sh -bash

I publish these ports so that Mac OS can access the services directly.
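
The long run of -p flags is easy to mistype. Here is a minimal sketch that builds the flags from a port list and prints the full command instead of running it. It assumes a hypothetical helper function `build_port_flags`, which is not part of docker or boot2docker:

```shell
#!/bin/sh
# Hypothetical helper (not a docker feature): turn a list of ports
# into "-p PORT:PORT" flags, one per argument.
build_port_flags() {
    flags=""
    for port in "$@"; do
        # Add a separating space only when flags is already non-empty.
        flags="${flags:+$flags }-p $port:$port"
    done
    printf '%s\n' "$flags"
}

# Ports taken from the docker run command in this post.
SPARK_PORTS="4040 8030 49707 50020 8042 50070 8033 8032 50075 22 8031 8040 50010 50090 8088"

# Print the command for inspection only; nothing is started here.
# $SPARK_PORTS is intentionally unquoted so it splits into separate arguments.
echo "docker run $(build_port_flags $SPARK_PORTS) -i -t -h sandbox sequenceiq/spark:1.2.0 /etc/bootstrap.sh -bash"
```

Copy the printed command into the terminal to run it. Keeping the ports in one variable makes it easier to drop or add a port later.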

6. You can examine the ports opened in boot2docker with `sudo iptables -t nat -L -n`, run from inside boot2docker (enter it with `boot2docker ssh`). To make the hostname `sandbox` resolve on Mac OS, add it to /etc/hosts.
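
For the /etc/hosts step, here is a small sketch that formats the entry. It assumes a hypothetical helper `hosts_entry`, and the address shown is only a placeholder. On a real setup the address should be the one printed by `boot2docker ip`:

```shell
#!/bin/sh
# Hypothetical helper: format an /etc/hosts line that maps an IP
# address ($1) to a hostname ($2), separated by a tab.
hosts_entry() {
    printf '%s\t%s\n' "$1" "$2"
}

# Example with a placeholder address (an assumption, not a measured value).
hosts_entry "192.168.59.103" "sandbox"

# To append the real entry (requires root), something like:
#   hosts_entry "$(boot2docker ip)" sandbox | sudo tee -a /etc/hosts
```

Printing the line first, before appending it, makes it easy to check the address and avoid adding a wrong or duplicate entry.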

Despite Docker's learning curve, this is so far the most convenient way I have found to deploy and test a Spark application, and it uses less computing power.

Downside: it is a single node, not a cluster.

