For the time being, this is just a brief guide; a consolidated command sketch follows the list.
1. Install boot2docker, a lightweight Linux VM that is required to run Docker on Mac OS.
2. Install the Spark Docker image from https://github.com/sequenceiq/docker-spark (just pull the Docker image). The image has Hadoop YARN inside as well.
3. In Mac OS, start boot2docker first, then note down and export the DOCKER_HOST environment variable. Any Docker client command on Mac OS needs this variable to connect to boot2docker's Docker host.
4. After exporting the Docker environment in Mac OS, inspect Docker containers with `docker ps` (add -a for all containers or -l for the latest one), or with `docker inspect CONTAINER_HASH`.
5. Now, in Mac OS, it is safe to run:
docker run -p 4040:4040 -p 8030:8030 -p 49707:49707 -p 50020:50020 -p 8042:8042 -p 50070:50070 -p 8033:8033 -p 8032:8032 -p 50075:50075 -p 22:22 -p 8031:8031 -p 8040:8040 -p 50010:50010 -p 50090:50090 -p 8088:8088 -i -t -h sandbox sequenceiq/spark:1.2.0 /etc/bootstrap.sh -bash
I publish these ports so that Mac OS can access them directly.
6. You can examine the ports opened in boot2docker with `sudo iptables -t nat -L -n` from inside boot2docker (enter it with `boot2docker ssh`), and make the hostname `sandbox` resolvable from Mac OS by adding it to /etc/hosts.
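
To make the steps above concrete, here is a minimal end-to-end sketch run from a Mac OS terminal, in execution order. The default boot2docker IP (192.168.59.103), the TLS port 2376, and the image tag `sequenceiq/spark:1.2.0` were the usual values at the time of writing; treat the exact values as assumptions for your own setup.

```
# Step 1: create and start the boot2docker VM
# (install boot2docker itself via the official installer or Homebrew first).
boot2docker init
boot2docker up

# Step 3: export DOCKER_HOST (plus the TLS variables on newer boot2docker
# versions) so the docker client on Mac OS can reach the VM.
eval "$(boot2docker shellinit)"
echo $DOCKER_HOST        # typically something like tcp://192.168.59.103:2376

# Step 2: pull the Spark-on-YARN image.
docker pull sequenceiq/spark:1.2.0

# Step 4: look at containers once you have started one (step 5).
docker ps -a                     # all containers, running or exited
docker ps -l                     # the latest created container
docker inspect CONTAINER_HASH    # low-level details of a single container

# Step 6: make the hostname `sandbox` resolvable from Mac OS.
boot2docker ip                                               # note the VM's IP
sudo sh -c 'echo "192.168.59.103  sandbox" >> /etc/hosts'    # use the IP printed above

# Quick check from Mac OS once the container from step 5 is running:
# the YARN ResourceManager UI should answer on port 8088.
curl -s http://sandbox:8088/ | head
```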
Despite Docker's learning curve, I have to say that so far this is the most convenient way to deploy and test a Spark application, and it consumes the least computational power.
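
As a quick smoke test, you can start a Spark shell against YARN from the bash prompt that `/etc/bootstrap.sh -bash` opens inside the container. The invocation below follows the usage documented by the sequenceiq image; the memory and core settings, and the trivial count job, are just illustrative.

```
# Inside the container's bash prompt:
spark-shell --master yarn-client --driver-memory 1g --executor-memory 1g --executor-cores 1

# Then, at the scala> prompt, run a trivial job to confirm YARN executes it:
scala> sc.parallelize(1 to 1000).count()
```

While the shell is running, the Spark driver UI should also be reachable from Mac OS at http://sandbox:4040, thanks to the `-p 4040:4040` mapping and the /etc/hosts entry from step 6.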
Downside: it is not a cluster, just a single node.