Introduction to Docker Compose

Docker Compose is a tool for managing multi-container applications, acting as a kind of container steward. You write a single file that declares the containers to be started and their configuration; when you run it, Docker starts all of the containers according to that declaration.
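
As a minimal sketch of the idea (the service name web and the nginx image are only placeholders, not part of the Spark example below), a docker-compose.yml might look like this:

# docker-compose.yml -- minimal illustrative example
version: "2"
services:
  web:
    image: nginx
    ports:
      - "8080:80"   # publish container port 80 on host port 8080

# Start everything declared in the file with:
#   docker-compose up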

Here is an example of building a multi-container Apache Spark cluster using docker-compose.

(1) Create a file named “Dockerfile” and add the following contents to it.

FROM ubuntu:20.04

# Install Java, Scala, and the tools needed to fetch and run Spark
RUN apt-get update
RUN apt-get -y upgrade
RUN apt install -y openjdk-8-jre-headless
RUN apt install -y scala
RUN apt install -y wget
RUN apt install -y screen

# Download and unpack Spark 3.2.0 into /usr/local/spark
RUN wget https://archive.apache.org/dist/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2.tgz
RUN tar xvf spark-3.2.0-bin-hadoop3.2.tgz
RUN mv spark-3.2.0-bin-hadoop3.2/ /usr/local/spark

ENV SPARK_HOME="/usr/local/spark"
ENV PATH="${PATH}:$SPARK_HOME/bin"
# Keep start-worker.sh in the foreground so the container stays alive
ENV SPARK_NO_DAEMONIZE="true"
RUN sleep 15

# Start the master in a detached screen session, then run a worker that connects
# to it (the container must be started with hostname "sparkmaster")
CMD screen -d -m $SPARK_HOME/sbin/start-master.sh ; $SPARK_HOME/sbin/start-worker.sh spark://sparkmaster:7077

(2) Build the image based on the Dockerfile and run the container

docker build -t sparkaio/first:v0 .
docker run -h sparkmaster sparkaio/first:v0   # or use the generated image ID
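
If you prefer not to keep a terminal attached, a possible variant is to run the container detached, give it a name, and publish the master web UI port. The container name spark-aio is just an assumption used again in later commands:

# Run detached (-d), set the hostname expected by the CMD, name the container,
# and publish the Spark master web UI so it is reachable at http://localhost:8080
docker run -d -h sparkmaster --name spark-aio -p 8080:8080 sparkaio/first:v0

# Follow the startup logs
docker logs -f spark-aio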

The following commands can be used to confirm that the Spark setup is working correctly:

docker exec -it container_id /bin/bash
$SPARK_HOME/bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://sparkmaster:7077 $SPARK_HOME/examples/jars/spark-examples_2.12-3.2.0.jar
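
The SparkPi example prints an estimate of π when the job finishes, so a quick non-interactive check is to look for that line in the output (this sketch assumes the container is named spark-aio, as in the earlier run example):

# Run the example and keep only the result line;
# a healthy cluster prints something like "Pi is roughly 3.14..."
docker exec spark-aio /usr/local/spark/bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master spark://sparkmaster:7077 \
    /usr/local/spark/examples/jars/spark-examples_2.12-3.2.0.jar 2>&1 | grep "Pi is roughly"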

Based on the configuration above, a single-container Spark setup has been created. The next step is to prepare a configuration file compatible with docker-compose and run a Spark cluster with at least one master node and one additional worker.

(3) Create a docker-compose.yml file and write the configuration in it.

Here, we set 1 master and 1 worker for the Spark cluster.

version: "2"
services:
  master:
    image: sparkaio/first:v0
    command: /usr/local/spark/sbin/start-master.sh
    hostname: master
    ports:
      - "6066:6066"
      - "7077:7077"
      - "8080:8080"
      - "50070:50070"
  worker:
    image: sparkaio/first:v0
    command: /usr/local/spark/sbin/start-worker.sh spark://master:7077
    links:
      - master
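
If you want to limit what each worker offers to the cluster, Spark's standalone scripts read the SPARK_WORKER_CORES and SPARK_WORKER_MEMORY environment variables. A possible extension of the worker service is sketched below; the values shown are only illustrative:

  worker:
    image: sparkaio/first:v0
    command: /usr/local/spark/sbin/start-worker.sh spark://master:7077
    environment:
      - SPARK_WORKER_CORES=2     # cores offered by this worker (illustrative)
      - SPARK_WORKER_MEMORY=2g   # memory offered by this worker (illustrative)
    links:
      - master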

(4) Startup and shutdown of docker-compose

# We can start the containers for the first time by using up
docker-compose up
# After the first time, we can simply use start to start the existing services
docker-compose start
# To safely stop the active services, we can use stop
docker-compose stop
# To reset the project (stop and remove its containers and networks), we run down
docker-compose down
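
To run more than one worker, docker-compose can start several copies of the worker service with the --scale flag. A quick way to check whether the workers joined the cluster is to look at their logs; the grep pattern below matches the message Spark's standalone worker normally prints on a successful registration:

# Start the cluster in the background with three workers
docker-compose up -d --scale worker=3

# Each worker should report that it registered with the master
docker-compose logs worker | grep "Successfully registered with master"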
