Introduction
Docker is a set of software and services that enable virtualization at the level of the operating system; this is also known as containerization.
Docker allows developers to package applications, along with their dependencies and configuration settings, into virtual containers that can run on any Linux, Windows or macOS machine, be it a desktop, a cloud server, or a node in an IoT network.
When a Docker container runs on Linux, Docker leverages the Linux kernel and the overlay file system to ensure the process is isolated. Since no hardware virtualization is involved, the overhead is very low. On macOS and Windows, a lightweight virtual machine is provisioned, inside which Docker runs and containers are executed.
The ability to run containers transparently on any OS gives developers a uniform experience regardless of where they develop. Once an application is developed and packaged into a Docker image, the developer can expect it to run anywhere Docker runs.
The best way to see how one can use Docker to streamline and unify development and deployment is to use it.
To follow along with the examples, you'll need to have Docker installed. You can get all the required tools by installing Docker Desktop.
Packaging an application into a Docker image
To test-drive Docker, let's implement a simple web application and package it as a Docker image, that is, containerize it.
The web application
First, let's implement the app and run it natively.
We'll create a simple HTTP endpoint that takes a query parameter named path and returns the list of files under the location specified by the parameter. (Security-wise, an app that serves filesystem contents without authorization is generally a bad idea; we use it here merely to demonstrate the isolation that Docker offers.)
We'll implement this in Python using the Falcon web framework and the Gunicorn application server. Here's the code.
import json
import os

import falcon


class FileBrowser:
    def on_get(self, req, resp):
        if "path" not in req.params:
            resp.status = falcon.HTTP_400
            resp.text = "Missing path parameter!"
            return
        path = req.params["path"]
        if not os.path.isdir(path):
            resp.status = falcon.HTTP_404
            resp.text = "Path '%s' does not exist" % path
            return
        try:
            files = json.dumps(os.listdir(path))
            resp.status = falcon.HTTP_200
            resp.text = files
        except Exception as e:
            resp.status = falcon.HTTP_500
            resp.text = "Unexpected error: '%s'" % e


app = falcon.App()
app.add_route('/', FileBrowser())
This is the entire code, which we save into fileapi.py. The web application accepts GET requests to the root endpoint /, where the following logic takes place.
- If the query parameter is missing, a 400 Bad Request response is returned;
- If the query parameter is present, but it points to a non-existing path, a 404 Not Found response is returned;
- If the query parameter points to a valid directory path, the list of files is obtained and returned as a JSON array; the default content-type in Falcon is application/json.
- However, if an error occurs during the file listing, a 500 Internal Server Error response is returned.
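The four branches above can also be expressed as a plain function that does not depend on Falcon, which makes them easy to try out in isolation. This is only a sketch: the function name list_directory and the (status, body) return shape are assumptions made for illustration and are not part of fileapi.py.

```python
import json
import os


def list_directory(params):
    """Return (status_code, body), following the same branches as the handler.

    `params` is a plain dict standing in for Falcon's req.params.
    """
    if "path" not in params:
        return 400, "Missing path parameter!"
    path = params["path"]
    if not os.path.isdir(path):
        return 404, "Path '%s' does not exist" % path
    try:
        # os.listdir may fail, for example with a permission error.
        return 200, json.dumps(os.listdir(path))
    except Exception as e:
        return 500, "Unexpected error: '%s'" % e
```

Calling list_directory({}) yields the 400 branch, while passing {"path": "/usr"} yields either a JSON listing or one of the error branches, depending on the machine it runs on.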
To run this application, we need a system that has Python, the Falcon framework, and the Gunicorn application server installed.
On a typical Debian-based system, one would install them with sudo apt install python3 python3-pip and then use pip to install the Python dependencies, for instance, pip3 install falcon gunicorn. Alternatively, we could use the venv module to create a Python virtual environment. Needless to say, the process is different on macOS or Windows.
Finally, we can run the application by issuing gunicorn fileapi:app --bind 127.0.0.1:8000. This uses the Gunicorn application server to start the application in the file fileapi.py and listen on the loopback interface; make sure the command is run from the directory that contains that file.
Next, let's test the server with cURL.
$ curl -i "localhost:8000/"
HTTP/1.1 400 Bad Request
Server: gunicorn
Date: Wed, 21 Sep 2022 13:27:42 GMT
Connection: close
content-length: 23
content-type: application/json
Missing path parameter!
$ curl -i "localhost:8000/?path=/not-a-dir"
HTTP/1.1 404 Not Found
Server: gunicorn
Date: Wed, 21 Sep 2022 13:28:40 GMT
Connection: close
content-length: 32
content-type: application/json
Path '/not-a-dir' does not exist
$ curl -i "localhost:8000/?path=/usr"
HTTP/1.1 200 OK
Server: gunicorn
Date: Wed, 21 Sep 2022 13:29:35 GMT
Connection: close
content-length: 78
content-type: application/json
["lib", "include", "libexec", "local", "sbin", "src", "share", "games", "bin"]
$ curl -i "localhost:8000/?path=/root"
HTTP/1.1 500 Internal Server Error
Server: gunicorn
Date: Wed, 21 Sep 2022 13:30:53 GMT
Connection: close
content-length: 57
content-type: application/json
Unexpected error: '[Errno 13] Permission denied: '/root''
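The same four checks can be scripted with Python's standard library instead of cURL. The sketch below assumes the server is listening on localhost:8000 as started above; the helper names build_url and fetch are hypothetical and exist only for this illustration.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_url(base, path=None):
    """Build the request URL, URL-encoding the optional path parameter."""
    if path is None:
        return base
    return base + "?" + urllib.parse.urlencode({"path": path})


def fetch(url):
    """Return (status, body); HTTP error statuses are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses still carry a body worth showing.
        return err.code, err.read().decode()
    except urllib.error.URLError as err:
        # The server is not reachable at all (for example, not started).
        return None, str(err.reason)


if __name__ == "__main__":
    for path in (None, "/not-a-dir", "/usr", "/root"):
        print(fetch(build_url("http://localhost:8000/", path)))
```

Note that urlencode percent-encodes the slash in the path value; the server decodes it again before the handler sees it.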
Looks like the server is working. Now let's package all of this into a Docker image.
The Dockerfile
To create a Docker image, we have to provide a set of instructions that will build it. These instructions are provided in a Dockerfile.
Create a new file named Dockerfile in the same directory as fileapi.py and populate it with the following.
FROM python:3.10.7-alpine3.16
WORKDIR /app
RUN pip install gunicorn==20.1.0 falcon==3.1.0
COPY . .
EXPOSE 8000
CMD ["gunicorn", "fileapi:app", "--bind", "0.0.0.0:8000"]
These six lines define the entire image. Let's unpack them line-by-line.
1. The FROM command sets the base image to use. While we could start with an empty image, we are going to leverage one of the many pre-configured Python images from Docker Hub. In our case, we are picking Python version 3.10.7 and the supporting libraries that are part of the Alpine Linux distribution. (This does not mean that we'll be running Alpine Linux in a virtual machine, only that the libraries packaged in the image come from that Linux distribution.) We are selecting Alpine Linux because of its small disk footprint.
2. The WORKDIR command sets the working directory inside the image. If the directory does not exist, it will be created; this will be the location of our application.
3. We install the required Python dependencies with the RUN command. Here we are pinning the libraries to specific versions. This is good practice, since we know that our application works with Python 3.10.7, gunicorn 20.1.0 and falcon 3.1.0. (If we had many such dependencies, it would be better to use a requirements.txt file, but let's keep things simple for now.)
4. Next we use COPY . . to copy all resources from the current directory on the host computer to the working directory (/app) in the image. As it currently stands, the command copies all files from the host, which is often undesirable; we show how to list exclusions a bit later.
5. The EXPOSE 8000 command documents that the service in the container listens on port 8000. On its own it does not make the port reachable from the host computer; for that, the port has to be published when the container is started, as we will see when we run it.
6. And finally, the CMD command specifies the command that runs when the container is started. In our case the command is gunicorn fileapi:app --bind 0.0.0.0:8000; we changed the IP from the loopback device to all interfaces. The reason is that the container has to listen on all interfaces if we want to access it from the host computer. If we used the container's loopback device, we would be unable to reach it from the host computer, since the loopback device inside the container is different from the loopback device on the host computer.
To exclude certain files from being copied from the host into the image (the COPY . . command in step 4 above), create a file called .dockerignore and populate it with the following.
.*
__pycache__/
These two lines instruct the Docker COPY command to ignore all hidden files (files whose names start with a dot) and the __pycache__ directory.
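To get a feel for which top-level entries these two patterns exclude, the matching can be approximated in Python with fnmatch. This is a rough sketch under stated assumptions: Docker's real matcher follows Go's pattern rules and also supports ** and ! exceptions, none of which is modeled here, and the helper name is_ignored is made up for this example.

```python
import fnmatch

# The two patterns from the .dockerignore file above.
IGNORE_PATTERNS = [".*", "__pycache__/"]


def is_ignored(name, patterns=IGNORE_PATTERNS):
    """Approximate check for a top-level entry of the build directory."""
    for pattern in patterns:
        # A trailing slash only marks a directory; drop it before matching.
        if fnmatch.fnmatchcase(name, pattern.rstrip("/")):
            return True
    return False
```

With these patterns, names such as .git or __pycache__ are reported as ignored, while fileapi.py and Dockerfile are not.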
Building the Docker image
Now we are ready to build the image. Inside the directory that contains the Dockerfile, issue the following command.
$ docker build -t file-api .
...
Successfully tagged file-api:latest
The command builds the image and tags it file-api. During the build, all required dependencies are also installed. We can list the images that are available on our system as follows.
$ docker images
REPOSITORY   TAG                 IMAGE ID       CREATED         SIZE
file-api     latest              fbbf93095fb6   3 minutes ago   63.2MB
python       3.10.7-alpine3.16   4da4c1dc8c72   13 days ago     48.7MB
Running the container
Now that the image has been built, we can run it and create a container.
$ docker run -p 127.0.0.1:5000:8000 file-api
[2022-09-21 15:07:50 +0000] [1] [INFO] Starting gunicorn 20.1.0
[2022-09-21 15:07:50 +0000] [1] [INFO] Listening at: http://0.0.0.0:8000 (1)
[2022-09-21 15:07:50 +0000] [1] [INFO] Using worker: sync
[2022-09-21 15:07:50 +0000] [6] [INFO] Booting worker with pid: 6
We have now run the image file-api and started a container. Docker maps the address 127.0.0.1:5000 on the host to port 8000 in the container, where gunicorn listens on 0.0.0.0:8000; this was achieved with the -p 127.0.0.1:5000:8000 switch.
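The value passed to -p packs three pieces of information into one string. As a minimal sketch, the function below splits it apart; it handles only the two most common forms, and the name parse_publish is an assumption for illustration rather than anything Docker provides.

```python
def parse_publish(spec):
    """Split a -p value into (host_ip, host_port, container_port).

    Handles only 'host_ip:host_port:container_port' and
    'host_port:container_port'; any other form is rejected.
    """
    parts = spec.split(":")
    if len(parts) == 3:
        host_ip, host_port, container_port = parts
    elif len(parts) == 2:
        # Without an explicit address, Docker publishes on all host interfaces.
        host_ip = "0.0.0.0"
        host_port, container_port = parts
    else:
        raise ValueError("unsupported publish spec: %r" % spec)
    return host_ip, int(host_port), int(container_port)
```

For the switch used above, the host side is 127.0.0.1 and port 5000, and the container side is port 8000. Binding the host side to 127.0.0.1 keeps the service unreachable from other machines on the network.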
If we open a new terminal—the container is running in the current one—and issue a few GET requests, we should get familiar responses.
$ curl -i "localhost:5000/?path=/usr"
HTTP/1.1 200 OK
Server: gunicorn
Date: Wed, 21 Sep 2022 15:12:34 GMT
Connection: close
content-length: 47
content-type: application/json
["lib", "local", "sbin", "share", "bin", "src"]
$ curl -i "localhost:5000/?path=/root"
HTTP/1.1 200 OK
Server: gunicorn
Date: Wed, 21 Sep 2022 15:12:54 GMT
Connection: close
content-length: 29
content-type: application/json
[".cache", ".python_history"]
However, notice how the contents of /usr and /root are now different. This is because the app is now running inside the container, which has its own filesystem and directory structure and is isolated from the host computer.
Moreover, applications inside the container run as root by default; this is why accessing /root is now allowed. In certain situations, there might be good security reasons to avoid this, but that is material for another topic.
We can now query Docker to see which containers are running.
$ docker ps
CONTAINER ID   IMAGE      COMMAND                  CREATED          STATUS          PORTS                      NAMES
ee841890bee4   file-api   "gunicorn fileapi:ap…"   13 minutes ago   Up 13 minutes   127.0.0.1:5000->8000/tcp   cool_johnson
Since we did not name the container explicitly, Docker came up with a random name, cool_johnson. If we press CTRL+C in the terminal that is running the container, the container should stop. A stopped container no longer shows up in docker ps, so we have to run docker ps -a to see the list of all containers, stopped and running.
To delete the container, run docker rm cool_johnson. You will likely have to change the name, since it is unlikely that yours is also called cool_johnson.
Managing more complex setups with docker compose
Web applications often consist of multiple services: an application server, a database, a cache layer, a background task system and so on. To make our example more realistic, let's add another service to the web application.
Suppose we benchmarked our system and found that the operation that lists the contents of a directory is rather slow. Since our filesystem rarely changes, if ever, we decide to implement a simple caching mechanism using the Redis database.
Redis is a fast in-memory key-value store. As keys, we'll store the query path values, and as the corresponding values, we'll store the lists of files under the given paths.
Next, we'll change the application so that when a request for a path is received, it first checks whether Redis contains a key for the given path, and if so, serves the cached contents.
If the key does not exist, the application lists the files from the filesystem, serves them to the client, and saves the result to the cache. Consequently, all subsequent requests for the same path are served from the cache and not from the slow filesystem.
This modification introduces a new service to our application setup and adds complexity: we have to modify the application to use the Redis database, we have to create another container that will run it, and we have to connect both containers.
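Before bringing Redis into the picture, the lookup sequence described above can be tried out with an in-memory stand-in. The sketch below is an illustration under assumptions: DictCache and cached_listing are hypothetical names, and the dictionary-backed class only imitates the get and set calls that the application will later make against Redis.

```python
import json
import os


class DictCache:
    """In-memory stand-in that offers get/set, like the cache used later."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Returns None when the key is absent.
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def cached_listing(path, cache):
    """Return (listing_json, served_from_cache) for the given directory."""
    cached = cache.get(path)
    if cached is not None:
        return cached, True
    # Cache miss: read the filesystem, then remember the result.
    files = json.dumps(os.listdir(path))
    cache.set(path, files)
    return files, False
```

The first call for a path reads the filesystem and the second is answered from the cache. One consequence of this design is that files created after the first call stay invisible until the cached entry is removed, which is acceptable here only because the filesystem is assumed to change rarely.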
Application modification
The modifications to the web application are rather straightforward; below we list the entire Falcon application that uses Redis for caching. Let's modify fileapi.py to contain the following code.
import json
import os

import falcon
import redis


class FileBrowser:
    def __init__(self, cache):
        self.cache = cache

    def on_get(self, req, resp):
        if "path" not in req.params:
            resp.status = falcon.HTTP_400
            resp.text = "Missing path query parameter!"
            return
        path = req.params["path"]
        cached = self.cache.get(path)
        if cached:
            resp.status = falcon.HTTP_200
            resp.text = cached
            return
        if not os.path.isdir(path):
            resp.status = falcon.HTTP_404
            resp.text = "Path '%s' does not exist" % path
            return
        try:
            files = json.dumps(os.listdir(path))
            resp.status = falcon.HTTP_200
            resp.text = files
            self.cache.set(path, files)
        except Exception as e:
            resp.status = falcon.HTTP_500
            resp.text = "Unexpected error: '%s'" % e


app = falcon.App()
redis_cache = redis.Redis(host='redis-cache', port=6379, db=0, decode_responses=True)
app.add_route('/', FileBrowser(redis_cache))
Notice how we set the address of the Redis database to redis-cache; this is the actual hostname that will be assigned to the container that runs the Redis database.
Because our modifications also add a new Python dependency, namely the Python library that connects to Redis, we have to update the Dockerfile.
FROM python:3.10.7-alpine3.16
WORKDIR /app
RUN pip install gunicorn==20.1.0 falcon==3.1.0 redis==4.3.4
COPY . .
EXPOSE 8000
CMD ["gunicorn", "fileapi:app", "--bind", "0.0.0.0:8000"]
The only change is in the RUN command, which now additionally installs the Python Redis bindings.
Setting up additional docker containers
Next, we have to spin up another container that will run the Redis database, and link it with the web application container.
While we could do all these things manually with multiple separate commands, we can package everything into a docker-compose.yml file that specifies all required services, their dependencies, configuration, and start-up sequence. We can then start our application with a single command.
Here is the docker-compose.yml that we'll need.
version: "3"
services:
  redis-cache:
    image: redis:7.0.4-alpine3.16
    restart: always
    expose:
      - 6379
  falcon-webapp:
    restart: always
    build: .
    image: file-api
    ports:
      - 127.0.0.1:5000:8000
    depends_on:
      - redis-cache
Let's parse the contents line-by-line.
- First we have to specify the schema version; it needs to be provided as a string.
- Next, we specify the list of services, or containers, that will run in this setup; this is defined with the services keyword.
- We are naming the first service redis-cache. The container will be assigned an internal IP and redis-cache will be its hostname; recall the Python code. As with the Python image, we browse Docker Hub for Redis images and pin the image to a specific version (7.0.4) and environment (Alpine Linux 3.16). We expose port 6379, which Redis uses by default. If the container unexpectedly stops, Docker will attempt to restart it.
- Finally, we define the web application service and name it falcon-webapp. This service's container gets created from the image defined by the Dockerfile above. The Dockerfile needs to be in the same directory as the docker-compose.yml, which is why we use build: . here. Next, we set the name of the image to be built to file-api, we set up port forwarding to allow the host computer to access the container on localhost:5000, and we require the redis-cache service to be started before falcon-webapp.
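In the list form used here, depends_on controls the order in which containers are started; it does not wait until Redis actually accepts connections. If that ever matters, one option is to poll the port before use. The following is a minimal sketch, not something the application above does: the function name wait_for_port and its defaults are assumptions made for illustration.

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.2):
    """Return True once a TCP connection succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers refused connections, timeouts and failed name lookups.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Inside the falcon-webapp container, a call such as wait_for_port("redis-cache", 6379) would block until the Redis port is reachable or the timeout expires.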
Running and inspecting services
Once the docker-compose.yml is ready, we build the required images; in our case, this is only the file-api image, since the image for Redis will be downloaded when we start the application.
$ docker compose build
[+] Building 3.2s (9/9) FINISHED
=> [internal] load build definition from Dockerfile 0.8s
=> => transferring dockerfile: 32B 0.0s
=> [internal] load .dockerignore 1.2s
=> => transferring context: 34B 0.0s
=> [internal] load metadata for docker.io/library/python:3.10.7-alpine3.16 0.0s
=> [1/4] FROM docker.io/library/python:3.10.7-alpine3.16 0.0s
=> [internal] load build context 0.6s
=> => transferring context: 100B 0.0s
=> CACHED [2/4] WORKDIR /app 0.0s
=> CACHED [3/4] RUN pip install gunicorn==20.1.0 falcon==3.1.0 redis==4.3.4 0.0s
=> CACHED [4/4] COPY . . 0.0s
=> exporting to image 0.6s
=> => exporting layers 0.0s
=> => writing image sha256:eb8a14d65dcfe9c29ce1ae5020a3f15ea01ac307941aaba5c45101c11cf47bc7 0.0s
=> => naming to docker.io/library/file-api 0.0s
To run the application, we issue docker compose up -d. The -d switch starts the containers in detached mode, that is, in the background. Once the application is running, we can issue requests as before.
$ curl -i "localhost:5000/?path=/root"
HTTP/1.1 200 OK
Server: gunicorn
Date: Thu, 22 Sep 2022 11:51:26 GMT
Connection: close
content-length: 29
content-type: application/json
[".cache", ".python_history"]
$ curl -i "localhost:5000/?path=/home"
HTTP/1.1 200 OK
Server: gunicorn
Date: Thu, 22 Sep 2022 11:51:36 GMT
Connection: close
content-length: 2
content-type: application/json
[]
The following commands are often useful:
- docker compose down: stops and removes all containers,
- docker compose logs: shows logs from all containers,
- docker compose exec <service> <command>: executes the command in the container that is running the given service.
To demonstrate how we can attach to a container and run commands in it, let's examine the contents of the Redis database. First, we attach to the redis container as follows.
$ docker compose exec redis-cache sh
/data # ps ax
PID USER TIME COMMAND
1 redis 0:00 redis-server *:6379
22 root 0:00 sh
28 root 0:00 ps ax
/data #
Running docker compose exec redis-cache sh executes the sh (shell) binary within the container running the redis-cache service, which effectively gives us shell access.
If we execute ps ax, we see that besides the processes that we are running ourselves, namely sh and ps ax, the only other process is redis-server, listening on port 6379 on all interfaces. Let's exit by pressing CTRL+D.
To inspect the contents of Redis, issue the following command on the host computer.
$ docker compose exec redis-cache redis-cli
127.0.0.1:6379> keys *
1) "/home"
2) "/root"
127.0.0.1:6379> get /root
"[\".cache\", \".python_history\"]"
127.0.0.1:6379> get /home
"[]"
127.0.0.1:6379>
This time we run redis-cli directly to jump into the Redis command-line prompt inside the container. We then list all keys using keys * and inspect the contents under the keys /root and /home. We close the prompt with CTRL+D.
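The values shown by redis-cli are the JSON strings that the application produced with json.dumps; the backslashes are only how redis-cli displays the embedded quotes. As a small sketch, such a value can be turned back into a Python list as follows. The helper name decode_listing is hypothetical.

```python
import json


def decode_listing(raw):
    """Turn a cached JSON string back into a Python list of file names."""
    files = json.loads(raw)
    if not isinstance(files, list):
        raise ValueError("cached value is not a JSON array")
    return files
```

Applied to the value stored under /root, this gives back a list with the two entries .cache and .python_history, and the value under /home decodes to an empty list.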
Conclusion
While this was anything but short, it only scratched the surface of what Docker is. For further reference, consider visiting the Docker documentation.