How to Work with Containers

Moses Mbadi
11 min read · Apr 5, 2023

Introduction

First things first, this is not a ‘do this and don’t do that’ type of article. It’s more of a general discussion: if you use this, that will happen; if you have this, check this and that out. The final decision on what to use lies entirely with you, since projects differ vastly. We will look at some of the core concepts that you missed in that docker container basics tutorial 😉.

We’ll be looking at two or three base images in this article. However, the skills and techniques you’ll learn here will let you apply the same approach to whichever image you choose and help you make a better, more informed decision on your tooling. If you’re new to containers and don’t understand what container images and base images are, please read my series of articles on introduction to DevOps starting here.

In summary, containers package your application as, well, containers. Think of a container as a tiny, lightweight virtual machine whose only task is to run a specific application or service. Makes sense? Let’s look at some of the confusing terms:

Base Image: The bare OS your app will run on (Ubuntu, Alpine…). It is always the first instruction (FROM) in your container definition file.

Container Image: A portable unit made of the base image plus the dependencies and application source code, able to run anywhere with a container runtime (e.g. Docker).

Container: A container image after execution; a running application.

Container definition: The file where you put all the instructions; for Docker, that’s the Dockerfile.

The base image determines, to an extent, the performance of your application, taking size and memory into consideration. Some images are big (e.g. Ubuntu, Python) and others are small (e.g. Alpine).

Let’s start by creating a container using different base images. This is a simple Django application, and my project file structure is shown below. The only thing we’ll be touching here is the Dockerfile:

Project file structure

Below is the first container definition file.

With Alpine base image

# define base image
FROM python:3.8.3-alpine

# set work directory
WORKDIR /usr/src/app

# set environment variables
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1

# install dependencies
# RUN pip install --upgrade pip
COPY ./requirements.txt /usr/src/app
RUN pip install -r requirements.txt

# copy project
COPY . /usr/src/app

EXPOSE 8000

CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]

If you understand the basics of Docker, the steps are: define the container image inside a Dockerfile, build the image, then run it. In our example, we will only focus on defining the container image and building it. The command to build a Docker image from our example is:

docker build -t image-name .
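Once the build finishes, you can sanity-check the result by running the image and publishing the Django port on the host. A minimal sketch, where image-name matches whatever you passed to -t:

# run the image, mapping container port 8000 to the host
docker run -p 8000:8000 image-name

# the app should now respond at http://localhost:8000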

As you can see, the image took about 36 seconds to build and was about 123.66 MB in size.

Below is the same image built with the official Python base image, with of course a few adjustments in the Dockerfile.

With Python Base Image

The first image was not built with a ‘pure’ Python image. It was an Alpine Linux image optimized for Python development, hence the python:3.8.3-alpine tag. Below, we will build the same container image again, but with the official Python base image. Here is the Dockerfile:

# pull the official base image
FROM python

# set work directory
WORKDIR /usr/src/app

# set environment variables
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1

# install dependencies
# RUN pip install --upgrade pip
COPY ./requirements.txt /usr/src/app
RUN pip install -r requirements.txt

# copy project
COPY . /usr/src/app

EXPOSE 8000

CMD ["python3", "manage.py", "runserver", "0.0.0.0:8000"]

As you might have already noticed, the image is way bigger than the Alpine-based one and took a little more time to build. Not a little bit, actually; it certainly took its time 😂.

With Ubuntu base image

Not a lot of people use Ubuntu as their base image. While I could have written one from scratch, I wanted to look at a few samples and throw in some tips and tricks. The only reference I came across for dockerizing a Django app with Ubuntu was here. Thanks to the good folks at Educative!

FROM ubuntu:20.04

WORKDIR /usr/src/app

ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1

COPY ./requirements.txt /usr/src/app

# DEBIAN_FRONTEND=noninteractive avoids apt prompts (e.g. tzdata) during the build
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y python3 python3-pip && pip3 install --upgrade pip

# an alias set in one RUN does not persist to later layers,
# so symlink python3 to python instead so the CMD below works
RUN ln -s /usr/bin/python3 /usr/bin/python

RUN pip install -r requirements.txt

COPY . /usr/src/app

EXPOSE 8000

CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]

For Ubuntu as the base image, the container image took 138.3 seconds to build and was about 493 MB. That’s big compared to the previous examples! One thing you might have noticed is that we had to install Python in the Ubuntu image, whereas the Alpine base image came with Python preinstalled. This is why we have so many base images: some are custom-made for specific applications that require specific native tools. If you were running a Node.js application, you would pick a base image that comes with Node.js preinstalled, i.e. node.
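As a quick sketch of that idea (this Node.js service is hypothetical, not part of our project; index.js and port 3000 are placeholders), a node base image ships with Node and npm preinstalled, so no manual runtime installation is needed:

# hypothetical Node.js service; node:18-alpine already contains node and npm
FROM node:18-alpine
WORKDIR /usr/src/app

# install dependencies first so they can be cached
COPY package*.json ./
RUN npm install

# copy project
COPY . .

EXPOSE 3000
CMD ["node", "index.js"]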

An interesting concept to note is why we add instructions line by line. Every instruction in the Dockerfile creates a layer. To understand this concept, we will install Python first in one chained command, then in separate commands.

When the commands are chained into one RUN instruction like this, the image stays smaller:

RUN apt-get update && apt-get install python3 -y && apt install python3-pip -y && pip3 install --upgrade pip

Now, the same instructions separated into single RUN lines, like so,

RUN apt-get update 
RUN apt-get install python3 -y
RUN apt install python3-pip -y && pip3 install --upgrade pip

will build a bigger image. For a simple image like this, the difference is of course meager, but imagine the same effect on bigger images.

WHY CHOOSE ONE IMAGE OVER THE OTHER

There are trade-offs involved when working with containers. DevOps at its core is largely a research role; given how different individual services are, it takes time to find the best platform to host and run them reliably. You might decide to go with Alpine-based images in development due to their size and security, but as noted in a separate discussion, Alpine has some dependency quirks: it uses musl libc instead of glibc, and some Python modules rely on glibc, though this usually isn’t a big problem. A bigger issue is that, because of this, manylinux wheels are not available for Alpine, and therefore the modules need to be compiled upon installation (pip install). In some cases, this can make the difference between a 20-second build on Debian and 9 minutes or more on Alpine. The grpcio module is notorious for that; it takes forever to compile.

I mentioned that Alpine is secure and you might wonder why. Simply put, Alpine ships fewer files (system packages for things like database clients, image file manipulation and XML parsing libraries); some refer to this as a smaller footprint, hence a smaller attack surface.

Most official Docker images default to Debian. When you do docker run mysql, it runs on a Debian base image, but most of these images also have an Alpine option.
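You can see the size difference for yourself by pulling both variants of the same image. The tags below are real, but the exact sizes you’ll see depend on the current releases:

docker pull python:3.8          # Debian-based by default
docker pull python:3.8-alpine   # Alpine variant
docker images python            # compare the SIZE column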

WHAT ARE LAYERS

You might have read somewhere that Docker images are built as layers stacked on top of other layers. Say we start with an Ubuntu base image and install Python, then install Django: those are two commands, hence two layers, three with the base image. The first layer is our base image; the second layer has Ubuntu plus Python. When we install Django, that’s another layer.

FROM ubuntu:20.04 <---- creates the first layer
RUN apt install python3-pip <---- creates the second layer
RUN pip install django <---- creates the third layer
RUN pip install other-software <---- creates the fourth layer

Layer 1: Ubuntu base image (200 MB)

Layer 2: Ubuntu base image + Python (200 MB + 10 MB)

Layer 3: Ubuntu base image + Python + Django (200 MB + 10 MB + 11 MB)

Layer 4: Ubuntu base image + Python + Django + any other tool (200 MB + 10 MB + 11 MB + 3 MB)

Will the final image be 224 MB or 855 MB? Hint: each layer only stores the difference from the layer below it. Comment below with your answer.
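If you want to verify your answer empirically, docker history lists every layer of an image along with the space that layer adds; since layers store deltas rather than full snapshots, summing the SIZE column gives the total image size:

# image-name is whatever you tagged your build with earlier
docker history image-name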

Caching

A container image is basically a snapshot of your application at the point the image is created. By snapshot, I mean the base image, source code and dependencies needed to run that application, all packaged into one immutable unit. Future updates cannot be injected into the image, nor into running containers directly, as containers themselves are designed to be stateless and immutable. Therefore, to update a container image, you have to rebuild it. If you don’t have the luxury of time and patience, this process can become annoying without Docker’s build caching. Caching keeps a copy of the image layers that do not change and only rebuilds the layers that do. Remember layers from above? Read more on caching here.

Let’s take a look. Running docker ps -a and docker images shows that I don’t have any containers or images,

But when I try to build an image I built before, it takes less than 10 seconds.

From the build output, you can see CACHED on several lines. This is so important and goes back to why we separate commands. Given that each command adds a layer, we only want things that change often to be in the later layers, for example your source code files. That’s why we start with instructions that won’t change often, like installing dependencies, then copy the source code files towards the end.
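As a minimal sketch of that ordering, using the same Django setup from earlier: the dependency layers sit near the top so they keep getting cache hits, and the frequently changing source copy comes last.

FROM python:3.8.3-alpine
WORKDIR /usr/src/app

# requirements.txt rarely changes -> almost always a cache hit
COPY ./requirements.txt .
RUN pip install -r requirements.txt

# source code changes on every commit -> only this layer
# and anything after it gets rebuilt
COPY . .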

When building a Docker image, you want to keep it light while not sacrificing speed, security and reliability. Avoiding large images speeds up building and deploying containers, but a bigger image might serve your application faster and save you the trouble of manually installing other dependencies. Therefore, it is crucial to play around with several base images and see which one serves all your requirements.

Default to the smaller Base image in Production

To create a Docker image, you need a base on which you can install and add components, as needed. You can download an existing parent image and use it as the base of your own image or build one from scratch.

You install a variation of an operating system as the base of an image. The OS base can drastically impact the size of your final Docker image, which is why deciding on the right one plays a significant role.

Use a .dockerignore file

Remember .gitignore? This is Docker’s implementation of that. Files listed here are excluded from the build context, and thus from your container image. This is important, as it helps you avoid accidentally shipping your security credentials and private data in your container images.
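A minimal .dockerignore for a Django project like ours might look like the following; the entries are illustrative, so adjust them to your repository:

# secrets and local configuration
.env

# version control and editor cruft
.git
.gitignore

# Python build artifacts and local databases
__pycache__/
*.pyc
db.sqlite3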

Utilize the Multi-Stage Builds Feature in Docker

With the multi-stage feature, you avoid adding unnecessary layers, which has a considerable impact on the overall image size. You have one Dockerfile with multiple FROM statements, as shown below:

# syntax=docker/dockerfile:1

FROM golang:1.16
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html
COPY app.go ./
RUN CGO_ENABLED=0 go build -a -installsuffix cgo -o app .

FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=0 /go/src/github.com/alexellis/href-counter/app ./
CMD ["./app"]d

Read more on multi-stage builds here.
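Building a multi-stage image is no different from building a single-stage one. Only the final stage ends up in the tagged image, so the whole golang build environment is discarded:

# href-counter is an illustrative tag matching the example above
docker build -t href-counter .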

Avoid Adding Unnecessary Layers to Reduce Docker Image Size

A Docker image takes up more space with every layer you add to it. Therefore, the more layers you have, the more space the image requires.

Each RUN instruction in a Dockerfile adds a new layer to your image. That is why you should try to do file manipulation inside a single RUN command, combining the different commands into one instruction with &&.

For instance, you can update the repository and install multiple packages in a single RUN instruction. To get a clear, comprehensive line, use the backslash (\) to type out the command in multiple lines.

Apart from updating and installing the packages, you should also clean up apt cache with && rm -rf /var/lib/apt/lists/* to save up some more space.

RUN apt-get update && apt-get install -y \
    [package-one] \
    [package-two] \
    && rm -rf /var/lib/apt/lists/*
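Filled in with real packages for illustration (curl and git here are just examples, not requirements of our Django app), that pattern looks like this:

RUN apt-get update && apt-get install -y \
    curl \
    git \
    && rm -rf /var/lib/apt/lists/*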

Beware of Updates and Unnecessary Packages and Dependencies

Another way to save space and keep your Docker image small and secure is to ensure you are running the latest version of the platform you are building on.

By having the newest version, you avoid extensive updates that download countless rpm packages, take up a lot of space and increase your attack surface.

Note: If you need to update, make sure to clean up the rpm cache and add the dnf clean all option: RUN dnf -y update && dnf clean all.

Installing a package also often includes downloading the dependencies on which the software relies. However, sometimes the download will also pull in packages that are not required but merely recommended.

Such unwanted packages can add up and consume a lot of disk space. To download only the main dependencies, add the --no-install-recommends option to the install command.

For example:

RUN apt-get install --no-install-recommends [package-one]

Conclusion

Containerization is at the heart of DevOps. Docker containers support the implementation of CI/CD in development. Image size and build efficiency are important factors when overseeing and working with the microservice architecture. This is why you should understand containers and how they work in depth.

That’s it for this one. If you have any questions or would like me to clarify anything in this article, feel free to drop a comment below and I will reach out. Stay safe, drink lots of water and tell your daughter, mama, wife, grandad, grandma, husband, son …… you love them.

If you love this article and would like me to continue producing more exciting content, please consider tipping using the tip button below. Domo arigato 🫂.
