Introduction

Overview

Teaching: 5 min
Exercises: 5 min

Questions

What are containers?

What are Docker and Podman? What are the differences?

Objectives

Learn the basic concepts of containerization.

Understand how a container helps with analysis reproducibility.

What is a Container?

Have you ever:

Taken a piece of software from one computer to another and found that it doesn’t work?
Had to install a bunch of dependencies to run a piece of software written by a colleague?
What about saying “it works on my machine” when someone else is having trouble running your code?

Most of us have experienced these situations; they are common problems in software development and data analysis. The industry has been working on solutions to these problems for a long time, and containers are one of the most popular solutions.

Containers are a way to package an application together with its dependencies so that it runs as a single, isolated unit.

Importantly, containers share the host machine’s OS kernel and so don’t require an OS per application. As discrete processes, containers take up only as much memory as necessary, making them very lightweight and fast to spin up:

*Container-based architecture vs virtual machines*

Containers on Windows and macOS

Running containers on systems other than Linux, like macOS and Windows, requires a virtual machine in the background to provide the Linux kernel. Still, containers remain very lightweight, and they are faster to spin up than deploying one virtual machine for each application.

Most containerization tools provide a seamless experience for the user, abstracting the virtual machine and making it transparent to the user. Just be aware that there is an additional layer between the containers and the host machine.

Docker

Docker is perhaps the most popular containerization tool these days, particularly in industry. It is a platform for developing, shipping, and running applications in containers. In addition, Docker provides a public registry for sharing and collaborating on container images called Docker Hub.

The official Docker documentation and tutorial can be found on the Docker website. It is quite thorough and useful, and it is an excellent guide that should be routinely visited when working with Docker. A note up front: Docker has very similar syntax to Git and Linux, so if you are familiar with their command line tools then most of Docker should seem somewhat natural (though you should still read the docs!).

It is still important to know what Docker is and what the components of it are. Docker images are executables that bundle together all necessary components for an application or an environment. Docker containers are the runtime instances of images — they are images with a state.

Docker is the most popular containerization tool these days, but it’s not the only one. There are other kids on the block which are in use and gaining popularity, such as Podman.

Why Podman?

Podman is an open-source alternative to Docker with several advantages. For example, Podman is able to run containers as a non-root user out of the box, a big security advantage over Docker. The reason is that Podman uses a daemonless architecture, which means that it doesn’t require a daemon running as superuser to execute containers as Docker does.

In addition, Docker Desktop has licensing restrictions that may prevent you from using it in some institutions. If that is your case, Podman is an excellent alternative.

Podman is a drop-in replacement for Docker, so you can use the same commands and workflows you are used to with Docker. Throughout the tutorial, we will use Podman as the containerization tool, but if you are interested in using Docker instead, just replace podman with docker in the commands and you should be good to go.
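
Because the two tools accept the same commands, a script can pick whichever one is installed. The following is a minimal sketch, not part of the lesson's setup: the function name pick_engine and the ENGINE variable are hypothetical names chosen for illustration.

```shell
# Print the first container engine found on PATH.
# With no arguments the candidates are podman, then docker.
pick_engine() {
    [ "$#" -gt 0 ] || set -- podman docker
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # None of the candidates is installed.
    return 1
}

# Hypothetical usage:
#   ENGINE=$(pick_engine) && "$ENGINE" pull matthewfeickert/intro-to-docker
```

Preferring Podman here simply mirrors the choice made in this tutorial; swap the order of the candidates if you would rather default to Docker.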

Apptainer

Apptainer (formerly known as Singularity) is another containerization technology. In particular, it is widely used in HPC, and it is gaining rapid adoption in High Energy and Nuclear Physics, so you may need to familiarize yourself with it at some point.

To learn more about Apptainer, see the HSF Training Module “Introduction to Apptainer/Singularity”, which also includes more details about the differences between Apptainer and Docker.

Key Points

Introduces Docker, a popular tool for software containerization.

Introduces Podman, an open-source alternative with several advantages.

Podman is a drop-in replacement for Docker. Replace podman with docker in the commands and you are good to go.

Pulling Images

Overview

Teaching: 10 min
Exercises: 5 min

Questions

How are images downloaded?

How are images distinguished?

Objectives

Pull images from Docker Hub image registry

List local images

Introduce image tags

Docker Hub

Much like how GitHub allows for web hosting and searching for code, the Docker Hub image registry allows the same for Docker images. Hosting images is free for public repositories and allows for downloading images as they are needed.

Additionally, through integrations with GitHub and Bitbucket, Docker Hub repositories can be linked against Git repositories so that automated builds of Dockerfiles on Docker Hub will be triggered by pushes to repositories. However, at this moment enabling such a feature requires a Pro (paid) account or joining the Docker-Sponsored Open Source Program. There are other ways of doing this, such as using GitLab/GitHub CI/CD, but that’s beyond the scope of this training module.

Docker Hub and Podman

Both Docker and Podman use OCI (Open Container Initiative) compliant images, so you can use the same images with both tools. This means Podman can pull and run images from Docker Hub.

By default, podman pull pulls an image from Docker Hub if a registry is not specified in the command line argument.

Pulling Images

To begin with we’re going to pull down the image we’re going to be working in for the tutorial (note: if you did all the docker pulls in the setup instructions, this image will already be on your machine, in which case podman should notice it’s there and not attempt to re-pull it unless it’s changed in the meantime):

podman pull matthewfeickert/intro-to-docker

No search registry defined

Some installations of Podman may end with an error like Error: unable to pull matthewfeickert/intro-to-docker:latest: unable to find registry in the system. This is because no default registry is defined. You can fix this by prefixing the image name with the docker.io registry in the command:
podman pull docker.io/matthewfeickert/intro-to-docker
Or, to pull images from Docker Hub by default, add the following line to the /etc/containers/registries.conf file:
unqualified-search-registries=["docker.io"]

Connection errors

If you are using Podman or Docker on a non-Linux machine and run into an error like Error: unable to connect to Podman, make sure that the Podman or Docker desktop application is running.

Remember that in such environments, Podman or Docker use a virtual machine to run the containers.

and then list the images that we have available to us locally

podman images

If you have many images and want to get information on a particular one you can apply a filter, such as the repository name

podman images matthewfeickert/intro-to-docker

REPOSITORY                        TAG                 IMAGE ID            CREATED             SIZE
matthewfeickert/intro-to-docker   latest              cf6508749ee0        3 months ago        1.49GB

or more explicitly

podman images --filter=reference="matthewfeickert/intro-to-docker"

REPOSITORY                        TAG                 IMAGE ID            CREATED             SIZE
matthewfeickert/intro-to-docker   latest              cf6508749ee0        3 months ago        1.49GB

You can see here that there is the TAG field associated with the matthewfeickert/intro-to-docker image. Tags are a way of further specifying different versions of the same image. As an example, let’s pull the buster release tag of the Debian image (again, if it was already pulled during setup, podman won’t attempt to re-pull it unless it’s changed since last pulled).

podman pull debian:buster
podman images debian

buster: Pulling from library/debian
<some numbers>: Pull complete
Digest: sha256:<the relevant SHA hash>
Status: Downloaded newer image for debian:buster
docker.io/library/debian:buster

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
debian              buster              00bf7fdd8baf        5 weeks ago         114MB

Check the documentation on pull and images for more information on these commands.
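
To make the name:tag convention concrete, here is a small sketch in plain shell that splits an image reference into its repository and tag. The helper name split_image_ref is hypothetical, and the sketch assumes references of the form [registry/]name[:tag]; it does not handle digest references such as name@sha256:<hash>.

```shell
# Split an image reference into "repository tag".
# A missing tag defaults to "latest", as it does when pulling.
split_image_ref() {
    ref=$1
    # Only the last path component may carry a tag; a colon earlier in
    # the reference belongs to a registry port (e.g. localhost:5000).
    last=${ref##*/}
    case $last in
        *:*) tag=${last##*:}; repo=${ref%:*} ;;
        *)   tag=latest;      repo=$ref ;;
    esac
    printf '%s %s\n' "$repo" "$tag"
}

split_image_ref debian:buster                    # prints: debian buster
split_image_ref matthewfeickert/intro-to-docker  # prints: matthewfeickert/intro-to-docker latest
```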

Pulling Python

Pull the image python:3.9-slim for Python 3.9 and then list all python images on your computer.

Browse the official Python images to find available tags and read about image variants. What does -slim mean?
Solution
podman pull python:3.9-slim
podman images --filter=reference="python"
REPOSITORY                        TAG                 IMAGE ID            CREATED             SIZE
docker.io/library/python          3.9-slim            e440e2151380        2 weeks ago        131 MB
python:<version>-slim: This image does not contain the common packages contained in the default tag and only contains the minimal packages needed to run python

Key Points

Pull images with podman pull <image-id>

List all images on the computer and other information with podman images

Image tags distinguish releases or versions and are appended to the image name with a colon

Running Containers

Overview

Teaching: 15 min
Exercises: 5 min

Questions

How are containers run?

How do you monitor containers?

How are containers exited?

How are containers restarted?

Objectives

Run containers

Understand container state

Stop and restart containers

To use an image as a particular instance on a host machine, you run it as a container. You can run in either a detached or foreground (interactive) mode.
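
The rest of this episode uses the foreground (interactive) mode. For comparison, a detached run could look like the sketch below. It is only an illustration under a few assumptions: the container name my-detached and the function name detached_demo are made up, the sleep command is assumed to be available in the image, and the engine is read from a hypothetical ENGINE variable so that docker can be substituted.

```shell
# Start a container in the background, inspect it, then stop and remove it.
detached_demo() {
    engine=${ENGINE:-podman}
    image=matthewfeickert/intro-to-docker:latest
    # -d returns control to your shell while the container keeps running
    "$engine" run -d --name my-detached "$image" sleep 300 || return 1
    # The container is listed as running even though no terminal is attached
    "$engine" ps --filter name=my-detached
    "$engine" stop my-detached
    "$engine" rm my-detached
}

# Hypothetical usage (requires podman or docker to be installed):
#   detached_demo
```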

Run the image we pulled as a container with an interactive bash terminal:

podman run -it matthewfeickert/intro-to-docker:latest /bin/bash

The -i option here enables the interactive session, the -t option gives access to a terminal and the /bin/bash command makes the container start up in a bash session.

You are now inside the container in an interactive bash session. Check the file directory

pwd

/home/docker/data

and check the hostname to see that you are not in your local host system

hostname

<generated hostname>

Further, check the os-release to see that you are actually inside a release of Debian (given the Docker Library’s Python image Dockerfile choices)

cat /etc/os-release

PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Working directory

You may be wondering why you are at /home/docker/data inside the container. This is the working directory that was set for the image.

In the next chapters we will see how to build your own images and set parameters such as the working directory.

Monitoring Containers

Open up a new terminal tab on the host machine and list the containers that are currently running:

podman ps

CONTAINER ID        IMAGE         COMMAND             CREATED             STATUS              PORTS               NAMES
<generated id>      <image:tag>   "/bin/bash"         n minutes ago       Up n minutes                            <generated name>

Notice that the name of your container is some randomly generated name. To make the name more helpful, rename the running container

podman rename <CONTAINER ID> my-example

and then verify it has been renamed

podman ps

CONTAINER ID        IMAGE         COMMAND             CREATED             STATUS              PORTS               NAMES
<generated id>      <image:tag>   "/bin/bash"         n minutes ago       Up n minutes                            my-example

Renaming by name

You can also identify containers to rename by their current name
podman rename <NAME> my-example

Alternatively, you can also give the container a name at creation, using the --name option:

podman run -it --name my-fancy-name matthewfeickert/intro-to-docker:latest /bin/bash

This way, it has a custom chosen name to start with, which you can use later on to interact with it.

Exiting and restarting containers

As a test, go back into the terminal used for your container, and create a file in the container

touch test.txt

In the container exit at the command line

exit

You are returned to your shell. If you list the containers you will notice that none are running

podman ps

CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

but you can see all containers that have been run and not removed with

podman ps -a

CONTAINER ID        IMAGE         COMMAND             CREATED            STATUS                     PORTS               NAMES
<generated id>      <image:tag>   "/bin/bash"         n minutes ago      Exited (0) t seconds ago                       my-example

To restart your exited container, start it again and then attach it interactively to your shell

podman start <CONTAINER ID>
podman attach <CONTAINER ID>

exec command

The attach command used here is a handy shortcut to interactively access a running container with the same start command (in this case /bin/bash) that it was originally run with.

In case you’d like some more flexibility, the exec command lets you run any command in the container, with options similar to the run command to enable an interactive (-i) session, etc.

For example, the exec equivalent to attaching in our case would look like:
podman start <CONTAINER ID>
podman exec -it <CONTAINER ID> /bin/bash

Starting and attaching by name

You can also start and attach containers by their name
podman start <NAME>
podman attach <NAME>

Notice that your working directory is still /home/docker/data and then check that your test.txt still exists

ls

test.txt

So this shows us that we can exit containers for arbitrary lengths of time and then return to our working environment inside of them as desired.

Clean up a container

If you want a container to be cleaned up — that is, deleted — after you exit it then run with the --rm option flag
podman run --rm -it <IMAGE> /bin/bash

Key Points

Run containers with podman run <image-id>

Monitor containers with podman ps

Exit interactive sessions using the exit command

Restart stopped containers with podman start

File I/O with Containers

Overview

Teaching: 15 min
Exercises: 5 min

Questions

How do containers interact with my local file system?

Objectives

Copy files to and from the container

Mount directories to be accessed and manipulated by the container

Copying

Copying files between the local host and containers is possible. On your local host, either find a file that you want to transfer to the container or create a new one. Below is the procedure for creating a new file called io_example.txt and then copying it to the container:

touch io_example.txt
echo "This was written on local host" > io_example.txt
podman cp io_example.txt <NAME>:/home/docker/data/

and then from the container check and modify it in some way

pwd
ls
cat io_example.txt
echo "This was written inside the container" >> io_example.txt

Permission issues

If you run into a Permission denied error, there is a simple and quick fix to continue with the exercise:
exit  # exit container
chmod a+w io_example.txt  # add write permissions for all users
And continue from the podman cp ... command above.

/home/docker/data
io_example.txt
This was written on local host

and then on the local host copy the file out of the container

podman cp <NAME>:/home/docker/data/io_example.txt .

and verify that the file has been modified as you wanted

cat io_example.txt

This was written on local host
This was written inside the container

Volume mounting

What is more common and arguably more useful is to mount volumes to containers with the -v flag. This allows for direct access to the host file system inside the container and for container processes to write directly to the host file system.

podman run -v <path on host>:<path in container> <image>
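
The argument to -v is just a colon-separated string, so it can be assembled by a small helper. The sketch below is an illustration, not part of the lesson: the name bind_mount_spec is hypothetical, and resolving the host directory to an absolute path is a cautious choice made here rather than a documented requirement.

```shell
# Build "<absolute host dir>:<container dir>[:<options>]" for use with -v.
bind_mount_spec() {
    # Resolve the host directory; fail if it does not exist.
    host_dir=$(CDPATH= cd -- "$1" && pwd) || return 1
    if [ -n "${3:-}" ]; then
        printf '%s:%s:%s\n' "$host_dir" "$2" "$3"
    else
        printf '%s:%s\n' "$host_dir" "$2"
    fi
}

# Hypothetical usage; the quotes keep paths that contain spaces intact:
#   podman run --rm -it -v "$(bind_mount_spec . /home/docker/data)" <image>
bind_mount_spec / /data z   # prints: /:/data:z
```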

For example, to mount your current working directory ($PWD) on your local machine to the data directory in the example container

podman run --rm -it -v "$PWD":/home/docker/data matthewfeickert/intro-to-docker

No such file or directory?

On Windows and macOS, you may face an error while mounting the volume: Error: statfs <directory>: no such file or directory.

The error occurs because the directory you are trying to mount was not shared with the virtual machine that runs the containers. In recent versions of Podman and Docker your home directory is shared by default, but with Podman you can restart the machine to ensure that the directory is mounted:
podman machine stop
podman machine start
Starting machine "podman-machine-default"
Waiting for VM ...
Mounting volume... /Users:/Users
...
Machine "podman-machine-default" started successfully

From inside the container you can ls to see the contents of your directory on your local machine

ls

and yet you are still inside the container

pwd

/home/docker/data

You can also see that any files created in this path in the container persist upon exit

touch created_inside.txt
exit
ls *.txt

Permission issues

If you are using Linux with SELinux enabled, you might run into a Permission denied error. SELinux is enforcing if the output of the getenforce command is Enforcing. To fix the permission issue, append :z (lowercase!) at the end of the mount option, like this:
podman run --rm -it -v "$PWD":/home/docker/data:z ...
If this still does not fix the issue you can disable SELinux by running sudo setenforce 0, or you can try using sudo to execute docker/podman commands, but neither of these methods is recommended.

created_inside.txt

This I/O allows container images to be used for specific tasks that may be difficult to do with only the tools or software installed on the local host machine. For example, debugging problems in cross-platform software that only show up on another platform, or even just having a specific version of software perform a task (e.g., using Python 2 when you don’t want it on your machine, or using a specific release of TeX Live when you aren’t ready to update your system release).

Key Points

Copy files with podman cp

Mount volumes with podman run -v <path on host>:<path in container> <image>

Coffee break

Overview

Teaching: 0 min
Exercises: 15 min

Questions

Coffee or tea?

Objectives

Refresh your mental faculties with coffee and conversation

Key Points

Breaks are helpful in the service of learning

Writing Dockerfiles and Building Images

Overview

Teaching: 30 min
Exercises: 10 min

Questions

How are Dockerfiles written?

How are images built?

Objectives

Write simple Dockerfiles

Build a container image from a Dockerfile

Container images are static files that serve as templates for creating containers on machines. Container engines like Podman or Docker pull the images from repositories or local storage and then create containers from them. Container engines can also build new container images and save them to a repository, either interactively or by following a set of instructions, starting from scratch or modifying an existing image.

A common way of defining the instructions to build a container image is through a Dockerfile. These text-based documents give the build instructions using a small set of keywords (such as FROM, RUN, and COPY), many of which wrap ordinary Linux shell commands that are executed during the build. The Dockerfile for the example image being used is an example of some simple extensions of the official Python 3.9 Docker image based on Debian Bullseye (python:3.9-bullseye).

Like Docker, Podman also uses Dockerfiles to build images, so the same instructions can be used for both tools. We will continue with Podman throughout this lesson but the same commands can be used with Docker.

As a very simple example of extending the example image into a new image create a Dockerfile on your local machine

touch Dockerfile

and then write in it the Docker engine instructions to add cowsay and scikit-learn to the environment

# Dockerfile

# Specify the base image that we're building the image on top of
FROM matthewfeickert/intro-to-docker:latest

# Build the image as root user
USER root

# Run some bash commands to install packages
RUN apt-get -y update && \
    apt-get -y upgrade && \
    apt-get -y install cowsay && \
    apt-get -y autoclean && \
    apt-get -y autoremove && \
    rm -rf /var/lib/apt/lists/* && \
    ln -s /usr/games/cowsay /usr/bin/cowsay
RUN pip install --no-cache-dir -q scikit-learn

# This sets the default working directory when a container is launched from the image
WORKDIR /home/docker

# Run as docker user by default when the container starts up
USER docker

Dockerfile layers (or: why all these ‘&&’s??)

Each RUN command in a Dockerfile creates a new layer in the image. In general, each layer should try to do one job, and the fewer layers in an image the easier it is to compress.

This is why you see all these ‘&&’s in the RUN command: the shell commands are chained so that they all take place in a single layer, and the chain stops as soon as one of them fails. When trying to upload and download images on demand, the smaller the size the better.

Another thing to keep in mind is that each RUN command occurs in its own shell, so any environment variables, etc. set in one RUN command will not persist to the next.
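
Both points can be tried out without building anything. In the sketch below each sh -c call stands in for one RUN instruction; this is an analogy in plain shell rather than what the build engine literally executes, and the function name simulate_runs and the GREETING variable are made up for the example.

```shell
simulate_runs() {
    unset GREETING
    # Two separate RUN instructions: the variable is lost between them.
    sh -c 'export GREETING=hello'
    sh -c 'echo "separate RUN: ${GREETING:-<unset>}"'
    # One RUN instruction chained with &&: the variable is still set.
    sh -c 'export GREETING=hello && echo "same RUN: $GREETING"'
    # && stops the chain at the first failure, so the echo never runs.
    sh -c 'false && echo "not reached"' || echo "chain stopped at the failing command"
}

simulate_runs
# separate RUN: <unset>
# same RUN: hello
# chain stopped at the failing command
```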

Garbage cleanup

Notice that the last few lines of the RUN command clean up and remove unneeded files that get produced during the installation process. This is important for keeping image sizes small, since files produced during each image-building layer will persist into the final image and add unnecessary bulk.

Don’t run as root

By default Docker containers will run as root. This is a bad idea and a security concern. Instead, set up a default user (like docker in the example) and if needed give the user greater privileges.

Then build an image from the Dockerfile with Podman and tag it with a human-readable name

podman build -f Dockerfile -t extend-example:latest .

You can now run the image as a container and verify for yourself that your additions exist

podman run --rm -it extend-example:latest /bin/bash
which cowsay
cowsay "Hello from inside the container"
pip list | grep scikit
python3 -c "import sklearn as sk; print(sk)"

/usr/bin/cowsay
 ___________________
< Hello from inside the container >
 -------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

scikit-learn        1.3.1
<module 'sklearn' from '/usr/local/lib/python3.9/site-packages/sklearn/__init__.py'>

You can list all images available on your local machine with podman images:

podman images

REPOSITORY                                 TAG            IMAGE ID      CREATED       SIZE
localhost/extend-example                   latest         c24a757fabe7  8 hours ago   2.2 GB
docker.io/matthewfeickert/intro-to-docker  latest         64708e04f3a9  2 years ago   1.62 GB
...

docker.io indicates that the image was pulled from the Docker Hub, while localhost indicates that the image was built locally.

Add your own tag

Using podman tag, add a new tag to the image you built.

Solution

podman images extend-example
podman tag extend-example:latest extend-example:my-tag
podman images extend-example

REPOSITORY                TAG         IMAGE ID      CREATED      SIZE
localhost/extend-example  latest      c24a757fabe7  9 hours ago  2.2 GB

REPOSITORY                TAG         IMAGE ID      CREATED      SIZE
localhost/extend-example  my-tag      c24a757fabe7  9 hours ago  2.2 GB
localhost/extend-example  latest      c24a757fabe7  9 hours ago  2.2 GB

Tags are labels

Note how the image ID didn’t change for the two tags: they are the same object. Tags are simply convenient human-readable labels.

COPY

Podman also gives you the ability to copy external files into a container image during the build with the COPY Dockerfile command, which copies a file from the host file system into the image file system

COPY <path on host> <path in container image>

For example, if there is a file called install_python_deps.sh in the same directory as the build is executed from

touch install_python_deps.sh

with contents

cat install_python_deps.sh

#!/usr/bin/env bash

set -e

pip install --upgrade --no-cache-dir pip setuptools wheel
pip install --no-cache-dir -q scikit-learn

then this could be copied into the container image of the previous example during the build and then used (and then removed as it is no longer needed).

Create a new file called Dockerfile.copy:

touch Dockerfile.copy

and fill it with a modified version of the above Dockerfile, where we now copy install_python_deps.sh from the local working directory into the container and use it to install the specified python dependencies:

# Dockerfile.copy
FROM matthewfeickert/intro-to-docker:latest
USER root
RUN apt-get -qq -y update && \
    apt-get -qq -y upgrade && \
    apt-get -qq -y install cowsay && \
    apt-get -y autoclean && \
    apt-get -y autoremove && \
    rm -rf /var/lib/apt/lists/* && \
    ln -s /usr/games/cowsay /usr/bin/cowsay
COPY install_python_deps.sh install_python_deps.sh
RUN bash install_python_deps.sh && \
    rm install_python_deps.sh
WORKDIR /home/docker
USER docker

podman build -f Dockerfile.copy -t copy-example:latest .

For very complex scripts, or for files that were first fetched from some remote into the build directory, COPY offers a straightforward way to bring them into the container image build.

Key Points

Dockerfiles are written as text file commands to the container engine

Images are built with podman build

Images can have multiple tags associated with them

Images can use COPY to copy files into them during build

Removal of Containers and Images

Overview

Teaching: 5 min
Exercises: 5 min

Questions

How do you clean up old containers?

How do you delete images?

Objectives

Learn how to clean up after working with containers

You can clean up/remove a container with podman rm

podman rm <CONTAINER NAME>

Remove old containers

Start an instance of the tutorial container, exit it, and then remove it with podman rm

Solution

podman run matthewfeickert/intro-to-docker:latest
podman ps -a
podman rm <CONTAINER NAME>
podman ps -a

CONTAINER ID        IMAGE         COMMAND             CREATED            STATUS                     PORTS               NAMES
<generated id>      <image:tag>   "/bin/bash"         n seconds ago      Exited (0) t seconds ago                       <name>

<generated id>

CONTAINER ID        IMAGE         COMMAND             CREATED            STATUS                     PORTS               NAMES

You can remove an image from your computer entirely with podman rmi

podman rmi <IMAGE ID>

Remove an image

Pull down the Python 2.7 image (2.7-slim tag) from Docker Hub and then delete it.

Solution

podman pull python:2.7-slim
podman images python
podman rmi <IMAGE ID>
podman images python

2.7-slim: Pulling from library/python
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
Digest: sha256:<the relevant SHA hash>
Status: Downloaded newer image for python:2.7-slim
docker.io/library/python:2.7-slim

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
python              2.7-slim            d75b4eed9ada        14 hours ago        886MB
python              3.9-slim            e440e2151380        23 hours ago        918MB

Untagged: python@sha256:<the relevant SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
python              3.9-slim            e440e2151380        23 hours ago        918MB

Helpful cleanup commands

What is helpful is to have a command to detect and remove unwanted images and containers for you. This can be done with prune, which depending on the context will remove different things.

podman container prune removes all stopped containers, which is helpful to clean up forgotten stopped containers.

podman image prune removes all dangling images (images that do not have a tag); adding the -a option also removes every image that is not used by a container. This is helpful for cleaning up after builds.

podman system prune removes all stopped containers, dangling images, and dangling build caches. This is very helpful for cleaning up everything all at once.

Key Points

Remove containers with podman rm <CONTAINER NAME>

Remove images with podman rmi <IMAGE ID>

Perform faster cleanup with podman container prune, podman image prune, and podman system prune

Using CMD and ENTRYPOINT in Dockerfiles

Overview

Teaching: 15 min
Exercises: 10 min

Questions

How are default commands set in Dockerfiles?

Objectives

Learn how and when to use CMD

Learn how and when to use ENTRYPOINT

So far every time we’ve run the containers we’ve typed

podman run --rm -it <IMAGE>:<TAG> <command>

podman run --rm -it python:3.9-slim /bin/bash

Running this dumps us into a Bash session

echo $SHELL

/bin/bash

However, if no /bin/bash is given then you are placed inside the Python 3.9 REPL.

podman run --rm -it python:3.9-slim

Python 3.9.18 (main, Feb 13 2024, 10:56:47)
[GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

These are very different behaviors, so let’s understand what is happening.

The Python 3.9 image has a default command that runs when the container is executed, which is specified in the Dockerfile with CMD.

Create a file named Dockerfile.defaults

touch Dockerfile.defaults

# Dockerfile.defaults
# Make the base image configurable
ARG BASE_IMAGE=python:3.9-slim
FROM ${BASE_IMAGE}
USER root
RUN apt-get -qq -y update && \
    apt-get -qq -y upgrade && \
    apt-get -y autoclean && \
    apt-get -y autoremove && \
    rm -rf /var/lib/apt/lists/*
# Create user "docker"
RUN useradd -m docker && \
    cp /root/.bashrc /home/docker/ && \
    mkdir /home/docker/data && \
    chown -R --from=root docker /home/docker
ENV HOME=/home/docker
WORKDIR ${HOME}/data
USER docker

CMD ["/bin/bash"]

Now build the image from this Dockerfile, specifying its name with the -f argument since the engine will otherwise look for a file named Dockerfile by default.

podman build -f Dockerfile.defaults -t defaults-example:latest .

Now running

podman run --rm -it defaults-example:latest

again drops you into a Bash shell as specified by CMD. As has already been seen, CMD can be overridden by giving a command after the image

podman run --rm -it defaults-example:latest python3

The ENTRYPOINT builder command lets you define a command or commands that are always run at the “entry” to the container. If an ENTRYPOINT has been defined, then CMD provides optional inputs to the ENTRYPOINT.

Create a file named entrypoint.sh

#!/usr/bin/env bash
# entrypoint.sh

set -e

function main() {
    if [[ $# -eq 0 ]]; then
        printf "\nHello, World!\n"
    else
        printf "\nHello %s\n" "${1}"
    fi
}

main "$@"

/bin/bash

And now modify the Dockerfile.defaults to use the entrypoint.sh script

# Dockerfile.defaults
# Make the base image configurable
ARG BASE_IMAGE=python:3.9-slim
FROM ${BASE_IMAGE}
USER root
RUN apt-get -qq -y update && \
    apt-get -qq -y upgrade && \
    apt-get -y autoclean && \
    apt-get -y autoremove && \
    rm -rf /var/lib/apt/lists/*
# Create user "docker"
RUN useradd -m docker && \
    cp /root/.bashrc /home/docker/ && \
    mkdir /home/docker/data && \
    chown -R --from=root docker /home/docker
ENV HOME=/home/docker
WORKDIR ${HOME}/data
USER docker

COPY entrypoint.sh $HOME/entrypoint.sh
ENTRYPOINT ["/bin/bash", "/home/docker/entrypoint.sh"]
CMD ["there"]

Note how CMD provides an optional input to entrypoint.sh.

podman build -f Dockerfile.defaults -t defaults-example:latest --compress .

So now try

podman run --rm -it defaults-example:latest

Applied ENTRYPOINT and CMD

What will be the output of
podman run --rm -it defaults-example:latest $USER
and why?
Solution
Hello <your user name>
docker@2a99ffabb512:~/data$
$USER is evaluated by the shell on the host before podman runs, and the resulting value then overrides the default CMD that is passed to entrypoint.sh.
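
The role of the host shell in this exercise can be sketched without a container. In the snippet below, greet is a hypothetical stand-in for the main function of entrypoint.sh (without the leading blank line), and HOST_USER is an illustrative variable used in place of $USER so that the result does not depend on who runs it.

```shell
# Hypothetical stand-in for main() in entrypoint.sh
greet() {
    if [ $# -eq 0 ]; then
        printf 'Hello, World!\n'
    else
        printf 'Hello %s\n' "$1"
    fi
}

HOST_USER="alice"   # illustrative value, plays the role of $USER on the host

# Double-quoted (or unquoted): the calling shell expands the variable first
expanded=$(greet "$HOST_USER")

# Single-quoted: the literal text is passed through unchanged
literal=$(greet '$HOST_USER')

echo "$expanded"
echo "$literal"
```

In the same way, writing '$USER' in single quotes on the podman run command line would pass the literal text $USER to entrypoint.sh, which prints it unexpanded.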

All about ENTRYPOINT and CMD

ENTRYPOINT and CMD can each be written in “exec” or “shell” form, although we recommend using the exec form. Exec form must be an array of comma-separated, double-quoted arguments, and it is executed directly via the Linux execv() call. E.g. CMD ["/usr/bin/ls", "-al"]

Anything else, including an array in which the quotes were forgotten, is considered shell form: Docker/Podman passes it as written (with quotes, parentheses, …) to /bin/sh -c, so it can use shell features like PATH and expansion. E.g. CMD ls -al
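
The difference between the two forms can be imitated in an ordinary shell. This is only an analogy, not what Docker/Podman do internally: run_exec_form and run_shell_form are hypothetical helper names, and GREETING is an illustrative variable.

```shell
# Illustrative variable for the demonstration
GREETING="hello"
export GREETING

# Analogue of exec form: the argument list is executed as-is, no shell parsing
run_exec_form() {
    "$@"
}

# Analogue of shell form: the whole string is handed to /bin/sh -c
run_shell_form() {
    /bin/sh -c "$1"
}

exec_out=$(run_exec_form echo '$GREETING')
shell_out=$(run_shell_form 'echo $GREETING')

echo "exec form:  $exec_out"
echo "shell form: $shell_out"
```

The exec-form analogue prints the text $GREETING literally, while the shell-form analogue prints its value, because only in the second case does a shell process the string.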

At execution, ENTRYPOINT can be overridden with the --entrypoint option, and CMD with any arguments given at invocation. When ENTRYPOINT is in exec form, CMD or the invocation arguments are passed to it as additional arguments (as a single string, preceded by the additional “/bin/sh” “-c” arguments, if CMD is in shell form). When ENTRYPOINT is in shell form, CMD and invocation arguments are ignored.
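
As a rough model of the exec-form case, the command line a container starts with is the ENTRYPOINT followed by either the invocation arguments or, if none were given, the default CMD. The function compose_command below is a hypothetical illustration of that rule and ignores the --entrypoint and shell-form cases.

```shell
# Simplified model: print the command line a container would start with.
#   $1    ENTRYPOINT (exec form, written here as one space-separated string)
#   $2    default CMD
#   $3... arguments given after the image name at invocation (optional)
compose_command() {
    entrypoint=$1
    default_cmd=$2
    shift 2
    if [ $# -gt 0 ]; then
        # invocation arguments replace the default CMD
        printf '%s %s\n' "$entrypoint" "$*"
    else
        # no arguments: the default CMD is appended
        printf '%s %s\n' "$entrypoint" "$default_cmd"
    fi
}

with_default=$(compose_command "/bin/bash /home/docker/entrypoint.sh" "there")
with_args=$(compose_command "/bin/bash /home/docker/entrypoint.sh" "there" "alice")

echo "$with_default"
echo "$with_args"
```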

An interactive session, run -it, is possible only if the last command (ENTRYPOINT if present, arguments or CMD) is interactive, i.e. not terminating.

The use case seen above is common for application containers: ENTRYPOINT (in exec form) is used for the command, and CMD is used for the default arguments, which can easily be overridden at invocation.

Another common use case is to run an initialization script before anything else in the container, e.g. to download files or set variables only available at run-time, or to get secrets from a key-store. For that you can use an entrypoint.sh like:

#!/bin/sh
echo "You are running on $(hostname)"
# download tokens and secrets
export MY_TOKEN=./token_file.jwt
bash -c "$*"

The last line is the key: it treats the arguments from CMD or the command line as commands to run. Remember to make entrypoint.sh executable and to use the exec form for ENTRYPOINT (ENTRYPOINT ["./entrypoint.sh"]). Note that if the file to download or the value of the variable is known when building the image, you can use the RUN command in the Dockerfile instead, which is more efficient than the entrypoint script.
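
A common variation, shown here as a sketch rather than as this lesson's method, ends the script with exec "$@" instead of bash -c "$*". This keeps each argument separate and replaces the script process with the requested command. The function init_and_run is a hypothetical wrapper that mirrors such a script inside a subshell, so it can be tried on the host.

```shell
# Hypothetical wrapper: the body mirrors an entrypoint script and runs in a
# subshell ( ... ), so that exec does not replace the calling shell
init_and_run() (
    # set a variable that is only known at run time
    export MY_TOKEN=./token_file.jwt
    # replace the subshell with the requested command, keeping each
    # argument separate instead of re-parsing one joined string
    exec "$@"
)

# The command sees the variable exported by the initialization step
token_seen=$(init_and_run sh -c 'echo "$MY_TOKEN"')

# Arguments containing spaces stay intact: two arguments are counted
arg_count=$(init_and_run sh -c 'echo $#' sh "one argument" "two")

echo "$token_seen"
echo "$arg_count"
```

With exec "$@", shell syntax such as pipes or && in the arguments is not interpreted, so bash -c "$*" remains the appropriate choice when the arguments are meant to be a shell command line.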

Key Points

CMD provides defaults for an executing container

CMD can provide options for ENTRYPOINT

ENTRYPOINT allows you to configure commands that will always run for an executing container

Bonus Episode: Building and deploying a Docker container to GitHub Packages

Overview

Teaching: 40 min
Exercises: 0 min

Questions

How to build a Docker container for Python packages?

How to share Docker images?

Objectives

To be able to build a Docker container and share it via GitHub Packages

Prerequisites

For this lesson, you will need:

Knowledge of Git (SW Carpentry Git-Novice Lesson)

Knowledge of GitHub CI/CD (HSF GitHub CI/CD Lesson)

Docker Container for Python packages

Python packages can be installed in a Docker image. The following example illustrates how to write a Dockerfile for building an image containing Python packages.

FROM ubuntu:20.04

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update \
 && apt-get install wget -y \
 && apt-get install dpkg-dev cmake g++ gcc binutils libx11-dev libxpm-dev \
  libxft-dev libxext-dev python3 libssl-dev libgsl0-dev libtiff-dev \
  python3-pip -y

RUN pip3 install numpy \
  && pip3 install awkward \
  && pip3 install uproot4 \
  && pip3 install particle \
  && pip3 install hepunits \
  && pip3 install matplotlib \
  && pip3 install mplhep \
  && pip3 install vector \
  && pip3 install fastjet \
  && pip3 install iminuit

As we see, several packages are installed.

Publish Docker images with GitHub Packages and share them!

It is possible to publish Docker images with GitHub Packages. To do so, one needs to use GitHub CI/CD. A step-by-step guide is presented here.

Step 1: Create a GitHub repository and clone it locally.
Step 2: In the empty repository, make a folder called .github/workflows. In this folder we will store the file containing the YAML script for a GitHub workflow, named Docker-build-deploy.yml (the name doesn’t really matter).
Step 3: In the top directory of your GitHub repository, create a file named Dockerfile.
Step 4: Copy-paste the content above and add it to the Dockerfile. (In principle it is possible to build this image locally, but we will not do that here, as we wish to build it with GitHub CI/CD).
Step 5: In the Docker-build-deploy.yml file, add the following content:

name: Create and publish a Docker image

on:
  push:
    branches:
      - main
      - master

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build-and-push-image:
    runs-on: ubuntu-latest

    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Log in to the Container registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Docker Metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

The above script is designed to build and publish a Docker image with GitHub packages.
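
The file layout from Steps 2 and 3 can be sketched with a few shell commands. The snippet uses a temporary directory as a stand-in for the cloned repository, which is an assumption made only so that it can be run anywhere. In practice you would run the mkdir and touch commands inside your own clone.

```shell
# Work in a scratch directory standing in for the freshly cloned repository
repo_dir=$(mktemp -d)
cd "$repo_dir" || exit 1

# Step 2: folder and (still empty) workflow file
mkdir -p .github/workflows
touch .github/workflows/Docker-build-deploy.yml

# Step 3: Dockerfile in the top directory
touch Dockerfile

# List the files that were created
find . -type f
```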

Step 6: Add LICENSE and README as recommended in the SW Carpentry Git-Novice Lesson, and then the repository is good to go.
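
Once the workflow has run, the image can be pulled from ghcr.io. The sketch below builds the image reference to pull, assuming that docker/metadata-action is left at its defaults (in which case a push to a branch is tagged with the branch name) and that the image name is the lowercased repository name. The repository Some-User/My-Analysis is a hypothetical placeholder.

```shell
# Hypothetical values: replace with your own owner/repository and branch
GITHUB_REPOSITORY="Some-User/My-Analysis"
BRANCH="main"

# Registry image names must be lowercase
image_name=$(printf '%s' "$GITHUB_REPOSITORY" | tr '[:upper:]' '[:lower:]')
image_ref="ghcr.io/${image_name}:${BRANCH}"

# Print the pull command rather than running it, since the image is hypothetical
echo "podman pull ${image_ref}"
```

If the package is private, a podman login ghcr.io is needed before pulling.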

Key Points

Python packages can be installed in Docker images along with Ubuntu packages.

It is possible to publish and share Docker images via GitHub Packages.
