Introduction
Overview
Teaching: 5 min
Exercises: 5 min
Questions
What are containers?
What are Docker and Podman? What are the differences?
Objectives
Learn the basic concepts of containerization.
Understand how a container helps with analysis reproducibility.
What is a Container?
Have you ever:
- Taken a piece of software from one computer to another and found that it doesn’t work?
- Had to install a bunch of dependencies to run a piece of software written by a colleague?
- Said “it works on my machine” when someone else is having trouble running your code?
Most of us have: it’s a common problem in software development and data analysis. The industry has been working on solutions to these problems for a long time, and containers are one of the most popular solutions.
Containers are a way to package software that allows you to run an application and its dependencies in a single, isolated unit called a container.
Importantly, containers share the host machine’s OS kernel and so don’t require an OS per application. As discrete processes, containers take up only as much memory as necessary, making them very lightweight and fast to spin up.
Containers on Windows and macOS
Running containers on systems other than Linux, like macOS and Windows, requires a virtual machine in the background to provide a Linux kernel. Still, the containers are very lightweight, and spinning one up is faster than deploying a separate virtual machine for each application.
Most containerization tools provide a seamless experience for the user, abstracting the virtual machine and making it transparent to the user. Just be aware that there is an additional layer between the containers and the host machine.
Docker
Docker is perhaps the most popular containerization tool these days, particularly in industry. It is a platform for developing, shipping, and running applications in containers. In addition, Docker provides a public registry for sharing and collaborating on container images called Docker Hub.
The official Docker documentation and tutorial can be found on the Docker website. They are thorough and useful, and worth revisiting routinely when working with Docker. A note up front: Docker has very similar syntax to Git and Linux, so if you are familiar with their command line tools then most of Docker should seem somewhat natural (though you should still read the docs!).
It is still important to know what Docker is and what the components of it are. Docker images are executables that bundle together all necessary components for an application or an environment. Docker containers are the runtime instances of images — they are images with a state.
Docker is the most popular containerization tool these days, but it’s not the only one. There are other kids on the block which are in use and gaining popularity, such as Podman.
Why Podman?
Podman is an open-source alternative to Docker with several advantages. For example, Podman is able to run containers as a non-root user out of the box, a big security advantage over Docker. The reason is that Podman uses a daemonless architecture, which means that it doesn’t require a daemon running as superuser to execute containers, as Docker does.
In addition, Docker Desktop has licensing restrictions that may prevent you from using it in some institutions. If that is your case, Podman is an excellent alternative.
Podman is a drop-in replacement for Docker, so you can use the same commands and workflows you are used to with Docker.
Across the tutorial, we will use Podman as the containerization tool, but if you are interested in using Docker instead, just replace podman with docker in the commands and you should be good to go.
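Since the two tools accept the same commands, a script can simply use whichever engine is installed. The following is a minimal sketch and not part of the lesson material: the function name pick_engine is made up for illustration, and it only checks that an executable called podman or docker is on your PATH.

```shell
# Hypothetical helper: print the first container engine found on PATH.
# Podman is tried first because it is the tool used in this tutorial.
pick_engine() {
    for candidate in podman docker; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Report what was found; nothing is pulled or run here.
if engine=$(pick_engine); then
    echo "container engine: $engine"
else
    echo "no container engine found on PATH"
fi
```

With the result stored in a variable, later commands can be written as "$engine" images instead of hard-coding one tool.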
Apptainer
Apptainer (formerly known as Singularity) is another containerization technology. It is widely used in HPC and is gaining rapid adoption in High Energy and Nuclear Physics, so you may need to familiarize yourself with it at some point.
To learn more about Apptainer, see the HSF Training Module “Introduction to Apptainer/Singularity”, which also includes more details about the differences between Apptainer and Docker.
Key Points
Introduces Docker, a popular tool for software containerization.
Introduces Podman, an open-source alternative with several advantages.
Podman is a drop-in replacement for Docker. Replace podman with docker in the commands and you are good to go.
Pulling Images
Overview
Teaching: 10 min
Exercises: 5 min
Questions
How are images downloaded?
How are images distinguished?
Objectives
Pull images from Docker Hub image registry
List local images
Introduce image tags
Docker Hub
Much like how GitHub allows for web hosting and searching for code, the Docker Hub image registry allows the same for Docker images. Hosting images is free for public repositories and allows for downloading images as they are needed.
Additionally, through integrations with GitHub and Bitbucket, Docker Hub repositories can be linked against Git repositories so that automated builds of Dockerfiles on Docker Hub will be triggered by pushes to repositories. However, at this moment enabling such a feature requires a Pro (paid) account or joining the Docker-Sponsored Open Source Program. There are other ways of doing this, such as using GitLab/GitHub CI/CD, but that’s beyond the scope of this training module.
Docker Hub and Podman
Both Docker and Podman use OCI (Open Container Initiative) compliant images, so you can use the same images with both tools. It means Podman can pull and run images from Docker Hub.
By default,
podman pull
pulls an image from Docker Hub if a registry is not specified in the command line argument.
Pulling Images
To begin with we’re going to pull down the image we’re going to be working in for the tutorial (note: if you did all the docker pulls in the setup instructions, this image will already be on your machine, in which case podman should notice it’s there and not attempt to re-pull it unless it’s changed in the meantime):
podman pull matthewfeickert/intro-to-docker
No search registry defined
Some installations of Podman may end with an error like
Error: unable to pull matthewfeickert/intro-to-docker:latest: unable to find registry in the system
This is because the default registry is not defined. You can fix this by adding the docker.io registry to the command:
podman pull docker.io/matthewfeickert/intro-to-docker
Or, to pull images from Docker Hub by default, add the following line to the /etc/containers/registries.conf file:
unqualified-search-registries=["docker.io"]
Connection errors
If, when using Podman or Docker on a non-Linux machine, you run into an error like
Error: unable to connect to Podman
make sure that the Podman or Docker desktop application is running. Remember that in such environments, Podman or Docker uses a virtual machine to run the containers.
Once the pull has completed, list the images that are available locally
podman images
If you have many images and want to get information on a particular one you can apply a filter, such as the repository name
podman images matthewfeickert/intro-to-docker
REPOSITORY TAG IMAGE ID CREATED SIZE
matthewfeickert/intro-to-docker latest cf6508749ee0 3 months ago 1.49GB
or more explicitly
podman images --filter=reference="matthewfeickert/intro-to-docker"
REPOSITORY TAG IMAGE ID CREATED SIZE
matthewfeickert/intro-to-docker latest cf6508749ee0 3 months ago 1.49GB
You can see here that there is the TAG
field associated with the
matthewfeickert/intro-to-docker
image.
Tags are a way of further specifying different versions of the same image.
As an example, let’s pull the buster release tag of the
Debian image (again, if it was already pulled during setup, podman won’t attempt to re-pull it unless it’s changed since last pulled).
podman pull debian:buster
podman images debian
buster: Pulling from library/debian
<some numbers>: Pull complete
Digest: sha256:<the relevant SHA hash>
Status: Downloaded newer image for debian:buster
docker.io/library/debian:buster
REPOSITORY TAG IMAGE ID CREATED SIZE
debian buster 00bf7fdd8baf 5 weeks ago 114MB
Check the documentation on pull and images for more information on these commands.
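To make the name:tag structure concrete, here is a small sketch that splits an image reference into its repository and tag using only shell parameter expansion. The function name split_image_ref is made up for this example, digest references (name@sha256:...) are deliberately not handled, and the fallback to latest reflects the tag that is used when none is given.

```shell
# Hypothetical helper: print "<repository> <tag>" for an image reference.
split_image_ref() {
    ref=$1
    last=${ref##*/}              # final path component, e.g. "debian:buster"
    case $last in
        *:*)                     # a colon after the last "/" introduces the tag
            tag=${last##*:}
            name=${ref%:*}
            ;;
        *)                       # no tag given: engines assume "latest"
            tag=latest
            name=$ref
            ;;
    esac
    printf '%s %s\n' "$name" "$tag"
}

split_image_ref debian:buster
split_image_ref matthewfeickert/intro-to-docker
split_image_ref docker.io/library/python:3.9-slim
```

Looking only at the last path component keeps a registry port, as in localhost:5000/image, from being mistaken for a tag.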
Pulling Python
Pull the image python:3.9-slim for Python 3.9 and then list all python images on your computer.
Browse the official Python images to find available tags and read about image variants. What does -slim mean?
Solution
podman pull python:3.9-slim
podman images --filter=reference="python"
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/library/python 3.9-slim e440e2151380 2 weeks ago 131 MB
python:<version>-slim: This image does not contain the common packages contained in the default tag and only contains the minimal packages needed to run Python.
Key Points
Pull images with
podman pull <image-id>
List all images on the computer and other information with
podman images
Image tags distinguish releases or versions and are appended to the image name with a colon
Running Containers
Overview
Teaching: 15 min
Exercises: 5 min
Questions
How are containers run?
How do you monitor containers?
How are containers exited?
How are containers restarted?
Objectives
Run containers
Understand container state
Stop and restart containers
To use an image as a particular instance on a host machine, you run it as a container. You can run in either a detached or foreground (interactive) mode.
Run the image we pulled as a container with an interactive bash terminal:
podman run -it matthewfeickert/intro-to-docker:latest /bin/bash
The -i
option here enables the interactive session, the -t
option gives access to a terminal and the /bin/bash
command makes the container start up in a bash session.
You are now inside the container in an interactive bash session. Check the working directory
pwd
/home/docker/data
and check the hostname to see that you are not on your local host system
hostname
<generated hostname>
Further, check the os-release
to see that you are actually inside a release of Debian
(given the Docker Library’s Python image Dockerfile choices)
cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
Working directory
You may be wondering why you are at /home/docker/data inside the container. This is the working directory that was set for the image.
In the next chapters we will see how to build your own images and set parameters such as the working directory.
Monitoring Containers
Open up a new terminal tab on the host machine and list the containers that are currently running:
podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
<generated id> <image:tag> "/bin/bash" n minutes ago Up n minutes <generated name>
Notice that the name of your container is some randomly generated name. To make the name more helpful, rename the running container
podman rename <CONTAINER ID> my-example
and then verify it has been renamed
podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
<generated id> <image:tag> "/bin/bash" n minutes ago Up n minutes my-example
Renaming by name
You can also identify containers to rename by their current name
podman rename <NAME> my-example
Alternatively, you can also give the container a name at creation, using the --name
option:
podman run -it --name my-fancy-name matthewfeickert/intro-to-docker:latest /bin/bash
This way, it has a custom chosen name to start with, which you can use later on to interact with it.
Exiting and restarting containers
As a test, go back into the terminal used for your container, and create a file in the container
touch test.txt
In the container exit at the command line
exit
You are returned to your shell. If you list the containers you will notice that none are running
podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
but you can see all containers that have been run and not removed with
podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
<generated id> <image:tag> "/bin/bash" n minutes ago Exited (0) t seconds ago my-example
To restart your exited container start it again and then attach it interactively to your shell
podman start <CONTAINER ID>
podman attach <CONTAINER ID>
exec command
The attach command used here is a handy shortcut to interactively access a running container with the same start command (in this case /bin/bash) that it was originally run with.
In case you’d like some more flexibility, the exec command lets you run any command in the container, with options similar to the run command to enable an interactive (-i) session, etc.
For example, the exec equivalent to attaching in our case would look like:
podman start <CONTAINER ID>
podman exec -it <CONTAINER ID> /bin/bash
Starting and attaching by name
You can also start and attach containers by their name
podman start <NAME>
podman attach <NAME>
Notice that your working directory is still /home/docker/data
and then check that your
test.txt
still exists
ls
test.txt
So this shows us that we can exit containers for arbitrary lengths of time and then return to our working environment inside of them as desired.
Clean up a container
If you want a container to be cleaned up (that is, deleted) after you exit it, run it with the --rm option flag
podman run --rm -it <IMAGE> /bin/bash
Key Points
Run containers with
podman run <image-id>
Monitor containers with
podman ps
Exit interactive sessions using the
exit
command
Restart stopped containers with
podman start
File I/O with Containers
Overview
Teaching: 15 min
Exercises: 5 min
Questions
How do containers interact with my local file system?
Objectives
Copy files to and from the container
Mount directories to be accessed and manipulated by the container
Copying
Copying files between the local host and containers is possible. On your local host, either find a file that you want to transfer to the container or create a new one. Below is the procedure for creating a new file called io_example.txt and then copying it to the container:
touch io_example.txt
echo "This was written on local host" > io_example.txt
podman cp io_example.txt <NAME>:/home/docker/data/
and then from the container check and modify it in some way
pwd
ls
cat io_example.txt
echo "This was written inside the container" >> io_example.txt
Permission issues
If you run into a Permission denied error, there is a simple and quick fix to continue with the exercise:
exit # exit container
chmod a+w io_example.txt # add write permissions for all users
And continue from the podman cp ... command above.
/home/docker/data
io_example.txt
This was written on local host
and then on the local host copy the file out of the container
podman cp <NAME>:/home/docker/data/io_example.txt .
and verify that the file has been modified as you wanted
cat io_example.txt
This was written on local host
This was written inside the container
Volume mounting
What is more common and arguably more useful is to mount volumes to
containers with the -v
flag.
This allows for direct access to the host file system inside the container and for
container processes to write directly to the host file system.
podman run -v <path on host>:<path in container> <image>
For example, to mount your current working directory ($PWD
) on your local machine to the data
directory in the example container
podman run --rm -it -v $PWD:/home/docker/data matthewfeickert/intro-to-docker
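If the host path contains spaces, an unquoted $PWD is split into several arguments by the shell. Below is a minimal sketch of building the -v argument with quoting; the variable names are illustrative, and the command is only printed so the snippet can be tried without a container engine.

```shell
host_dir=$PWD                        # directory on the host to share
container_dir=/home/docker/data      # mount point used by the example image
mount_arg="${host_dir}:${container_dir}"

# Print the command that would be run, with the mount argument quoted.
printf 'podman run --rm -it -v "%s" matthewfeickert/intro-to-docker\n' "$mount_arg"
```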
No such file or directory?
On Windows and macOS, you may face an error while mounting the volume:
Error: statfs <directory>: no such file or directory
The error occurs because the directory you are trying to mount was not shared with the virtual machine that runs the containers. In the latest versions of Podman and Docker your home directory is shared by default, but with Podman you can restart the machine to ensure that the directory is mounted:
podman machine stop
podman machine start
Starting machine "podman-machine-default"
Waiting for VM ...
Mounting volume... /Users:/Users
...
Machine "podman-machine-default" started successfully
From inside the container you can ls
to see the contents of your directory on your local
machine
ls
and yet you are still inside the container
pwd
/home/docker/data
You can also see that any files created in this path in the container persist upon exit
touch created_inside.txt
exit
ls *.txt
Permission issues
If you are using Linux with SELinux enabled, you might run into a Permission denied error. Note that SELinux is enabled if the output of the command getenforce status is Enforcing. To fix the permission issue, append :z (lowercase!) at the end of the mount option, like this:
podman run --rm -it -v $PWD:/home/docker/data:z ...
If this still does not fix the issue you can disable SELinux by running sudo setenforce 0, or you can try using sudo to execute docker/podman commands, but neither of these methods is recommended.
created_inside.txt
This I/O allows container images to be used for specific tasks that may be difficult to do with only the tools or software installed on the local host machine. For example, debugging problems that arise in cross-platform software, or even just having a specific version of software perform a task (e.g., using Python 2 when you don’t want it on your machine, or using a specific release of TeX Live when you aren’t ready to update your system release).
Key Points
Copy files with
podman cp
Mount volumes with
podman run -v <path on host>:<path in container> <image>
Coffee break
Overview
Teaching: 0 min
Exercises: 15 min
Questions
Coffee or tea?
Objectives
Refresh your mental faculties with coffee and conversation
Key Points
Breaks are helpful in the service of learning
Writing Dockerfiles and Building Images
Overview
Teaching: 30 min
Exercises: 10 min
Questions
How are Dockerfiles written?
How are images built?
Objectives
Write simple Dockerfiles
Build a container image from a Dockerfile
Container images are static files that contain a template to create containers on machines. Container engines like Podman or Docker pull the images from repositories or local storage and then create containers from them. Container engines can also build and save to a repository new container images, interactively or following a set of instructions, starting from scratch or modifying an existing image.
A common way of defining the instructions to build a container image is through a Dockerfile.
These text-based documents provide the instructions through a syntax similar to Linux
shell commands, which the engine executes during the build.
The Dockerfile
for the example image being used is an example of
some simple extensions of the official Python 3.9 Docker image based on Debian Bullseye (python:3.9-bullseye
).
Like Docker, Podman also uses Dockerfile
s to build images, so the same instructions can be used for both tools.
We will continue with Podman throughout this lesson but the same commands can be used with Docker.
As a very simple example of extending the example image into a new image create a Dockerfile
on your local machine
touch Dockerfile
and then write in it the Docker engine instructions to add cowsay
and
scikit-learn
to the environment
# Dockerfile
# Specify the base image that we're building the image on top of
FROM matthewfeickert/intro-to-docker:latest
# Build the image as root user
USER root
# Run some bash commands to install packages
RUN apt-get -y update && \
apt-get -y upgrade && \
apt-get -y install cowsay && \
apt-get -y autoclean && \
apt-get -y autoremove && \
rm -rf /var/lib/apt/lists/* && \
ln -s /usr/games/cowsay /usr/bin/cowsay
RUN pip install --no-cache-dir -q scikit-learn
# This sets the default working directory when a container is launched from the image
WORKDIR /home/docker
# Run as docker user by default when the container starts up
USER docker
Dockerfile layers (or: why all these ‘&&’s??)
Each RUN command in a Dockerfile creates a new layer in the image. In general, each layer should try to do one job, and the fewer layers in an image the easier it is to compress.
This is why you see all these ‘&&’s in the RUN command: all the shell commands are chained into a single command and take place in a single layer. When trying to upload and download images on demand, the smaller the size the better.
Another thing to keep in mind is that each RUN command occurs in its own shell, so any environment variables, etc. set in one RUN command will not persist to the next.
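The per-RUN shell behaviour can be imitated on the host. This is only an analogy, not an actual build: each sh -c call stands in for one RUN instruction, and BUILD_GREETING is a made-up variable name.

```shell
# Make sure the variable is not already set in the current environment.
unset BUILD_GREETING

# First "RUN": the variable only exists inside this child shell.
first=$(sh -c 'export BUILD_GREETING=hello; echo "first: ${BUILD_GREETING}"')

# Second "RUN": a fresh child shell, so the variable is gone again.
second=$(sh -c 'echo "second: ${BUILD_GREETING:-unset}"')

# Chaining with && keeps both steps in one shell, so the value is still there.
chained=$(sh -c 'export BUILD_GREETING=hello && echo "chained: ${BUILD_GREETING}"')

echo "$first"
echo "$second"
echo "$chained"
```

The second line of output reports the variable as unset, which is what a later RUN instruction would see as well.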
Garbage cleanup
Notice that the last few lines of the
RUN
command clean up and remove unneeded files that get produced during the installation process. This is important for keeping image sizes small, since files produced during each image-building layer will persist into the final image and add unnecessary bulk.
Don’t run as
root
By default Docker containers will run as root. This is a bad idea and a security concern. Instead, set up a default user (like docker in the example) and if needed give the user greater privileges.
Then build
an image from the Dockerfile
with Podman and tag it with a
human-readable name
podman build -f Dockerfile -t extend-example:latest .
You can now run the image as a container and verify for yourself that your additions exist
podman run --rm -it extend-example:latest /bin/bash
which cowsay
cowsay "Hello from inside the container"
pip list | grep scikit
python3 -c "import sklearn as sk; print(sk)"
/usr/bin/cowsay
___________________
< Hello from inside the container >
-------------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||
scikit-learn 1.3.1
<module 'sklearn' from '/usr/local/lib/python3.9/site-packages/sklearn/__init__.py'>
You can list all images available on your local machine with podman images
:
podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
localhost/extend-example latest c24a757fabe7 8 hours ago 2.2 GB
docker.io/matthewfeickert/intro-to-docker latest 64708e04f3a9 2 years ago 1.62 GB
...
docker.io
indicates that the image was pulled from the Docker Hub,
while localhost
indicates that the image was built locally.
Tags
In the examples so far the built image has been tagged with a single tag (e.g. latest
).
However, tags are simply arbitrary labels meant to help identify images and images can
have multiple tags.
New tags can be specified in the podman build
(or docker build
) command by giving the -t
flag multiple
times or they can be specified after an image is built by using
podman tag
.
podman tag <SOURCE_IMAGE[:TAG]> <TARGET_IMAGE[:TAG]>
Add your own tag
Using
podman tag
add a new tag to the image you built.
Solution
podman images extend-example
podman tag extend-example:latest extend-example:my-tag
podman images extend-example
REPOSITORY TAG IMAGE ID CREATED SIZE
localhost/extend-example latest c24a757fabe7 9 hours ago 2.2 GB
REPOSITORY TAG IMAGE ID CREATED SIZE
localhost/extend-example my-tag c24a757fabe7 9 hours ago 2.2 GB
localhost/extend-example latest c24a757fabe7 9 hours ago 2.2 GB
Tags are labels
Note how the image ID didn’t change for the two tags: they are the same object. Tags are simply convenient human-readable labels.
COPY
Podman also gives you the ability to copy external files into a container image during the
build with the COPY
Dockerfile command.
It copies a target file from the host file system into the image
file system
COPY <path on host> <path in container image>
For example, if there is a file called install_python_deps.sh
in the same directory as
the build is executed from
touch install_python_deps.sh
with contents
cat install_python_deps.sh
#!/usr/bin/env bash
set -e
pip install --upgrade --no-cache-dir pip setuptools wheel
pip install --no-cache-dir -q scikit-learn
then this could be copied into the container image of the previous example during the build and then used (and then removed as it is no longer needed).
Create a new file called Dockerfile.copy
:
touch Dockerfile.copy
and fill it with a modified version of the above Dockerfile, where we now copy install_python_deps.sh
from the local working directory into the container and use it to install the specified python dependencies:
# Dockerfile.copy
FROM matthewfeickert/intro-to-docker:latest
USER root
RUN apt-get -qq -y update && \
apt-get -qq -y upgrade && \
apt-get -qq -y install cowsay && \
apt-get -y autoclean && \
apt-get -y autoremove && \
rm -rf /var/lib/apt/lists/* && \
ln -s /usr/games/cowsay /usr/bin/cowsay
COPY install_python_deps.sh install_python_deps.sh
RUN bash install_python_deps.sh && \
rm install_python_deps.sh
WORKDIR /home/data
USER docker
podman build -f Dockerfile.copy -t copy-example:latest .
For very complex scripts or files that are on some remote, COPY
offers a straightforward
way to bring them into the container image build.
Key Points
Dockerfiles are text files containing commands for the container engine
Images are built with
podman build
Images can have multiple tags associated with them
Images can use
COPY
to copy files into them during build
Removal of Containers and Images
Overview
Teaching: 5 min
Exercises: 5 min
Questions
How do you clean up old containers?
How do you delete images?
Objectives
Learn how to clean up after working with containers
You can clean up/remove a container with podman rm
podman rm <CONTAINER NAME>
Remove old containers
Start an instance of the tutorial container, exit it, and then remove it with
podman rm
Solution
podman run matthewfeickert/intro-to-docker:latest
podman ps -a
podman rm <CONTAINER NAME>
podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
<generated id> <image:tag> "/bin/bash" n seconds ago Exited (0) t seconds ago <name>
<generated id>
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
You can remove an image from your computer entirely with podman rmi
podman rmi <IMAGE ID>
Remove an image
Pull down the Python 2.7 image (2.7-slim tag) from Docker Hub and then delete it.
Solution
podman pull python:2.7-slim
podman images python
podman rmi <IMAGE ID>
podman images python
2.7: Pulling from library/python
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
Digest: sha256:<the relevant SHA hash>
Status: Downloaded newer image for python:2.7-slim
docker.io/library/python:2.7-slim
REPOSITORY TAG IMAGE ID CREATED SIZE
python 2.7-slim d75b4eed9ada 14 hours ago 886MB
python 3.9-slim e440e2151380 23 hours ago 918MB
Untagged: python@sha256:<the relevant SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
REPOSITORY TAG IMAGE ID CREATED SIZE
python 3.9-slim e440e2151380 23 hours ago 918MB
Helpful cleanup commands
What is helpful is to have a command to detect and remove unwanted images and containers for you. This can be done with prune, which depending on the context will remove different things.
- podman container prune removes all stopped containers, which is helpful to clean up forgotten stopped containers.
- podman image prune removes all unused or dangling images (images that do not have a tag). This is helpful for cleaning up after builds.
- podman system prune removes all stopped containers, dangling images, and dangling build caches. This is very helpful for cleaning up everything all at once.
Key Points
Remove containers with
podman rm <CONTAINER NAME>
Remove images with
podman rmi <IMAGE ID>
Perform faster cleanup with
podman container prune, podman image prune, and podman system prune
Using CMD and ENTRYPOINT in Dockerfiles
Overview
Teaching: 15 min
Exercises: 10 min
Questions
How are default commands set in Dockerfiles?
Objectives
Learn how and when to use
CMD
Learn how and when to use
ENTRYPOINT
So far every time we’ve run the containers we’ve typed
podman run --rm -it <IMAGE>:<TAG> <command>
like
podman run --rm -it python:3.9-slim /bin/bash
Running this dumps us into a Bash session
echo $SHELL
/bin/bash
However, if no /bin/bash
is given then you are placed inside the Python 3.9 REPL.
podman run --rm -it python:3.9-slim
Python 3.9.18 (main, Feb 13 2024, 10:56:47)
[GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
These are very different behaviors, so let’s understand what is happening.
The Python 3.9 image has a default command that runs when the container is executed,
which is specified in the Dockerfile with CMD
.
Create a file named Dockerfile.defaults
touch Dockerfile.defaults
# Dockerfile.defaults
# Make the base image configurable
ARG BASE_IMAGE=python:3.9-slim
FROM ${BASE_IMAGE}
USER root
RUN apt-get -qq -y update && \
apt-get -qq -y upgrade && \
apt-get -y autoclean && \
apt-get -y autoremove && \
rm -rf /var/lib/apt/lists/*
# Create user "docker"
RUN useradd -m docker && \
cp /root/.bashrc /home/docker/ && \
mkdir /home/docker/data && \
chown -R --from=root docker /home/docker
ENV HOME /home/docker
WORKDIR ${HOME}/data
USER docker
CMD ["/bin/bash"]
Now build the image from this Dockerfile, specifying its name with the -f
argument since the engine will otherwise look for a file named Dockerfile
by default.
podman build -f Dockerfile.defaults -t defaults-example:latest .
Now running
podman run --rm -it defaults-example:latest
again drops you into a Bash shell as specified by CMD
.
As has already been seen, CMD
can be overridden by giving a command after the image
podman run --rm -it defaults-example:latest python3
The ENTRYPOINT
builder command allows you to define a command or
commands that are always run at the “entry” to the container.
If an ENTRYPOINT
has been defined then CMD
provides optional inputs to the ENTRYPOINT
.
Create a file named entrypoint.sh
#!/usr/bin/env bash
# entrypoint.sh
set -e
function main() {
if [[ $# -eq 0 ]]; then
printf "\nHello, World!\n"
else
printf "\nHello %s\n" "${1}"
fi
}
main "$@"
/bin/bash
And now modify the Dockerfile.defaults
to use the entrypoint.sh
script
# Dockerfile.defaults
# Make the base image configurable
ARG BASE_IMAGE=python:3.9-slim
FROM ${BASE_IMAGE}
USER root
RUN apt-get -qq -y update && \
apt-get -qq -y upgrade && \
apt-get -y autoclean && \
apt-get -y autoremove && \
rm -rf /var/lib/apt/lists/*
# Create user "docker"
RUN useradd -m docker && \
cp /root/.bashrc /home/docker/ && \
mkdir /home/docker/data && \
chown -R --from=root docker /home/docker
ENV HOME /home/docker
WORKDIR ${HOME}/data
USER docker
COPY entrypoint.sh $HOME/entrypoint.sh
ENTRYPOINT ["/bin/bash", "/home/docker/entrypoint.sh"]
CMD ["there"]
Note how CMD
provides an optional input to entrypoint.sh
.
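The interplay can be previewed on the host without building anything. The sketch below re-implements the branching of entrypoint.sh as a plain POSIX shell function (an adaptation for illustration, not the file used in the image) and leaves out the trailing /bin/bash so that it returns immediately.

```shell
# Same branching as entrypoint.sh: greet the first argument if there is one,
# otherwise fall back to a generic greeting.
main() {
    if [ "$#" -eq 0 ]; then
        printf '\nHello, World!\n'
    else
        printf '\nHello %s\n' "$1"
    fi
}

main          # no arguments at all, as if neither CMD nor a command were given
main there    # the default CMD ["there"] from the Dockerfile
```

The second call shows what the default CMD contributes: it is simply handed to the entrypoint script as its first argument.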
podman build -f Dockerfile.defaults -t defaults-example:latest --compress .
So now try
podman run --rm -it defaults-example:latest
Applied ENTRYPOINT and CMD
What will be the output of
podman run --rm -it defaults-example:latest $USER
and why?
Solution
Hello <your user name>
docker@2a99ffabb512:~/data$
$USER is evaluated by the shell on the host and then overrides the default CMD to be passed to entrypoint.sh
All about
ENTRYPOINT
andCMD
ENTRYPOINT and CMD can be both in “exec” or “shell” form, although we recommend to use exec form. Exec form must be an array of comma separated quoted arguments and it us executed via the Linux
execv()
. E.g.CMD ["/usr/bin/ls", "-al"]
Anything else, also if you forget just the quotes, will be considered shell form, it is passed by Docker/Podman to/bin/sh -c
(as written, with quotes, parentheses, …), and can use shell features like PATH and expansion. E.g.CMD ls -al
At execution, ENTRYPOINT can be overridden with the --entrypoint option, and CMD with any arguments given at invocation. When ENTRYPOINT is in exec form, CMD or the invocation arguments are passed to it as additional arguments (as a single string, preceded by “/bin/sh” “-c”, if CMD is in shell form). When ENTRYPOINT is in shell form, CMD and invocation arguments are ignored.
An interactive session, run -it, is possible only if the last command (ENTRYPOINT if present, arguments or CMD) is interactive, i.e. not terminating.
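The difference between the two forms can be reproduced outside a container. The sketch below is only an analogy, assuming a hypothetical variable `GREETING`: starting `/bin/echo` directly mimics the exec form, while handing a string to `/bin/sh -c` mimics the shell form.

```shell
GREETING="hello"   # hypothetical variable, for illustration only
export GREETING

# Exec-form style, like CMD ["/bin/echo", "$GREETING"]:
# the program is started directly with a fixed argument list,
# no shell is involved, so the text $GREETING is not expanded.
exec_form_output="$(/bin/echo '$GREETING')"

# Shell-form style, like CMD echo $GREETING:
# the whole string is handed to /bin/sh -c, which expands the variable.
shell_form_output="$(/bin/sh -c 'echo $GREETING')"

echo "exec form : $exec_form_output"
echo "shell form: $shell_form_output"
```

The exec-form line prints the literal text `$GREETING`, while the shell-form line prints `hello`.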
The use case seen above is common for application containers: ENTRYPOINT (in exec form) is used for the command and CMD is used for the default arguments, which can be easily overridden at invocation.
Another common use case is to run an initialization script before anything else in the container, e.g. to download files or set variables only available at run-time, or to get secrets from a key-store.
For that you can use an entrypoint.sh like:
#!/bin/sh
echo "You are running on $(hostname)"
# download tokens and secrets
export MY_TOKEN=./token_file.jwt
bash -c "$*"
The last line is the key: it runs the arguments from CMD or the command line as a command.
Remember to make entrypoint.sh executable and to use the exec form for ENTRYPOINT (ENTRYPOINT ["./entrypoint.sh"]).
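To see how such an initialization script hands its arguments on, here is a minimal sketch that can be run without a container. The function `entrypoint_demo` and the variable `DEMO_TOKEN` are hypothetical, and `sh -c` is used instead of `bash -c` only to keep the sketch portable.

```shell
# Hypothetical stand-in for entrypoint.sh: set-up first, then run the arguments.
entrypoint_demo() {
  # Assumed run-time variable; exported so the command started below can see it.
  export DEMO_TOKEN="example-value"
  # "$*" joins all arguments into one string, which sh -c runs as a command line.
  sh -c "$*"
}

# Like CMD ["echo", "hello", "from", "CMD"], or the same words given
# after the image name on the command line.
entrypoint_demo echo hello from CMD

# A single quoted argument is expanded by the inner shell, after the set-up ran.
entrypoint_demo 'echo "token: $DEMO_TOKEN"'
```

A common alternative for the last line of such a script is `exec "$@"`, which keeps the arguments separate and replaces the script process with the command, but does not interpret shell syntax such as pipes or variable references inside the arguments.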
Note that if the file to download or the value of the variable is known when building the image, you can use a RUN instruction in the Dockerfile instead, which is more efficient than the entrypoint script.
Key Points
CMD provides defaults for an executing container
CMD can provide options for ENTRYPOINT
ENTRYPOINT allows you to configure commands that will always run for an executing container
Bonus Episode: Building and deploying a Docker container to GitHub Packages
Overview
Teaching: 40 min
Exercises: 0 min
Questions
How to build a Docker container for python packages?
How to share Docker images?
Objectives
To be able to build a Docker container and share it via GitHub Packages
Prerequisites
For this lesson, you will need:
- Knowledge of Git (SW Carpentry Git-Novice Lesson)
- Knowledge of GitHub CI/CD (HSF GitHub CI/CD Lesson)
Docker Container for python packages
Python packages can be installed in a Docker image. The following example shows a Dockerfile that builds an image containing several Python packages.
FROM ubuntu:20.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update \
&& apt-get install wget -y \
&& apt-get install dpkg-dev cmake g++ gcc binutils libx11-dev libxpm-dev \
libxft-dev libxext-dev python3 libssl-dev libgsl0-dev libtiff-dev \
python3-pip -y
RUN pip3 install numpy \
&& pip3 install awkward \
&& pip3 install uproot4 \
&& pip3 install particle \
&& pip3 install hepunits \
&& pip3 install matplotlib \
&& pip3 install mplhep \
&& pip3 install vector \
&& pip3 install fastjet \
&& pip3 install iminuit
As we see, several packages are installed.
Publish Docker images with GitHub Packages and share them!
It is possible to publish Docker images with GitHub Packages. To do so, one needs to use GitHub CI/CD. A step-by-step guide is presented here.
- Step 1: Create a GitHub repository and clone it locally.
- Step 2: In the empty repository, make a folder called .github/workflows. In this folder we will store the file containing the YAML script for a GitHub workflow, named Docker-build-deploy.yml (the name doesn’t really matter).
- Step 3: In the top directory of your GitHub repository, create a file named Dockerfile.
- Step 4: Copy-paste the content above and add it to the Dockerfile. (In principle it is possible to build this image locally, but we will not do that here, as we wish to build it with GitHub CI/CD).
- Step 5: In the Docker-build-deploy.yml file, add the following content:
name: Create and publish a Docker image
on:
  push:
    branches:
      - main
      - master
env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
jobs:
  build-and-push-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Log in to the Container registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Docker Metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
The above script is designed to build and publish a Docker image with GitHub packages.
- Step 6: Add LICENSE and README as recommended in the SW Carpentry Git-Novice Lesson, and then the repository is good to go.
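Once the workflow has run, the image can be pulled from ghcr.io. The sketch below shows how the image reference is put together from the REGISTRY and IMAGE_NAME values in the workflow. The helper `ghcr_image_ref` and the example owner, repository, and tag are hypothetical; the tag `main` assumes the default settings of docker/metadata-action, which to our knowledge tag a branch push with the branch name.

```shell
# Hypothetical helper: build the ghcr.io reference for a repository and tag.
ghcr_image_ref() {
  repo="$1"   # "<owner>/<repository>", as in ${{ github.repository }}
  tag="$2"
  # Image names in a registry reference must be lowercase.
  repo_lower="$(printf '%s' "$repo" | tr '[:upper:]' '[:lower:]')"
  printf 'ghcr.io/%s:%s\n' "$repo_lower" "$tag"
}

# Example values (assumptions): owner "MyUser", repository "My-Analysis", branch "main".
ref="$(ghcr_image_ref "MyUser/My-Analysis" "main")"

# Print the command rather than run it, so the sketch works without a container engine.
echo "podman pull $ref"
```

With these example values the printed command is `podman pull ghcr.io/myuser/my-analysis:main`. Check the Packages section of your own repository for the exact name and tags that were published.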
Key Points
Python packages can be installed in Docker images along with Ubuntu packages.
It is possible to publish and share Docker images with GitHub Packages.