Building Docker images for ARM
Recently I decided to build an RSS reader, and during development I thought it would be nice to make it run on my home FreedomBox server. What could possibly go wrong?
It turned out to be quite a non-trivial exercise, partly because a 2-CPU, <1GB RAM server isn’t that capable, and partly because I use several technologies that take an epic amount of processing power, disk space, RAM, or a combination thereof.
At first I tried simply building the Docker images on the actual server, but that took ages, so I looked around for other options. If you’re wondering why I couldn’t simply build the images on my local machine: my workstation is amd64, while my home server’s architecture is armv7l.
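A quick way to double-check both ends is uname -m, which prints the machine architecture the kernel reports:
# on the workstation
uname -m    # -> x86_64
# on the home server
uname -m    # -> armv7l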
Also note that many projects do not build ARM images at all, so depending on your appetite, everything has to be cross-built for the target architecture with docker buildx.
Happy that there was a seemingly easy solution, here is how I did it:
docker buildx create --name free-builder \
--platform linux/arm/v7 \
--driver-opt network=host
docker buildx inspect free-builder --bootstrap
docker buildx use free-builder
docker run -it --rm --privileged tonistiigi/binfmt --install all
docker buildx build -t crnkofe/tagger . --platform linux/arm/v7 --load
I used docker buildx create / inspect to create and bootstrap a dedicated builder instance (a BuildKit container) for building the images. docker run -it --rm --privileged tonistiigi/binfmt --install all is needed to install QEMU emulators for the ARM architecture. docker buildx build then builds the actual image, and --load tells it to make the result available among the ordinary docker images, which in turn allows you to push it to Docker Hub or any other image registry.
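Since --load makes the result show up in plain docker images, pushing the cross-built image works the same as for a native one. A quick sketch, assuming you’re logged in to Docker Hub:
docker login
docker push crnkofe/tagger
# and on the home server
docker pull crnkofe/tagger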
The Dockerfile looks as follows:
FROM python:3-buster
WORKDIR /opt/
RUN apt-get update && apt-get install -y python-pip
COPY requirements.txt /opt/requirements.txt
RUN pip3 install -r requirements.txt
RUN python -c "import nltk; nltk.download('all')"
and requirements.txt:
# ...
matplotlib==3.3.4
nltk==3.5
numpy==1.20.1
#...
Anyone who has installed typical ML-related Python libraries will know that installing matplotlib and NLTK involves compiling quite a few C libraries. Coupled with the fact that docker buildx works by emulating the target architecture, this makes building the image take hours.
Last but not least, while the NLTK download runs quite fast locally during development, it’s around 1GB of data, which significantly inflates the Docker image size.
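To see where the size actually goes, docker history breaks an image down per layer; the nltk.download('all') layer should stick out immediately:
docker images crnkofe/tagger     # overall image size
docker history crnkofe/tagger    # per-layer sizes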
Looking at all this and pondering my life’s choices, I decided not to just abandon the home server. Let’s try splitting this up and possibly optimizing some things away.
I decided to make a base image out of matplotlib, nltk and numpy to cut down on any potential rebuild time and push it to Docker Hub.
Here is the new and improved Dockerfile for the base image:
FROM python:3.9-slim-buster
WORKDIR /opt/
RUN apt-get update && apt-get install -y python3-pip
RUN pip3 install --no-cache-dir -v numpy==1.20.1
RUN pip3 install --no-cache-dir -v nltk==3.5
RUN apt-get -y install zlib1g-dev \
libffi-dev \
libfreetype6-dev \
libfribidi-dev \
libharfbuzz-dev \
libjpeg-turbo-progs \
libjpeg62-turbo-dev \
liblcms2-dev \
libopenjp2-7-dev \
libtiff5-dev \
libwebp-dev
RUN pip3 install --no-cache-dir --prefer-binary -v matplotlib==3.3.4
First, I no longer install requirements.txt in one go, which sometimes even timed out. Requirements are installed one by one so each install ends up in its own layer, which makes any future additions less painful to build. The slim-buster base image is used instead of buster to make the image smaller. Ideally, to make images as small as possible, one should use Alpine, but I like some defaults of Debian’s images. --no-cache-dir makes pip not cache wheels or any intermediate data, which in turn makes the image noticeably smaller.
To build the image run:
docker buildx create --name free-builder \
--platform linux/arm/v7 \
--driver-opt network=host
docker buildx inspect free-builder --bootstrap
docker buildx use free-builder
docker run -it --rm --privileged tonistiigi/binfmt --install all
docker buildx build -t crnkofe/nltk-base . --platform linux/arm/v7 --load
docker push crnkofe/nltk-base
Feel free to check the result on my Docker Hub.
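If you’d rather just try it, the base image can be pulled and smoke-tested directly (on the ARM server, or on amd64 with the binfmt emulators from above installed):
docker pull crnkofe/nltk-base
docker run --rm crnkofe/nltk-base python3 -c "import numpy, nltk, matplotlib; print('ok')"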
What remains is the derived app image, which I pared down to:
FROM crnkofe/nltk-base
WORKDIR /opt/
COPY requirements.txt /opt/requirements.txt
RUN pip3 install -v -r requirements.txt
RUN python -c "import nltk; nltk.download('stopwords')"
RUN python -c "import nltk; nltk.download('brown')"
RUN python -c "import nltk; nltk.download('reuters')"
RUN python -c "import nltk; nltk.download('movie_reviews')"
Note the FROM crnkofe/nltk-base base, which I now have prebuilt. I also run nltk.download per NLTK dataset, so each dataset gets its own layer and I don’t need to rebuild everything in case I decide to add some more. I might eventually move NLTK out of Docker, but for the moment it’s convenient and doesn’t take that long.
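If I do eventually move the NLTK data out of Docker, the rough idea would be to download the corpora once on the server and point the container at them with the NLTK_DATA environment variable. Just a sketch, the host path below is a placeholder:
# download the corpora once into a directory on the host (requires nltk on the host)
python3 -m nltk.downloader -d /srv/nltk_data stopwords brown reuters movie_reviews
# mount that directory read-only and tell NLTK where to find it
docker run --rm -e NLTK_DATA=/nltk_data -v /srv/nltk_data:/nltk_data:ro crnkofe/tagger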
Build time went from initially several hours to about one hour for the base image (which I no longer build except on rainy days) and under 10 minutes from scratch for the application image, which I change frequently. That’s perfectly acceptable for home development.
My RSS reader isn’t on GitHub yet. I still want to polish it a bit before throwing it into the wild. It’s coming together rather nicely (written in Golang and Vue 3) but still has a bunch of rough edges that need ironing out.