Recently I decided to build an RSS reader, and during development I thought it would be nice to make it run on my home FreedomBox server. What could possibly go wrong?

It turned out to be quite a non-trivial exercise. Partly because a 2-CPU, <1GB-RAM server isn’t that capable, and partly because I’m using several technologies that take an epic amount of processing power, disk space, RAM, or a combination thereof.

At first I tried simply building the Docker images on the actual server, but that took ages, so I looked around for other options. If you’re asking yourself why I couldn’t simply build the images on my local machine: that’s because my local machine is amd64, while my home server’s architecture is armv7l-compatible. Also note that many projects don’t publish ARM images at all, so depending on your appetite everything has to be cross-built for the target architecture with docker buildx.
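A quick way to see the mismatch is to compare uname -m on both machines (amd64 reports itself as x86_64):

# on my amd64 development machine
uname -m    # x86_64

# on the FreedomBox
uname -m    # armv7l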

Happy that there was a seemingly easy solution, here is how I did it:

docker buildx create  --name free-builder \
                      --platform linux/arm/v7 \
                      --driver-opt network=host

docker buildx inspect free-builder --bootstrap

docker buildx use free-builder
docker run -it --rm --privileged tonistiigi/binfmt --install all 

docker buildx build -t crnkofe/tagger . --platform linux/arm/v7 --load

I used docker buildx create / inspect to create a dedicated builder instance that builds images for other architectures. docker run -it --rm --privileged tonistiigi/binfmt --install all is needed to install the QEMU emulators for the ARM architecture.
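To sanity-check that both pieces are in place, docker buildx ls should show free-builder with linux/arm/v7 among its platforms, and on a Linux host the binfmt registrations show up under /proc/sys/fs/binfmt_misc:

docker buildx ls                 # free-builder should list linux/arm/v7
ls /proc/sys/fs/binfmt_misc/     # qemu-arm appears once the emulators are installed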

docker buildx build then builds the actual image, and --load tells it to make the result available in the ordinary docker images list, which in turn allows you to push it to Docker Hub or any other image registry.
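Once the build finishes you can verify that the loaded image really targets ARM before pushing it anywhere (crnkofe/tagger being the tag used above):

docker images crnkofe/tagger
docker image inspect crnkofe/tagger --format '{{.Os}}/{{.Architecture}}'   # linux/arm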

The Dockerfile looks as follows:

FROM python:3-buster
WORKDIR /opt/

RUN apt-get update && apt-get install -y python-pip
COPY requirements.txt /opt/requirements.txt

RUN pip3 install -r requirements.txt
RUN python -c "import nltk; nltk.download('all')"

and requirements.txt:

# ...
matplotlib==3.3.4
nltk==3.5
numpy==1.20.1
#...

Anyone who has already installed any of the typical ML-related Python libraries will know that installing matplotlib and NLTK involves compiling quite a few C libraries. Coupled with the fact that docker buildx works by emulating the target architecture, this makes image building take hours. Last but not least, while the NLTK download runs quite fast locally during development, it’s around 1GB of data, which significantly inflates the Docker image size.
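If you’re curious where the time and the gigabyte actually go, docker history breaks a built image down layer by layer, so the nltk.download('all') layer sticks out immediately (again using the crnkofe/tagger tag from above):

docker history crnkofe/tagger    # one layer per Dockerfile instruction, with sizes
docker images crnkofe/tagger     # total image size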

Looking at all this and pondering my life’s choices, I decided not to just abandon the home server. Let’s try splitting this up and possibly optimizing some things away: I made a base image out of matplotlib, nltk and numpy to cut down on any potential rebuild time and pushed it to Docker Hub.

Here is the new and improved Dockerfile for the base image:

FROM python:3.9-slim-buster

WORKDIR /opt/

RUN apt-get update && apt-get install -y python3-pip

RUN pip3 install --no-cache-dir -v numpy==1.20.1
RUN pip3 install --no-cache-dir -v nltk==3.5

RUN apt-get -y install zlib1g-dev \
    libffi-dev \
    libfreetype6-dev \
    libfribidi-dev \
    libharfbuzz-dev \
    libjpeg-turbo-progs \
    libjpeg62-turbo-dev \
    liblcms2-dev \
    libopenjp2-7-dev \
    libtiff5-dev \
    libwebp-dev

RUN pip3 install --no-cache-dir --prefer-binary -v matplotlib==3.3.4

First, I no longer install requirements.txt in one go, which sometimes even timed out. Requirements are installed one by one to make any future additions less painful to build. The slim-buster base image is used instead of buster to keep the image smaller. Ideally, to make images as small as possible, one should use Alpine, but I like some defaults of Debian’s images. --no-cache-dir stops pip from caching wheels or any intermediate data, which in turn makes the image noticeably smaller.

To build the image run:

docker buildx create  --name free-builder \
                      --platform linux/arm/v7 \
                      --driver-opt network=host
docker buildx inspect free-builder --bootstrap

docker buildx use free-builder
docker run -it --rm --privileged tonistiigi/binfmt --install all

docker buildx build -t crnkofe/nltk-base . --platform linux/arm/v7 --load
docker push crnkofe/nltk-base

Feel free to check the result on my Docker Hub.
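On the home server (or anywhere else) the base image never needs to be built; it can simply be pulled from Docker Hub, and the SIZE column gives a rough idea of what slim-buster and --no-cache-dir saved:

docker pull crnkofe/nltk-base
docker images crnkofe/nltk-base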

What remains is the derived app image, which I pared down to:

FROM crnkofe/nltk-base

WORKDIR /opt/

COPY requirements.txt /opt/requirements.txt
RUN pip3 install -v -r requirements.txt

RUN python -c "import nltk; nltk.download('stopwords')"
RUN python -c "import nltk; nltk.download('brown')"
RUN python -c "import nltk; nltk.download('reuters')"
RUN python -c "import nltk; nltk.download('movie_reviews')"

Note the base FROM crnkofe/nltk-base, which I now have prebuilt. I also run nltk.download per NLTK dataset; each download is its own cached layer, so adding another dataset later doesn’t force a rebuild of everything that’s already there. I might eventually move the NLTK data out of Docker, but for the moment it’s convenient and doesn’t take that long.
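A quick way to check that a dataset actually made it into the image is to look it up via nltk.data.find, which raises an error if the corpus is missing (assuming the app image is still tagged crnkofe/tagger; with the ARM emulators installed this runs even on the amd64 box, just slowly):

docker run --rm crnkofe/tagger \
    python -c "import nltk; print(nltk.data.find('corpora/stopwords'))"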

Build time went from what was initially several hours to about an hour for the base image (which I no longer build except on rainy days) and under 10 minutes (from scratch) for the application image, which I change frequently. That is perfectly acceptable for home development.

My RSS reader isn’t on GitHub yet. I still want to polish it a bit before throwing it into the wild. It’s coming together rather nicely (written in Golang and Vue 3) but still has a bunch of rough edges that need ironing out.