Recently I decided to build an RSS reader, and during development I thought it would be nice to make it run on my home FreedomBox server. What could possibly go wrong?
It turned out to be quite a non-trivial exercise. Partly because a 2-CPU, <1 GB RAM server isn’t that capable, and partly because I was using several technologies that demand an epic amount of processing power, disk space, RAM, or a combination thereof.
At first I tried simply building the Docker images on the actual server, but that took ages, so
I looked around for other options. If you’re wondering why I couldn’t simply build the
images on my local machine: my workstation is
amd64, while my home server’s architecture is ARM (armv7).
Also note that many projects do not publish ARM images at all, so depending on your appetite everything has to be
cross-built for the target architecture with docker buildx.
Happy that there was a seemingly easy solution, here is how I did it:
docker buildx create --name free-builder \
  --platform linux/arm/v7 \
  --driver-opt network=host
docker buildx inspect free-builder --bootstrap
docker buildx use free-builder
docker run -it --rm --privileged tonistiigi/binfmt --install all
docker buildx build -t crnkofe/tagger . --platform linux/arm/v7 --load
docker buildx create / inspect create and bootstrap a dedicated builder instance that will build the images.
docker run -it --rm --privileged tonistiigi/binfmt --install all is needed to install
QEMU emulators for the ARM architecture.
docker buildx build then builds the actual image, and
--load tells it to make the result available in the ordinary
docker images list, which in turn allows you to push it to Docker Hub or any other image registry.
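As a side note, if you don’t need the image locally, buildx can also push straight to the registry in a single step by replacing --load with --push (assuming you are already logged in to Docker Hub):

```shell
# One-step cross-build and push; skips loading the image into the local daemon.
docker buildx build -t crnkofe/tagger . --platform linux/arm/v7 --push
```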
The Dockerfile looks as follows:
FROM python:3-buster
WORKDIR /opt/
RUN apt-get update && apt-get install -y python-pip
COPY requirements.txt /opt/requirements.txt
RUN pip3 install -r requirements.txt
RUN python -c "import nltk; nltk.download('all')"
# ...
matplotlib==3.3.4
nltk==3.5
numpy==1.20.1
# ...
Anyone who has installed any of the typical ML-related Python libraries will
know that installing matplotlib and NLTK involves compiling quite a few C libraries. Coupled with
the fact that
docker buildx works by emulating the target architecture, this makes image building take hours.
Last but not least, while the NLTK download finishes quickly on my development machine, it’s around 1 GB of data,
which significantly inflates the Docker image size.
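To see just how much data that is, here is a quick stdlib-only sketch (a hypothetical helper, not part of the project) that totals up the size of a downloaded nltk_data directory:

```python
import os

def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all files under path, e.g. ~/nltk_data.

    Returns 0 if the directory does not exist.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

if __name__ == "__main__":
    nltk_data = os.path.expanduser("~/nltk_data")
    print(f"{dir_size_bytes(nltk_data) / 1024 ** 3:.2f} GiB")
```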
Looking at all this and pondering my life’s choices, I decided not to just abandon the home server.
Let’s try splitting this up and possibly optimizing some things away.
I decided to make a base image containing
matplotlib, nltk and numpy to cut down on
any potential rebuild time, and push it to Docker Hub.
Here is the new and improved Dockerfile for the base image:
FROM python:3.9-slim-buster
WORKDIR /opt/
RUN apt-get update && apt-get install -y python3-pip
RUN pip3 install --no-cache-dir -v numpy==1.20.1
RUN pip3 install --no-cache-dir -v nltk==3.5
RUN apt-get -y install zlib1g-dev \
    libffi-dev \
    libfreetype6-dev \
    libfribidi-dev \
    libharfbuzz-dev \
    libjpeg-turbo-progs \
    libjpeg62-turbo-dev \
    liblcms2-dev \
    libopenjp2-7-dev \
    libtiff5-dev \
    libwebp-dev
RUN pip3 install --no-cache-dir --prefer-binary -v matplotlib==3.3.4
First, I no longer install requirements.txt in one go, which sometimes even timed out.
Requirements are installed one by one to make any future additions less painful to build.
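Generating those per-package RUN lines from requirements.txt is easy to script. A small sketch (a hypothetical helper, not part of the build):

```python
def requirements_to_run_lines(requirements: str) -> list[str]:
    """Turn requirements.txt content into one 'RUN pip3 install' line
    per package, so each dependency becomes its own cacheable layer."""
    lines = []
    for raw in requirements.splitlines():
        pkg = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if pkg:
            lines.append(f"RUN pip3 install --no-cache-dir -v {pkg}")
    return lines

if __name__ == "__main__":
    reqs = "matplotlib==3.3.4\nnltk==3.5\nnumpy==1.20.1\n"
    print("\n".join(requirements_to_run_lines(reqs)))
```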
The slim-buster base image is used instead of
buster to make the image smaller.
Ideally, to make images as small as possible, one should use Alpine, but I like some of the defaults of Debian’s images.
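For comparison, an Alpine-based variant might start roughly like this (an untested sketch; the package names are Alpine’s, and since musl-based Alpine gets no manylinux wheels, numpy and matplotlib would compile from source, likely making cross-built ARM builds even slower):

```dockerfile
# Untested Alpine sketch - smaller base, but scientific packages build from source.
FROM python:3.9-alpine
WORKDIR /opt/
RUN apk add --no-cache build-base freetype-dev jpeg-dev zlib-dev libffi-dev
RUN pip3 install --no-cache-dir numpy==1.20.1
```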
--no-cache-dir tells pip not to cache wheels or any intermediate data, which in turn makes the image noticeably smaller.
To build the image run:
docker buildx create --name free-builder \
  --platform linux/arm/v7 \
  --driver-opt network=host
docker buildx inspect free-builder --bootstrap
docker buildx use free-builder
docker run -it --rm --privileged tonistiigi/binfmt --install all
docker buildx build -t crnkofe/nltk-base . --platform linux/arm/v7 --load
docker push crnkofe/nltk-base
Feel free to check the result on my Docker Hub.
What remains is the derived app image, which I pared down to:
FROM crnkofe/nltk-base
WORKDIR /opt/
COPY requirements.txt /opt/requirements.txt
RUN pip3 install -v -r requirements.txt
RUN python -c "import nltk; nltk.download('stopwords')"
RUN python -c "import nltk; nltk.download('brown')"
RUN python -c "import nltk; nltk.download('reuters')"
RUN python -c "import nltk; nltk.download('movie_reviews')"
Note the base
FROM crnkofe/nltk-base, which I now have prebuilt. I also run
nltk.download once per NLTK dataset, which
means I don’t need to rebuild everything in case I decide to add some datasets. I might eventually move the NLTK data out
of Docker, but for the moment it’s convenient and doesn’t take that long.
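If I do eventually move the NLTK data out of the image, the usual approach is to download it once on the host and bind-mount it at runtime, pointing NLTK at it via the NLTK_DATA environment variable. A sketch of what that might look like (not what I currently run):

```dockerfile
# Sketch: keep nltk_data outside the image and mount it at runtime.
FROM crnkofe/nltk-base
ENV NLTK_DATA=/opt/nltk_data
WORKDIR /opt/
COPY requirements.txt /opt/requirements.txt
RUN pip3 install -v -r requirements.txt
# At runtime, e.g.:
#   docker run -v /srv/nltk_data:/opt/nltk_data crnkofe/tagger
```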
Build time went from what was initially several hours to about an hour for the base image (which I no longer build except on rainy days) and under 10 minutes from scratch for the application image, which I change frequently. That is perfectly acceptable for home development.
My RSS reader isn’t yet on GitHub. I still want to polish it a bit before throwing it in the wild. It’s coming together rather nicely (written in Golang and Vue3) but still has a bunch of rough edges that need ironing out.