Creating super small Docker images


Docker containers are lighter than virtual machines, but in many cases images are much bigger than they need to be. For example, the official Docker python image is approximately 900 MB, and that is just the Python runtime with no external libraries installed.

Python itself is not small: a typical Python installation takes close to 100 MB on disk once uncompressed. Many of those files are not needed in most common use cases (the turtle module, for example). So is it possible to create a smaller Python Docker image?

The answer is YES: if you run docker pull elyase/staticpython, you will get a working Python image that is only 8.5 MB in size.
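To see where the roughly 100 MB of a standard installation goes, here is a small sketch (my own, not part of the images described here) that sums the size of the running interpreter's standard library directory and lists its largest entries. It assumes a Python 3.6+ interpreter on the host; the helper names tree_size and largest_entries are hypothetical. On many Linux layouts site-packages lives inside this directory, so installed third-party packages are counted too.

```python
import os
import sysconfig


def tree_size(path):
    """Total size in bytes of regular files under `path` (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def largest_entries(directory, top=10):
    """Return (size_in_bytes, name) for the `top` largest entries of `directory`."""
    sizes = []
    for entry in os.listdir(directory):
        full = os.path.join(directory, entry)
        if os.path.islink(full):
            continue
        size = tree_size(full) if os.path.isdir(full) else os.path.getsize(full)
        sizes.append((size, entry))
    sizes.sort(reverse=True)  # biggest first
    return sizes[:top]


if __name__ == "__main__":
    stdlib = sysconfig.get_paths()["stdlib"]
    print(f"{stdlib}: {tree_size(stdlib) / 2**20:.1f} MB")
    for size, name in largest_entries(stdlib):
        print(f"{size / 2**20:8.1f} MB  {name}")
```

Running it on a full installation makes it easy to spot candidates for stripping, such as test suites and rarely used modules.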

Looking for the smallest base image

Docker images are built from a base image, so if we want a small final container, the base image should be the first optimization target. Let's take a look at some popular base images:

Distribution  Tag        Size (MB)
fedora        heisenbug  372
ubuntu        14.04      195
debian        wheezy     85
busybox       latest     2.5

Busybox is the clear winner, but its small size comes with its own tradeoffs. Busybox is a very minimal distro: it has no package manager and no gcc compiler, so compiling new packages from scratch is difficult.

Fortunately, Jeff Lindsay has created a busybox-based image with the opkg package installer preconfigured with the OpenWRT repositories. You don't get recent versions or a great variety of packages; in fact, you don't even get the normal GNU versions of the UNIX utilities, which causes some difficulties. For example, wget doesn't work with https sites and curl has outdated certificates. Uncompressing files is also different, as bzip and tar lack some common command-line options. But with some trial and error it can be enough to create a working Python environment.

Looking for the smallest python

The first one I found was eGenix PyRun: "the one file Python runtime environment". The executable needs only 11 MB for Python 2 and 13 MB for Python 3. It achieves this impressive size by "freezing" the whole standard library, and some less-used modules such as tkinter have been stripped out. Otherwise, everything should run as in a standard Python distribution.
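PyRun's freezing compiles the standard library into the executable itself. A related but different mechanism that ships with CPython is zipimport, which imports modules straight out of a zip archive placed on sys.path (CPython's default sys.path even contains a pythonXY.zip entry for a zipped standard library). The sketch below illustrates that swapped-in technique, not PyRun's build process; the helper build_module_zip and the module name zipped_greeting are made up for the example.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_module_zip(zip_path, modules):
    """Write {module_name: source_code} pairs as .py entries in a zip archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, source in modules.items():
            zf.writestr(name + ".py", source)


archive = os.path.join(tempfile.mkdtemp(), "bundle.zip")
build_module_zip(archive, {
    "zipped_greeting": "def hello():\n    return 'hello from a zip'\n",
})

# A zip archive on sys.path is handled by the built-in zipimport machinery.
sys.path.insert(0, archive)
importlib.invalidate_caches()
greeting = importlib.import_module("zipped_greeting")
print(greeting.hello())
print(greeting.__file__)  # a path inside bundle.zip
```

Running it prints `hello from a zip`, showing that the module was loaded from a single archive file rather than from a directory tree.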

But in the end I went for StaticPython, which is even smaller at approximately 6 MB. StaticPython is similar to PyRun, except that StaticPython doesn't let you compile C extensions. Also, while PyRun needs several libraries (e.g. glibc, OpenSSL, zlib, SQLite, bzip2) installed on the host, StaticPython runs even if it's the only file on the filesystem. That's great packaging! You can see the resulting image at the Docker Registry. For example:

$ docker run -t -i --rm elyase/staticpython python
Python 2.7.1 (r271:86832, Oct 30 2011, 11:44:49)
[GCC 4.1.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
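Whether a binary can be "the only file on the filesystem" depends largely on static linking. A dynamically linked ELF executable carries a PT_INTERP program header naming the dynamic loader (which then pulls in glibc and friends), while a statically linked one has none. The sketch below checks for that header; the function name needs_dynamic_loader is my own, and the check is necessary rather than sufficient, since a static binary may still open other files (certificates, locale data) at runtime.

```python
import struct
import sys

PT_INTERP = 3  # program header type that names the dynamic loader


def needs_dynamic_loader(data):
    """Return True if the ELF image in `data` (bytes) requests a dynamic loader."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    elf_class, byte_order = data[4], data[5]
    endian = "<" if byte_order == 1 else ">"
    if elf_class == 2:  # 64-bit: e_phoff at 32, e_phentsize/e_phnum at 54
        (phoff,) = struct.unpack_from(endian + "Q", data, 32)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 54)
    elif elf_class == 1:  # 32-bit: e_phoff at 28, e_phentsize/e_phnum at 42
        (phoff,) = struct.unpack_from(endian + "I", data, 28)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 42)
    else:
        raise ValueError("unknown ELF class")
    for i in range(phnum):
        # p_type is the first 4-byte field of a program header in both classes.
        (p_type,) = struct.unpack_from(endian + "I", data, phoff + i * phentsize)
        if p_type == PT_INTERP:
            return True
    return False


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            dynamic = needs_dynamic_loader(f.read())
        print(f"{path}: {'dynamic' if dynamic else 'static (no PT_INTERP)'}")
```

Passing a PyRun binary and a StaticPython binary as arguments should show the difference described above, though I have not verified the output for those specific files.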

To build a new image, create a Dockerfile in your Python app project:

FROM elyase/staticpython
ADD . /app
WORKDIR /app
CMD [ "python", "./your-daemon-or-script.py" ]

PyRun Dockerfile

PyRun is not 100% self-contained and requires some dependencies on the target system:

  • OpenSSL 1.0.0 or later
  • zlib 1.2 or later
  • SQLite 3.4 or later
  • bzip2 1.0 or later

This is the final Dockerfile:

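FROM progrium/busybox
MAINTAINER Yaser Martinez Palenzuela <@elyase>
# Shared libraries PyRun needs, from the OpenWRT opkg repositories
RUN opkg-install bzip2 libsqlite3 libpthread zlib libopenssl
# The single-file PyRun interpreter becomes /bin/python
ADD pyrun2.7 /bin/python
# Provide the library file names the PyRun binary asks the loader for
RUN ln -s /usr/lib/libbz2.so.1.0 /usr/lib/libbz2.so.1
RUN ln -sf /lib/libpthread-2.18.so /lib/libpthread.so.0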
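The two ln commands exist because the installed packages ship libbz2.so.1.0 and libpthread-2.18.so, while the binary looks for the names libbz2.so.1 and libpthread.so.0. A hedged sketch for spotting such gaps before writing symlinks by hand: it reports each missing library name and lists files with the same base name that a link could point at. The function find_missing is hypothetical, and running it against a root filesystem unpacked from docker export is my assumption about how you would use it; it only compares file names and does not inspect versions or sonames.

```python
import os
import sys


def find_missing(sonames, lib_dirs):
    """For each name absent from every directory in `lib_dirs`, list files
    sharing its base name (e.g. libbz2.so.1 -> libbz2.so.1.0)."""
    report = {}
    for soname in sonames:
        # os.path.exists is False for broken symlinks, so those count as missing.
        if any(os.path.exists(os.path.join(d, soname)) for d in lib_dirs):
            continue
        base = soname.split(".so")[0]  # "libbz2.so.1" -> "libbz2"
        candidates = []
        for d in lib_dirs:
            if not os.path.isdir(d):
                continue
            for name in sorted(os.listdir(d)):
                if name.startswith(base) and ".so" in name:
                    candidates.append(os.path.join(d, name))
        report[soname] = candidates
    return report


if __name__ == "__main__":
    rootfs = sys.argv[1] if len(sys.argv) > 1 else "/"
    dirs = [os.path.join(rootfs, "lib"), os.path.join(rootfs, "usr", "lib")]
    for soname, found in find_missing(["libbz2.so.1", "libpthread.so.0"], dirs).items():
        print(soname, "missing; candidates:", found or "none")
```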

I also created a similar version for the 3.4 interpreter that can be pulled with:

$ docker pull elyase/pyrun:3.4

What about pip, NumPy, etc.?

While PyRun is compatible with setuptools, easy_install and pip, I decided to package only the Python executable to keep the image as small as possible, though I might make a fully pip-compatible image in the future. For more fully featured installations that need additional dependencies, I created a separate image based on the Anaconda Python Distribution.

Anaconda + Busybox

This one was somewhat harder to get working because I had to adapt the full Miniconda installer to the constrained busybox environment. The Dockerfile itself, however, is quite simple:

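FROM progrium/busybox
MAINTAINER Yaser Martinez Palenzuela
# bash runs the install script; bzip2 unpacks compressed archives
RUN opkg-install bash bzip2
ADD conda_install.sh /root/conda_install.sh
# Download, install and slim down Miniconda
RUN ["bash", "/root/conda_install.sh"]
# Put the conda and python executables on the PATH
ENV PATH /root/miniconda3/bin:$PATH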

The conda_install script is where the magic happens. It downloads and extracts the installer, sets the corresponding paths, runs the installation and finally performs a thorough clean-up: I removed the conda package files under pkgs/, several tests folders (unittest is still there) and all .pyc files (the __pycache__ folders in 3.4). Python regenerates bytecode automatically when a module is first imported, provided it can write to the directory, but depending on your use case this could affect startup performance.
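The conda_install script itself is a shell script and is not reproduced here. As a hedged illustration only, here is the clean-up step described above re-expressed in Python, for example to run with the freshly installed Miniconda interpreter. The function name slim_prefix, deleting every directory named tests, and protecting anything under unittest are my assumptions, not the author's exact rules.

```python
import os
import shutil


def dir_size(path):
    """Bytes used by regular files under `path` (symlinks are not counted)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def slim_prefix(prefix, test_dir_names=("tests",), keep=("unittest",)):
    """Delete pkgs/, test directories and .pyc bytecode under a conda prefix.

    Directories whose path (relative to `prefix`) contains a name from `keep`
    are never removed as test directories. Returns the number of bytes freed.
    """
    freed = 0
    pkgs = os.path.join(prefix, "pkgs")
    if os.path.isdir(pkgs):
        freed += dir_size(pkgs)
        shutil.rmtree(pkgs)
    for root, dirs, files in os.walk(prefix):
        for d in list(dirs):
            full = os.path.join(root, d)
            parts = os.path.relpath(full, prefix).split(os.sep)
            protected = any(k in parts for k in keep)
            if d == "__pycache__" or (d in test_dir_names and not protected):
                freed += dir_size(full)
                shutil.rmtree(full)
                dirs.remove(d)  # prune: don't walk into a deleted directory
        for name in files:
            if name.endswith(".pyc"):
                full = os.path.join(root, name)
                freed += os.path.getsize(full)
                os.remove(full)
    return freed
```

A call such as slim_prefix("/root/miniconda3") after the installer finishes would apply these rules to the whole installation and report how many bytes were reclaimed.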

The total size of this image is 88.97 MB. In the end you get a fully featured Python installation plus bash and the awesome conda package manager. From here you are only a conda install away from a complete scientific environment.
