Docker Layers

Did you know that every instruction in your Dockerfile (the all-caps keywords such as ENV, COPY, or RUN) adds an entry to the image’s history, and that the instructions which change the filesystem, such as RUN, COPY, and ADD, each create a new layer in the Docker image? This means more layers to download and possibly larger image sizes.
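To make the distinction concrete, here is a minimal annotated sketch. The file app.sh is hypothetical and would need to exist in the build context. The comments reflect how these instructions appear in docker history, where metadata-only instructions are listed with a size of 0B.

```dockerfile
FROM alpine

# Metadata only: each adds an entry to the image history and config,
# but no filesystem layer (docker history shows 0B for these)
ENV GREETING="hello"
LABEL description="layer demo"

# Filesystem changes: each of these creates a new layer
# (app.sh is a hypothetical script in the build context)
COPY app.sh /usr/local/bin/app.sh
RUN mkdir /data

# Metadata only again: sets the default command, no new layer
CMD ["/usr/local/bin/app.sh"]
```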

“Whatever, my network is fast and once I have the image I’m good to go,” I hear you say. But consider your pipelines and all the environments where the image might run for development and testing. Every machine that doesn’t already have the image cached has to pull all of its layers. I think it could be worth a bit of effort to reduce this, especially because it is easily done.

As an example, let’s make a deliberately useless Docker image that installs git, runs a git command, and then removes git again. Here is our Dockerfile:

FROM alpine

ENV PACKAGES="git"
RUN apk add ${PACKAGES}
ENV FOO="bar"
LABEL key=value
RUN git version
RUN apk del ${PACKAGES}

Build the image, then review its layers with the docker history command (which works on any image):

» docker buildx build -t layers1 .
[+] Building 0.0s (8/8) FINISHED                                                                                docker:default
 => [internal] load .dockerignore                                                                                         0.0s
 => => transferring context: 2B                                                                                           0.0s
 => [internal] load build definition from Dockerfile                                                                      0.0s
 => => transferring dockerfile: 164B                                                                                      0.0s
 => [internal] load metadata for docker.io/library/alpine:latest                                                          0.0s
 => [1/4] FROM docker.io/library/alpine                                                                                   0.0s
 => CACHED [2/4] RUN apk add git                                                                                          0.0s
 => CACHED [3/4] RUN git version                                                                                          0.0s
 => CACHED [4/4] RUN apk del git                                                                                          0.0s
 => exporting to image                                                                                                    0.0s
 => => exporting layers                                                                                                   0.0s
 => => writing image sha256:028e5f38db821c7e34398216b7e9c234f717897cfb182608733307545f8111a8                              0.0s
 => => naming to docker.io/library/layers1

» docker history layers1
IMAGE          CREATED              CREATED BY                                      SIZE      COMMENT
028e5f38db82   About a minute ago   RUN /bin/sh -c apk del ${PACKAGES} # buildkit   26kB      buildkit.dockerfile.v0
<missing>      About a minute ago   RUN /bin/sh -c git version # buildkit           0B        buildkit.dockerfile.v0
<missing>      About a minute ago   LABEL key=value                                 0B        buildkit.dockerfile.v0
<missing>      About a minute ago   ENV FOO=bar                                     0B        buildkit.dockerfile.v0
<missing>      About a minute ago   RUN /bin/sh -c apk add ${PACKAGES} # buildkit   11.5MB    buildkit.dockerfile.v0
<missing>      About a minute ago   ENV PACKAGES=git                                0B        buildkit.dockerfile.v0
<missing>      10 months ago        /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B
<missing>      10 months ago        /bin/sh -c #(nop) ADD file:40887ab7c06977737…   7.04MB

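The history has an entry for every instruction in the Dockerfile, although only the base image and the apk add and apk del steps actually add any size. Let’s refactor the Dockerfile to use fewer instructions. A single ENV or LABEL instruction can set several values at once, and an instruction can continue onto the next line by ending the line with a backslash (\).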

FROM alpine

ENV \
  PACKAGES="git" \
  FOO="bar"

LABEL \
  key=value \
  key2=value2

RUN apk add ${PACKAGES} && \
  git version && \
  apk del ${PACKAGES}

As a bonus, the file is more readable, with the variables declared together at the top. Now build it again:

» docker buildx build -t layers1 .
[+] Building 3.8s (6/6) FINISHED                                                                                docker:default
 => [internal] load build definition from Dockerfile                                                                      0.0s
 => => transferring dockerfile: 198B                                                                                      0.0s
 => [internal] load .dockerignore                                                                                         0.0s
 => => transferring context: 2B                                                                                           0.0s
 => [internal] load metadata for docker.io/library/alpine:latest                                                          0.0s
 => CACHED [1/2] FROM docker.io/library/alpine                                                                            0.0s
 => [2/2] RUN apk add git &&   git version &&   apk del git                                                               3.0s
 => exporting to image                                                                                                    0.7s
 => => exporting layers                                                                                                   0.7s
 => => writing image sha256:0e7e71fe36c427d19e7e0a0942a6c80982202f1f21b7a2689dd9136e5418fe81                              0.0s
 => => naming to docker.io/library/layers1                                                                                0.0s

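Notice that the three commands now run as a single RUN step, printed on one long line. Building again under a second tag, layers2, is instant because that step is already in the build cache: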

» docker buildx build -t layers2 .
[+] Building 0.0s (6/6) FINISHED                                                                                docker:default
 => [internal] load build definition from Dockerfile                                                                      0.0s
 => => transferring dockerfile: 233B                                                                                      0.0s
 => [internal] load .dockerignore                                                                                         0.0s
 => => transferring context: 2B                                                                                           0.0s
 => [internal] load metadata for docker.io/library/alpine:latest                                                          0.0s
 => [1/2] FROM docker.io/library/alpine                                                                                   0.0s
 => CACHED [2/2] RUN apk add git &&   git version &&   apk del git                                                        0.0s
 => exporting to image                                                                                                    0.0s
 => => exporting layers                                                                                                   0.0s
 => => writing image sha256:0e7e71fe36c427d19e7e0a0942a6c80982202f1f21b7a2689dd9136e5418fe81                              0.0s
 => => naming to docker.io/library/layers2                                                                                0.0s

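Now compare the image sizes. The refactored image is a little more than half the size of the original (9.87MB versus 18.5MB), and it achieves the same thing. A later layer can hide files from an earlier one, but it cannot make that earlier layer any smaller: in the original, git is stored in the 11.5MB apk add layer, and the later apk del layer only records that those files were removed, so they are still downloaded with every pull. When git is installed and removed within a single RUN, only the end result of that step is saved as a layer.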

» docker images layers*
REPOSITORY   TAG       IMAGE ID       CREATED         SIZE
layers2      latest    0e7e71fe36c4   4 minutes ago   9.87MB
layers1      latest    028e5f38db82   9 minutes ago   18.5MB
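Even the refactored image is roughly 2.8MB larger than the 7.04MB base layer. One likely contributor, assuming apk’s default behaviour, is the package index that apk downloads and keeps under /var/cache/apk. A sketch of the same Dockerfile using apk’s --no-cache option, which fetches the index without storing it locally, might look like this:

```dockerfile
FROM alpine

ENV PACKAGES="git"

# --no-cache tells apk not to keep the downloaded package index under
# /var/cache/apk, so the index is not saved in this layer
RUN apk add --no-cache ${PACKAGES} && \
  git version && \
  apk del ${PACKAGES}
```

Build it and compare with docker images as before; the exact saving depends on the size of Alpine’s package index at build time.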

In this example it’s not a big deal, but you can imagine that for more complex images it can make a difference. Once you know how to do this it isn’t difficult, especially after trying it a couple of times.
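If your builder supports the Dockerfile heredoc syntax (available in recent versions of the Dockerfile frontend, selected here with the syntax directive on the first line), the combined RUN step can also be written without trailing backslashes. This is a minimal sketch; set -e is included because the heredoc runs as an ordinary shell script, which would otherwise carry on after a failing command.

```dockerfile
# syntax=docker/dockerfile:1
FROM alpine

ENV PACKAGES="git"

# Everything between <<EOF and EOF runs as one shell script in one RUN step,
# so git is installed and removed within a single layer
RUN <<EOF
set -e
apk add ${PACKAGES}
git version
apk del ${PACKAGES}
EOF
```

Like the && chain, this produces a single layer; which form reads better is largely a matter of taste.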

Shrek has layers; Docker has layers

Dev mode: When working on a Dockerfile, things don’t always work the first time, so what I do is split the statements into multiple instructions. “Wait, isn’t that the opposite of what we’ve been saying?” Yes, but this is temporary, for development only. Make use of the build cache and only work on the bottom of the Dockerfile: when you change an instruction, everything above it is reused from cached layers, so each rebuild is quicker. When it’s all working the way you want it, combine the instructions back into single statements and rebuild.
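Applied to the example from this post, a temporary dev-mode version of the Dockerfile might look like the sketch below. The stable step sits at the top so it comes from the cache, while the steps still being worked on sit at the bottom as separate instructions.

```dockerfile
FROM alpine

ENV \
  PACKAGES="git" \
  FOO="bar"

# Stable step: unchanged, so it is reused from the build cache on every rebuild
RUN apk add ${PACKAGES}

# Work in progress: only the first edited instruction and everything below it
# are rebuilt, so iterating on these lines is quick
RUN git version
RUN apk del ${PACKAGES}

# Once everything works, merge the RUN steps back into a single
# RUN ... && ... instruction and rebuild
```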

What do you think? Is it worth optimizing Dockerfiles this way? Do you already write your Dockerfiles like this?