Did you know that every instruction in your Dockerfile, meaning each all-caps keyword such as ENV, COPY, or RUN, adds an entry to the image’s history? The instructions that change the filesystem (RUN, COPY, and ADD) each create a new layer. This results in more layers to download and possibly larger images.
“Whatever, my network is fast and once I have the image I’m good to go,” I hear you say. But consider the pipelines and all the environments where the image might run for dev and testing. Each image download needs to pull all the layers. I think it could be worth a bit of effort to reduce this, especially because it is easily done.
Let’s make a useless docker image as an example which installs git then removes it after running a git command. Here is our Dockerfile:
FROM alpine
ENV PACKAGES="git"
RUN apk add ${PACKAGES}
ENV FOO="bar"
LABEL key=value
RUN git version
RUN apk del ${PACKAGES}
Build the image and review the layers of any image using the docker history command:
» docker buildx build -t layers1 .
[+] Building 0.0s (8/8) FINISHED docker:default
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 164B 0.0s
=> [internal] load metadata for docker.io/library/alpine:latest 0.0s
=> [1/4] FROM docker.io/library/alpine 0.0s
=> CACHED [2/4] RUN apk add git 0.0s
=> CACHED [3/4] RUN git version 0.0s
=> CACHED [4/4] RUN apk del git 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:028e5f38db821c7e34398216b7e9c234f717897cfb182608733307545f8111a8 0.0s
=> => naming to docker.io/library/layers1
» docker history layers1
IMAGE CREATED CREATED BY SIZE COMMENT
028e5f38db82 About a minute ago RUN /bin/sh -c apk del ${PACKAGES} # buildkit 26kB buildkit.dockerfile.v0
<missing> About a minute ago RUN /bin/sh -c git version # buildkit 0B buildkit.dockerfile.v0
<missing> About a minute ago LABEL key=value 0B buildkit.dockerfile.v0
<missing> About a minute ago ENV FOO=bar 0B buildkit.dockerfile.v0
<missing> About a minute ago RUN /bin/sh -c apk add ${PACKAGES} # buildkit 11.5MB buildkit.dockerfile.v0
<missing> About a minute ago ENV PACKAGES=git 0B buildkit.dockerfile.v0
<missing> 10 months ago /bin/sh -c #(nop) CMD ["/bin/sh"] 0B
<missing> 10 months ago /bin/sh -c #(nop) ADD file:40887ab7c06977737… 7.04MB
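Most rows in that output report 0B, so it can help to filter them out. The following is a minimal sketch, assuming awk is available on the host; the helper name nonzero_layers is made up for this example. It reads lines whose first field is the layer size and keeps only the ones that add bytes.

```shell
# Hypothetical helper: keep only history rows whose first field (the size) is not 0B.
nonzero_layers() {
  awk '$1 != "0B"'
}

# Usage (requires Docker and the image built above). --format is a standard
# docker history option that takes a Go template:
#   docker history --format '{{.Size}} {{.CreatedBy}}' layers1 | nonzero_layers
```

On the layers1 image this should leave just the two apk steps and the Alpine base layer.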
We see that each instruction in the Dockerfile is an entry in the image history, although only the two apk commands add any size. Let’s refactor the image to use fewer layers. We can continue a statement onto new lines by adding \ at the end of each line.
FROM alpine
ENV \
PACKAGES="git" \
FOO="bar"
LABEL \
key=value \
key2=value2
RUN apk add ${PACKAGES} && \
git version && \
apk del ${PACKAGES}
This is also a more readable file, better organized with variables declared at the top. Now build and check history again:
» docker buildx build -t layers1 .
[+] Building 3.8s (6/6) FINISHED docker:default
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 198B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/alpine:latest 0.0s
=> CACHED [1/2] FROM docker.io/library/alpine 0.0s
=> [2/2] RUN apk add git && git version && apk del git 3.0s
=> exporting to image 0.7s
=> => exporting layers 0.7s
=> => writing image sha256:0e7e71fe36c427d19e7e0a0942a6c80982202f1f21b7a2689dd9136e5418fe81 0.0s
=> => naming to docker.io/library/layers1 0.0s
Notice that the build output shows the combined RUN command as one long line.
» docker buildx build -t layers2 .
[+] Building 0.0s (6/6) FINISHED docker:default
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 233B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/alpine:latest 0.0s
=> [1/2] FROM docker.io/library/alpine 0.0s
=> CACHED [2/2] RUN apk add git && git version && apk del git 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:0e7e71fe36c427d19e7e0a0942a6c80982202f1f21b7a2689dd9136e5418fe81 0.0s
=> => naming to docker.io/library/layers2 0.0s
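Also, take a look at the image size difference. The refactored image is a little more than half the size of the original and it achieves the same thing. Deleting git in a separate RUN does not shrink the image, because the earlier layer that installed it is still part of the image. Installing and deleting in a single RUN means those files never end up in any layer.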
» docker images layers*
REPOSITORY TAG IMAGE ID CREATED SIZE
layers2 latest 0e7e71fe36c4 4 minutes ago 9.87MB
layers1 latest 028e5f38db82 9 minutes ago 18.5MB
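The layers1 size can be sanity-checked by adding up its layers. This is only a sketch: the figures are the rounded sizes from the docker history output earlier, and the helper name sum_mb is made up for this example.

```shell
# Hypothetical helper: add up layer sizes given in MB and print the total.
sum_mb() {
  awk 'BEGIN { total = 0; for (i = 1; i < ARGC; i++) total += ARGV[i]; printf "%.2f\n", total }' "$@"
}

# Figures copied from the layers1 history: 7.04MB base, 11.5MB apk add, 26kB apk del.
sum_mb 7.04 11.5 0.026
```

The total lands within rounding distance of the 18.5MB that docker images reports. The apk del step adds its own small layer on top of the 11.5MB one instead of removing it.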
In this example it’s not a big deal, but you can imagine that for more complex images it can make a real difference. Once you know how to do this it isn’t difficult, especially after trying it a couple of times.
Dev mode: When working on a Dockerfile, things don’t always work the first time. So what I do is split the statements up into separate instructions. “Wait, isn’t that the opposite of what we’ve been saying?” Yes, but this is temporary, for development only. Make use of the build cache and only work on the bottom of the Dockerfile. This way, most of the build can reuse the existing cached layers and we can iterate on changes more quickly. When it’s all working the way you want it, refactor into single statements and rebuild.
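Before the final rebuild, a quick count of RUN instructions can confirm that the refactor is done. This is a rough sketch with a made-up helper name, count_run. It only matches lines that begin with RUN in the conventional upper-case spelling, so treat the result as approximate.

```shell
# Hypothetical helper: count lines in a Dockerfile that start a RUN instruction.
count_run() {
  grep -c -E '^[[:space:]]*RUN[[:space:]]' "$1"
}

# Usage, from the directory that holds the Dockerfile:
#   count_run Dockerfile
```

For the first Dockerfile in this post it would report 3, and for the refactored one it would report 1.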
What do you think? Is it worth optimizing Dockerfiles this way? Do you already write your Dockerfiles like this?