Source code

https://github.com/tsukiy0/docker-secrets-experiment

When building images with Docker, we often need to pass secrets in. For example, we may need credentials to pull dependencies from a private registry. Docker provides a number of methods to pass files and variables at build time, but not all are safe for secrets.

Docker layers

Before we dive into each method, we need to understand how Docker builds images.

```
FROM debian

RUN apt-get update
RUN apt-get install -y vim
```

When the Dockerfile above is built, each instruction (here RUN apt-get update and RUN apt-get install -y vim) is executed and its result is stored as a separate layer. Each layer is a diff of the filesystem: the changes between the previous layer and the state after the instruction has run.

Layers are cached and reused, which saves build time¹. This is why the initial build may be slow while subsequent rebuilds are fast. If we appended a new instruction to the Dockerfile above, Docker would not rebuild the existing layers, since nothing they depend on has changed. It would reuse them from the cache and simply add a new layer with the result of the new instruction.

Layers are the reason why most methods of passing files and variables from the host are not suitable for secrets. Layers are persisted in the image and never modified afterwards, so even if a later instruction deletes a file, the layer that copied or created it still contains the file and can be inspected.
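To make this concrete, the sketch below imitates two layers with plain `tar` archives. It assumes the OCI image-format convention that a deletion is recorded in the upper layer as an empty "whiteout" file named `.wh.<name>`. The directory names and the secret value are made up for illustration, and this is a simulation of the idea rather than what Docker literally runs.

```shell
#!/bin/sh
# Imitate two image layers with plain tar archives (all names are illustrative).
set -eu
work=$(mktemp -d)
cd "$work"

# Layer 1: the result of `COPY ./secret.txt /app/secret.txt`.
mkdir -p layer1/app
printf 'supersecretstring' > layer1/app/secret.txt
tar -cf layer1.tar -C layer1 app

# Layer 2: the result of `RUN rm -rf /app/secret.txt`. Following the OCI
# whiteout convention, the deletion is an empty marker file in the new layer;
# layer1.tar itself is never modified.
mkdir -p layer2/app
: > layer2/app/.wh.secret.txt
tar -cf layer2.tar -C layer2 app

# The upper layer only records that the file was removed...
tar -tf layer2.tar
# ...while the lower layer still holds the original bytes.
tar -xOf layer1.tar app/secret.txt
echo
```

Listing the second archive shows only `app/.wh.secret.txt`, yet the first archive still prints the secret: deleting the file merely added a marker on top.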

Leaky secrets

It might be tempting to simply use the COPY² or ARG³ Dockerfile instruction, but neither is suitable for secrets.

`COPY`ing files

Consider a simple Dockerfile that copies a file containing a secret from the host and then removes it:

```
FROM node:14-alpine

WORKDIR /app
COPY ./secret.txt /app/secret.txt
RUN rm -rf /app/secret.txt
```

When we build and inspect it, we can see the layers corresponding to each command:

```
docker build -t copy_experiment .
docker history copy_experiment

# IMAGE          CREATED        CREATED BY                                      SIZE      COMMENT
# bff0ec4dd48c   3 hours ago    RUN /bin/sh -c rm -rf /app/secret.txt # buil…   0B        buildkit.dockerfile.v0
# <missing>      3 hours ago    COPY ./secret.txt /app/secret.txt # buildkit    7B        buildkit.dockerfile.v0
# <missing>      3 hours ago    WORKDIR /app                                    0B        buildkit.dockerfile.v0
# ...
```

The process to extract the secret.txt file is as follows:

Export the image as a .tar⁴

```
docker save copy_experiment -o image.tar
```

Extract the .tar to a directory
```
# tar -C needs the target directory to exist
mkdir -p image_extracted
tar -xvf image.tar -C image_extracted
```
Inspect the extracted contents
```
ls -la image_extracted
```
There are several directories corresponding to the layers, each containing a layer.tar with the filesystem diff between that layer and the previous one. The root also contains a manifest.json and a <random>.json file with a long, random-looking name (it is the image configuration, named after its SHA-256 digest). We will use these two files to figure out which layer to inspect.
Open the manifest.json and the <random>.json

The <random>.json file has a history field whose entries correspond to the instructions that built the image, including those of the base image. Entries for instructions that do not produce a layer are marked with empty_layer: true.

The manifest.json lists the layer archives (<directory>/layer.tar) under its Layers field, ordered from the base image upwards.

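Both lists are in the same order, so once the empty_layer entries are skipped, the n-th remaining history entry belongs to the n-th entry in Layers. This lets us match each Dockerfile instruction with its layer directory.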
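This pairing can be scripted. The sketch below assumes `jq` is installed (it is not part of the original walkthrough) and that the image was extracted as above; `map_layers` is a hypothetical helper name. It reads the config file name from the Config field of manifest.json, so there is no need to locate <random>.json by hand.

```shell
# map_layers DIR: print "<layer archive><TAB><instruction>" for each layer of an
# image extracted with `docker save` + `tar -x`. Assumes jq is on PATH.
map_layers() {
  root=$1
  # manifest.json names the image config (the <random>.json file).
  config=$(jq -r '.[0].Config' "$root/manifest.json")
  layers=$(mktemp)
  steps=$(mktemp)
  # Layer archives, from the base image upwards.
  jq -r '.[0].Layers[]' "$root/manifest.json" > "$layers"
  # History entries that produced a layer, in the same order; multi-line
  # commands are flattened so each entry stays on one line.
  jq -r '.history[] | select(.empty_layer != true)
         | (.created_by // "") | gsub("\n"; " ")' "$root/$config" > "$steps"
  paste "$layers" "$steps"
  rm -f "$layers" "$steps"
}
```

Calling `map_layers image_extracted` prints one line per layer; the line containing `COPY ./secret.txt` names the archive to inspect.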

Having found the layer, we can list the files inside its archive:

```
tar -vtf image_extracted/496831ca773c35092cacc3f1d9e5decf604f5d39b173911849144baa661b00dd/layer.tar
# drwxr-xr-x 0/0               0 2022-03-07 08:51 app/
# -rw-r--r-- 0/0               7 2022-03-07 08:45 app/secret.txt
```

Print the contents of secret.txt from the layer:

```
tar -O -xf image_extracted/496831ca773c35092cacc3f1d9e5decf604f5d39b173911849144baa661b00dd/layer.tar app/secret.txt
```

Alternate way to browse layers

Matching layers by hand through manifest.json and <random>.json quickly becomes tedious when a Dockerfile has many instructions, and therefore many layers.

We can leverage the dive⁵ tool to view the file system at each layer.
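If all we need is to know which layer holds a particular file, a brute-force scan also works without extra tools. In the sketch below, `find_layers_with` is a hypothetical helper name. It tries to list every file under the extracted image as a tar archive and reports those whose listing contains the given path; non-archive files such as manifest.json fail to list and are skipped. Because it does not rely on the `<directory>/layer.tar` naming, it should also cope with export layouts that store layers elsewhere, though that is an assumption rather than something tested here.

```shell
# find_layers_with DIR PATH: print every layer archive under DIR (an image
# extracted from `docker save`) whose listing contains PATH, e.g. app/secret.txt.
find_layers_with() {
  root=$1
  target=$2
  find "$root" -type f | while IFS= read -r f; do
    # Files that are not tar archives fail to list and produce no output.
    if tar -tf "$f" 2>/dev/null | grep -qxF -- "$target"; then
      printf '%s\n' "$f"
    fi
  done
}
```

For the image above, `find_layers_with image_extracted app/secret.txt` should report the COPY layer even though a later layer deleted the file.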

Using build `ARG`s

Dockerfiles support the ARG instruction, which defines a build argument. Its value is supplied at build time with the --build-arg⁶ flag and can then be used like an environment variable in the instructions that follow it.

The following Dockerfile defines a build argument SECRET, exports it to the environment⁷ with ENV, and uses it in an echo command.

```
FROM node:14-alpine

ARG SECRET

ENV SECRET=${SECRET}
RUN echo ${SECRET}
```

The process to extract the SECRET is as follows:

Build the image specifying a SECRET build argument

```
docker build --build-arg SECRET=supersecretstring -t buildarg_experiment .
```

Inspect the history of the image

```
docker history buildarg_experiment
# IMAGE          CREATED         CREATED BY                                      SIZE      COMMENT
# fc4a2049cbdb   2 seconds ago   RUN |1 SECRET=supersecretstring /bin/sh -c e…   0B        buildkit.dockerfile.v0
# <missing>      2 seconds ago   ENV SECRET=supersecretstring                    0B        buildkit.dockerfile.v0
# <missing>      2 seconds ago   ARG SECRET                                      0B        buildkit.dockerfile.v0
# ...
```

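As you can see, the SECRET is visible in plain text. The RUN entry records the build argument it was run with, and the ENV instruction additionally stores the value in the image configuration, so every container started from this image also has it in its environment.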
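A quick way to check an image for a known value is to search its metadata directly. The sketch below is illustrative, and `find_secret_in_metadata` is a hypothetical helper name. It searches the untruncated instruction history, where build arguments used by RUN show up, and the environment stored in the image configuration, where ENV leaves the value.

```shell
# find_secret_in_metadata IMAGE VALUE: print metadata lines of IMAGE that
# contain VALUE; the exit status is 0 only if at least one line matched.
find_secret_in_metadata() {
  image=$1
  value=$2
  {
    # Full CREATED BY text for every history entry (--no-trunc disables the
    # shortening seen in the table output).
    docker history --no-trunc --format '{{.CreatedBy}}' "$image"
    # Environment variables stored in the image config, e.g. by ENV.
    docker image inspect --format '{{json .Config.Env}}' "$image"
  } | grep -F -- "$value"
}
```

For example, `find_secret_in_metadata buildarg_experiment supersecretstring` should print the RUN and ENV history entries as well as the image's environment list containing SECRET.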

Safely injecting secrets

One strategy to avoid exposing secrets is to use a multi-stage Dockerfile⁸. An initial build stage is responsible for building an executable, which is then copied into the final run image. We can pass build arguments and copy secrets into the build stage, knowing that the layers it creates will not be part of the final run image.

A multi-stage Dockerfile is structured like so:

```
FROM node:14-alpine AS builder

WORKDIR /app
COPY ./secret.txt /app/secret.txt
RUN touch runnable # imaginary executable

FROM node:14-alpine

WORKDIR /app
COPY --from=builder /app/runnable .
```

There are two FROM⁹ instructions: the first is named builder and represents our build stage, while the second is our run image.

When we inspect the history of the final run image, there is no trace of secret.txt:

```
docker build -t multistage_experiment .
docker history multistage_experiment
# IMAGE          CREATED             CREATED BY                                      SIZE      COMMENT
# 8c960c11ac09   About an hour ago   COPY /app/runnable . # buildkit                 0B        buildkit.dockerfile.v0
# <missing>      3 hours ago         WORKDIR /app                                    0B        buildkit.dockerfile.v0
# ...
```
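History only covers metadata. To double-check file contents as well, the sketch below exports an image with `docker save`, which writes the archive to standard output when `-o` is omitted, and searches the files of every layer for a known secret value. `image_leaks` is a hypothetical helper name. It prints the layers that contain the value and returns non-zero when none do.

```shell
# image_leaks IMAGE VALUE: print each layer of IMAGE whose files contain VALUE.
# Returns 1 when no layer matches.
image_leaks() {
  image=$1
  value=$2
  dir=$(mktemp -d)
  # docker save writes a tar of the image to stdout when -o is omitted.
  docker save "$image" | tar -xf - -C "$dir"
  hits=$(find "$dir" -type f | while IFS= read -r f; do
    # Dump every file stored in the candidate layer; files that are not tar
    # archives (manifest.json, configs) produce no output.
    if tar -xOf "$f" 2>/dev/null | grep -qF -- "$value"; then
      printf '%s\n' "${f#"$dir"/}"
    fi
  done)
  rm -rf "$dir"
  [ -n "$hits" ] || return 1   # nothing found: the layers look clean
  printf '%s\n' "$hits"
}
```

Given the contents of secret.txt, it should flag the COPY layer of copy_experiment despite the later rm, whereas multistage_experiment should come back clean, since the builder stage's layers are not part of the exported image.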

Another advantage of a multi-stage Dockerfile is the ability to separate the build and run environments. Typically the build environment has more dependencies because it needs an entire toolchain to build the executable, whereas the run environment only needs the runtime. For example, building Java .jars requires the Java Development Kit (JDK), but running a .jar only requires the Java Runtime Environment (JRE), which is much smaller.

Conclusion

Understanding how Docker builds images is important for security. Any file we COPY into the final image remains exposed in an intermediate layer, regardless of whether a later instruction deletes it, and the same applies to build arguments recorded in the image history. For this reason, we should always copy the bare minimum into a Docker image and, in most cases, separate the build and run images using multi-stage Dockerfiles.