My Learning Journey: Building more images, Dockerfiles and multi-stage builds 🐳

This week was entirely focused on a deeper understanding of building and creating images. This includes writing Dockerfiles, understanding the build cache, rearranging layers in images to better support the build cache for faster builds, and learning more about multi-stage builds.

Past Learnings

Last week, I learned a bit about registries, built some images, and learned about Docker Compose and how it eases the life of a "Dockerman." This week was different: I did not touch many new topics but instead got a deeper understanding of a single one. Let's go over it now.

Dockerfiles: A better way to build images

In the past week, I built images by switching to an interactive terminal in the container itself and installing all packages and dependencies inside it. Well, as Docker says, that is not the best way to build images at all. So, I learned more about Dockerfiles and the instructions they use (such as FROM, COPY, RUN, and CMD) to copy files, install dependencies, and configure the image. Below is the first Dockerfile that I wrote:
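As an illustration, here is a minimal sketch of what such a first Dockerfile could look like for a small Node.js app that uses yarn. The base image tag, the src/index.js entry point, and port 3000 are assumptions made for this example, not details taken from my actual project.

```dockerfile
# Base image (assumed tag; any recent Node.js image that ships yarn would do)
FROM node:18-alpine

# All following instructions run relative to this directory inside the image
WORKDIR /app

# Copy the entire project into the image in a single step
COPY . .

# Install only the production dependencies
RUN yarn install --production

# Document the port the app listens on (hypothetical port)
EXPOSE 3000

# Start the app (hypothetical entry point)
CMD ["node", "src/index.js"]
```

An image like this would typically be built with docker build -t some-tag . from the project directory, where some-tag is whatever name you choose.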

Learning about and building Dockerfiles was a really amazing experience and it felt really intuitive to build images this way.

The build cache: speeding up the build process

When building images, Docker keeps a "build cache" that stores the layers produced by earlier builds. If an instruction and the files it depends on haven't changed since a previous build, Docker doesn't run that instruction again. Instead, it reuses the layer from the build cache, making the build process faster. But if a layer does change, all the subsequent layers are invalidated and rebuilt as well.

To make better use of this build cache system, Dockerfiles should be written strategically. In the Dockerfile above, the layer RUN yarn install --production comes after the step that copies the whole project, so it runs again whenever any source file changes, even if there are no changes to the project's dependencies. This adds unnecessary time to the build and doesn't use the build cache efficiently. Below is a better version of the previous Dockerfile that uses the build cache much more efficiently:
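A minimal sketch of such a cache-friendly layout, again assuming a Node.js app with yarn (the base image tag and the src/index.js entry point are assumptions for illustration):

```dockerfile
# Base image (assumed tag)
FROM node:18-alpine
WORKDIR /app

# Copy only the dependency manifests first.
# This layer changes only when package.json or yarn.lock changes.
COPY package.json yarn.lock ./

# Reused from the cache as long as the layer above is unchanged
RUN yarn install --production

# Copy the rest of the source last, so that code edits
# invalidate only this layer and the ones after it
COPY . .

# Start the app (hypothetical entry point)
CMD ["node", "src/index.js"]
```

With this ordering, editing application code should leave the dependency layer cached. It probably also helps to list node_modules in a .dockerignore file so that the final COPY does not overwrite the dependencies installed inside the image.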

In this Dockerfile, package.json and yarn.lock are copied into the image before the dependencies are installed, and the rest of the source is copied afterwards. Docker checks whether those two files have changed; if they haven't, the cached result of the RUN step that installs the dependencies is reused instead of running it again. This gives a more efficient and faster build.

Multi-stage builds: A great way to manage multiple stages of a project's development

Another great feature of Dockerfiles is support for multi-stage builds. Multi-stage builds allow us to define several stages with different configurations in a single Dockerfile, and a later stage can copy files out of an earlier one. For example, for a Spring Boot project, a development build might require installing dependencies, running all tests, running a local server, and then maybe using Maven to run the application. But for a production build, we would simply need to export a JAR file and run that file. Support for such separate stages is handled gracefully in a Dockerfile. Below is an example of such a Dockerfile:
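A minimal sketch of a two-stage Dockerfile for a Maven-based Spring Boot project might look like this. The image tags, the standard Maven layout (pom.xml plus src/), port 8080, and the assumption that the build produces exactly one JAR in target/ are all illustrative choices.

```dockerfile
# Stage 1 (builder): compile and package the app with Maven and a full JDK
FROM maven:3.9-eclipse-temurin-17 AS builder
WORKDIR /app

# Copy the POM first so dependency downloads are cached
# separately from source code changes
COPY pom.xml .
RUN mvn -B dependency:go-offline

# Copy the sources and build the JAR (tests skipped here to keep the sketch short)
COPY src ./src
RUN mvn -B package -DskipTests

# Stage 2 (final): a smaller runtime image with only a JRE and the built JAR
FROM eclipse-temurin:17-jre AS final
WORKDIR /app

# Pull the artifact out of the builder stage; assumes a single matching JAR
COPY --from=builder /app/target/*.jar app.jar

# Document the port the app listens on (assumed port)
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
```

The final image carries none of the Maven tooling or source code, only what COPY --from brings over, which is what keeps it small.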

Here, the builder stage and the final stage are two separate stages. We can build up to a specific stage by passing the --target flag, followed by the stage name, to the docker build command. By default, if no --target flag is specified, Docker builds the last stage defined in the Dockerfile and uses it as the resulting image.

Conclusion

This week was all about learning about images and Dockerfiles. The various user-friendly features provided by Docker for building images are very helpful and much appreciated. That's pretty much it for this week, but next week I will likely move on to some new topics. Until next time, then!