Docker Build Showdown: Regular Docker Build vs. Multistage Docker Build
How do Multi-stage builds reduce the size of your Docker Image?
Table of contents
- Introduction
- What's the Buzz About Multistage Builds?
- The basic idea behind Docker's multistage builds
- Enter Into The Multi-stage Builds
- Advantages of Multistage Docker Builds
- Disadvantages of Multistage Docker Builds
- Let's See It in Action!
- Build the Dockerfile
- Build the Dockerfile using Multistage-Build
- Distroless images
- Conclusion
Introduction
Docker, the superhero of containerization, offers multiple ways to build images. We're pitting the regular Dockerfile build against the multistage Dockerfile build. Who'll emerge victorious? Let's dive into the ring and find out!
What's the Buzz About Multistage Builds?
A multistage Docker build uses multiple build stages (multiple FROM statements) in one Dockerfile to craft a final image. Each stage has its own set of instructions, like a team of superheroes working together!
Multi-stage builds combined with slim base images are one of the most effective techniques for reducing the size of your Docker images.
The basic idea behind Docker's multistage builds
Usually you'd start out with a fat base image like Ubuntu or an image specific to your programming language. This comes packed with essential build tools like compilers.
Next, you'd run commands to download necessary dependencies like libraries, testing frameworks, linters, security scanners and so on. All of these are needed for your code to pass the necessary quality checks, compile, and produce the final artifact(s).
At this point, your application is built and ready to run, so you deploy this image in production.
But we no longer need all those dependencies used during the BUILD phase. If your app is written in C++, you probably only need the compiled binary for production.
And yet, our prod container carries that 900MB worth of burden with it.
So we should obviously shed all that load!
But traditionally, this has been very tedious to achieve in Docker, involving hacks, custom bash scripts and lots of spaghetti code to maintain.
Enter Into The Multi-stage Builds
These allow you to split your Docker image definition into multiple STAGES. Every time you use a "FROM <base image>" statement in your Dockerfile, it starts a new stage.
You can cherry-pick the files to include from each stage into your final image. So it's no surprise that you'd pick only the final executable file to put into your final image.
You can choose a lightweight base image such as Alpine or Distroless and just add your executable to it.
This is one of the most effective ways to end up with a super light image that is easy to run and maintain in production.
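To make the cherry-picking idea concrete, here is a minimal sketch for the C++ case mentioned earlier. The file name main.cpp, the stage name build, and the gcc:13 and alpine:3.19 image tags are assumptions chosen for illustration; they do not come from a real project.

```dockerfile
# Sketch only: main.cpp and the image tags are assumptions for illustration.

# Stage 1: compile with the full toolchain
FROM gcc:13 AS build
WORKDIR /src
COPY main.cpp .
# -static links the C and C++ runtimes into the binary, so it does not
# depend on shared libraries that exist only in the build image
RUN g++ -O2 -static -o app main.cpp

# Stage 2: ship only the compiled binary on a small base image
FROM alpine:3.19
COPY --from=build /src/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]
```

The binary is linked statically because the build image and Alpine use different C libraries (glibc vs. musl); a dynamically linked binary copied across would likely fail to start.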
Advantages of Multistage Docker Builds
Smaller Image Size: Trim the fat! Multistage builds let you discard build-only tools and files, leading to svelte and efficient final images.
Speedy Builds: Cut to the chase! With BuildKit, stages that the final image does not depend on are skipped and independent stages can build in parallel, and a smaller final image means less data to push and pull.
Enhanced Security: Shields up! Unnecessary tools or dependencies used during the build process don't find their way into the final image, minimizing attack vectors.
Simplified Deployment: Lighten the load! With multi-stage builds, you only ship the final image, skipping the hassle of managing intermediary build artifacts.
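On the build-speed point, one related habit is worth a quick sketch: inside the build stage, copy the dependency list before the rest of the source so the install layer stays cached between builds. This is a general layer-caching pattern rather than something taken from this post's example, and it assumes a Python project with a requirements.txt.

```dockerfile
# Sketch only: assumes a Python project with a requirements.txt.
FROM python:alpine3.7 AS builder
WORKDIR /app
# Copy only the dependency list first: this layer and the install below
# are reused from cache until requirements.txt itself changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Source edits invalidate the cache only from this line onward
COPY . .
```

With this ordering, editing application code does not trigger a fresh pip install on the next build.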
Disadvantages of Multistage Docker Builds
Complexity: It's like building a puzzle. Multistage builds can become complex if not organized well, and debugging may require understanding each stage's role.
Learning Curve: Buckle up for learning! The concept of multistage builds takes some extra effort to understand, especially for newcomers.
Let's See It in Action!
Imagine we have a Python Flask app and want to optimize its Docker image. Here's the regular, single-stage Dockerfile we start from:
# Single-stage build: Python base image plus the app's pip dependencies
FROM python:alpine3.7
# Copy the entire current directory to the /app directory in the image
COPY . /app
# Set the working directory to /app
WORKDIR /app
# Install Python packages from requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Set an environment variable named PORT with the value 5000
ENV PORT=5000
# Document that the container listens on port 5000
EXPOSE 5000
# Set the entrypoint to "python" as the executable
ENTRYPOINT [ "python" ]
# Pass app.py to the entrypoint, so the container runs "python app.py"
CMD [ "app.py" ]
Build the Dockerfile
First, build this Dockerfile the regular way and check the final image size.
To build the Docker image, use this command:
docker build -t <image-name> .
The final image size is 98.4MB (the SIZE column of docker images shows it).
Build the Dockerfile using Multistage-Build
Now we create the multi-stage Dockerfile:
# Stage 1: Build Stage
FROM python:alpine3.7 AS builder
# Copy the entire current directory to the /app directory in the image
COPY . /app
# Set the working directory to /app
WORKDIR /app
# Install Python packages from requirements.txt under /install,
# so the next stage can copy them as a single directory
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt
# Stage 2: Production Stage
FROM python:alpine3.7
# Copy the application code from the builder stage
COPY --from=builder /app /app
# Copy the installed packages into /usr/local, the Python prefix of the official images
COPY --from=builder /install /usr/local
# Set the working directory to /app in the new stage
WORKDIR /app
# Set an environment variable for the application's port
ENV PORT=5000
# Document the application port
EXPOSE 5000
# Set the entrypoint to "python"
ENTRYPOINT [ "python" ]
# Set the default command to "app.py"
CMD [ "app.py" ]
Explanation:
Stage 1 (Build Stage):
- Start with the python:alpine3.7 base image and name this stage builder.
- Copy the entire contents of the current directory (.) into the /app directory within the image.
- Set the working directory to /app.
- Install the Python packages listed in requirements.txt using pip. --no-cache-dir keeps pip's download cache out of the image, and --prefix=/install places the installed packages under /install. Without a separate prefix, pip would install into /usr/local, outside /app, and copying /app alone would leave the packages behind.
Stage 2 (Production Stage):
- Start from the same python:alpine3.7 base image.
- COPY --from=builder: copy the application code from the builder stage's /app directory to /app in the current image, then copy the installed packages from /install into /usr/local, where the official Python images keep the interpreter and its site-packages.
- Set the working directory to /app in the new stage.
- Set an environment variable named PORT with the value 5000.
- Document that the application listens on port 5000. EXPOSE does not open the port by itself; it still has to be published with -p when the container is run.
- Set the entrypoint to "python". This is the executable that runs when a container starts.
- Set the default command to "app.py", which is passed to the entrypoint as an argument, so the container runs python app.py.
Now build this multi-stage Dockerfile and check the final image size.
The reported final image size is 81.3MB, down from 98.4MB. That figure was measured with only /app copied into the final stage, so treat it as a floor: with the installed packages copied across as well, the image will come out somewhat larger. The gap widens most when the build stage carries compilers or other build-only tools that the final stage leaves behind.
Distroless images
Distroless images are Docker images designed to be as minimal as possible. They aim to provide a secure, lightweight environment for running applications by shrinking the attack surface available to attackers. They are particularly popular for containerized applications where security and efficiency are critical.
Key characteristics of distroless images include:
No Full Operating System:
Distroless images intentionally leave out most of a traditional Linux distribution. They lack the shells, package managers, and general-purpose utilities found in typical base images, which removes potential attack vectors.
Only Essential Libraries:
Distroless images include only the libraries required to run the specific application. Unnecessary system libraries and binaries are excluded, further reducing the image's size and complexity.
Security-Focused:
By reducing the software stack to the bare essentials, distroless images have a smaller attack surface. Fewer components mean fewer potential vulnerabilities for malicious actors to exploit.
Smaller Image Size:
Distroless images are small compared to traditional images based on full-fledged Linux distributions. This makes image distribution and deployment faster, especially in environments where bandwidth and storage are limited.
Use Case Specific:
Distroless images are designed to run specific types of applications, such as Java, Python, or Node.js applications. Each distroless variant is tailored to the requirements of its runtime environment.
Immutable and Stateless:
Distroless containers follow the principles of immutable infrastructure and statelessness, aligning with modern container best practices.
It's important to note that while distroless images offer enhanced security and efficiency benefits, they might not be suitable for all use cases. Some applications may require specific system utilities or components that are intentionally excluded in distroless images. Additionally, debugging and troubleshooting might be more challenging due to the lack of traditional tools commonly available in full Linux distributions.
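As a rough sketch of how a distroless image slots into a multi-stage build, the Dockerfile below is modeled on the python3 example in the distroless repository linked in the conclusion. It assumes an app.py with no third-party dependencies; an app with pip packages would need extra steps to carry them into the final stage.

```dockerfile
# Sketch only: app.py with no third-party dependencies is an assumption.

# Stage 1: a regular image used just to stage the application files
FROM python:3-slim AS build-env
COPY . /app
WORKDIR /app

# Stage 2: distroless Python runtime (no shell, no package manager)
FROM gcr.io/distroless/python3
COPY --from=build-env /app /app
WORKDIR /app
# The image's entrypoint is already the Python interpreter,
# so CMD only names the script to run
CMD ["app.py"]
```

For troubleshooting, the distroless project also publishes :debug image tags that include a busybox shell, which can soften the debugging drawback mentioned above.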
Conclusion
The multi-stage build in the example above helps create a smaller production image by excluding unnecessary files and intermediate build artifacts. The builder stage is used for installing dependencies, and the production stage contains only the minimum required to run the application.
You can find some Distroless images here: https://github.com/GoogleContainerTools/distroless/tree/main
You can find some Distroless image examples here: https://github.com/GoogleContainerTools/distroless/tree/main/examples
Thank you for taking the time to read this blog. I hope you found the information helpful and insightful.
Stay in the loop and stay ahead in the world of DevOps!
Happy Learning! Keep Learning!