Friday, 10 June 2022

Docker Images - Size matters, But So Does Performance


I recently went through the exercise of re-building a Docker image based on what was supposed to be a stable, well-known application codebase. Along the way I observed an unexpected performance issue.

The application contained within the Docker image was just a Java command line utility for parsing some yaml files to provision kafka resources on our hosted development clusters. The code had not been changed for several months, so this was supposed to just be a matter of setting up a local copy of the Docker image instead of pulling down a trusted third party's image from Dockerhub.

The application was bundled within a Docker contrainer whose Dockerfile was alongside the code, so it should have been a simple matter of using that to produce the image and pushing it to our own repo, and then pulling that down for our runtime use.

It's the same, so why's it different?

We had been running with the existing third party Docker image for several months, so there was a well established history of how long each stage of the deployment pipeline should typically take to run.

When the new Docker image ran it took noticeably longer to complete each stage. I don't have the exact figures in front of me, but can recall that it was in the order of double digit percentage of time slower - so a six minute build might now be taking longer than seven minutes.

Examining the Docker images

The third party's build process for the original Docker image wasn't available for examination, so to compare the Docker images we need to use something like

> docker history --no-trunc <full image name>

From this I was quickly able to establish that there were a couple of significant differences between the application's specified Dockerfile and the Dockerfile that would have been used for building the faster running established version:

  • The base image
    • CentOS Linux versus Alpine Linux
  • The Java runtime
    • Full Java SDK versus jlink with specific modules

Getting back up to speed

Since the purpose of this setup was to be a lift and shift of the existing setup, I adjusted the Dockerfile to involve CentOS Linux as its base image and adjusted it to use a full JDK instead of the clever jlink minimised Java runtime environment.

At this point we were where we wanted to be as our baseline for migrating off the third party Docker image. Our image has the same base OS and Java runtime and performs close enough to the same - without taking the double digit percentage of time longer than our starting point.

What was the issue?

While I was working on this particular setup there was a pressing deadline that I was not free to play around with tuning this setup and isolating whether the issue was due to the OS or the jlink runtime (or something else).

Based on what I have seen mentioned online, I suspect that there may have been some aspect of the application that involved heavy use of system calls that were not set up to run Java efficiently with Alpine's musl library. For now that it just a theory, and not something that I have managed to reproduce on a simplified locally built application.

If the runtime environment had involved inputs from external systems I would have been more motivated to try to keep us on Alpine to minimise the potential vulnerabilities as it tends to have fewer services and libriaries that tend to have CVEs representing potential security vulnerabilities.


No comments:

Post a Comment