Tuesday 21 June 2022

Speeding up Software Builds for Continuous Integration

Downloading the Internet

Can you remember the last time you started out on a clean development environment and ran the build of some software using Maven or Gradle for dependency management? It takes ages to download all of the necessary third party libraries from one or more remote repositories, leading to expression like, "Just waiting for Maven to download the Internet".

Once your development environment has been used for building a few projects the range of dependencies that will need to be downloaded for other builds reduces down as the previously referenced onces will now be cached and found locally on your computer's hard drive.

What happens on the Continuous Integration environment?

Now consider what goes on when Jenkins or your other preferred Continuous Integration server comes to build your software. If it doesn't have a local copy of the libraries that have been referenced then it is going to pay the cost of that slow "download the Internet" process every single time that it comes to check out your latest changes and run a build.

What are the main costs involved here?

  • Developer time waiting on the build to complete before moving on to the next change
  • Data transfer charges for sourcing from external repositories

Cutting down costs - saving time

What options do we have available for reducing these costs?

  1. Localise the artifact repository, acting as a pass-through cache
  2. Or Pre-download the most common artifacts in a build container image

Option 1 would involve the selection and setup of an appropriate artifact repository manager such as Nexus or Artifactory. There's a reasonable chance that if your organisation happens to write your own reusable libraries then this will be already be in place for supporting the distribution of those artifacts anyway, so it may just be a matter of re-configuring the setup to support mirroring of external third party libraries sources from external repositories.

Option 2 may seem a bit counter-intuitive as it would go against the current trend of trying to minimise container sizes and to be generally useful it would need to contain a broader range of artifacts than any one project's build would require.

Keep it local

For both options the performance improvement comes down to locality of reference. The builds should be able to obtain most, if not all, dependencies without having to go beyond the organisation's private build environment's network - whether that be a Virtual Private Cloud or a data centre.

With this type of setup in place builds should be able to spend less time on initial setup, and be more focussed on compilation, running tests, and ultimately making the new known good version of the code available for use.

If you want to understand the potential time savings on offer here, just try temporarily moving the content of your local development environment's build cache away and see how long a build takes. For a typical Java microservice I would not be at all surprised if the build time doubles or even triples for having to obtain the build plugin libraries, the application's direct dependencies, and all of the transitive dependencies.

No comments:

Post a Comment