Tuesday 27 December 2022

2022 - A year in review

Just a look back over the last 12 months.

January

I moved back to Christchurch, after having spent a few months further south since returning from London.

Work was mainly about managing other people's understanding of, and expectations around, our use of Kafka.

February

I decided that it would be worthwhile to have a year's subscription for streaming Sky Sports, as some rugby matches that I would want to watch would be on at times when venues wouldn't be open.

Having moved to Christchurch to be close to an office, I now found myself working from home as Covid restrictions came back into effect across New Zealand.

March

Got back into some actual coding at work - as opposed to mainly reviewing pull requests for configuration changes for Kafka topics.  This became urgent, as the command line interface tool that our provisioning system was dependent on had been marked for deprecation.

April 

Had my first direct experience with Covid-19.  I only went for a test because a friend had mentioned that a runny nose was his first symptom.

May

Managed to roll my ankle as I was leaving the house for the evening.  I thought about going back inside and resting it up with ice etc. - but then decided to try walking off the pain and carried on to a pub quiz.  My team won the pub quiz, and I got an Uber home.

A couple of days later my ankle swelled up so much that it was too painful to walk on.  This lasted a few weeks.

June

Heard from a recruiter who was working as a local sourcer for Atlassian, now that the company is fully open to remote working.

Had a family member come to visit for a few days.  On the second day they seemed a bit ill so I ordered some more Covid tests - I was okay, but they tested positive and needed to isolate for a week or so. 

July

A few stages of interviews with Atlassian.

Went down south for a weekend, including watching the All Blacks versus Ireland at Forsyth Barr Stadium in Dunedin.

Attended a comedy show by Rhys Darby - he's the comedian / actor who played the character Murray in the Flight of the Conchords television series.

August

Final stage of interviews with Atlassian.

Received and accepted offer to join Atlassian as a Senior Software Engineer.

New laptop arrived - glad it was a Mac, as I couldn't remember whether that was something I had asked about during the interview process.

September

Properly started the new job.

Purchased a decent office chair and standing desk - both within the budget for expensing back.  A welcome improvement over sitting at the kitchen table.

October

More learning about Atlassian systems, and familiarising myself with the services that my team is responsible for.

November

Learning about another existing service that will be moving across to my team for further development and maintenance.

December

Went to Sydney to meet up with the rest of my work teammates - several of whom also had to travel across from other parts of Australia.

Enjoyed my first experience of an escape room, which was the team building exercise that we chose.

Went to Otautahi Smoke - an afternoon and evening of live music, BBQ food and beers in Hagley Park.

Wednesday 13 July 2022

Designing systems - The "ity"s That Limit or Enable Profitability

Introduction

This started off as a little aide-mémoire to get my head into the right space when preparing for an interview. It's not an exhaustive list, and it twists terminology that has been used to represent other things (see: Velocity, below), so don't treat it as a textbook reference to work from.

Most of the listed points can be associated back to so-called "non-functional requirements" - NFRs. I don't like that particular terminology, so alternatively we might consider them as dimensions of the quality of the system.

Usability

"If you build it, they will come" should come with a provisor, "... but if it's awkward to use they'll soon go away, and might not come back."

Security

All of the aspects that combine to protect data from being seen or manipulated by anyone other than the intended recipient or sender, and also assuring users that the data has originated from the intended source.

Velocity

Here I'm cheating a bit by trying to come up with a term to represent the speed at which a system can respond to user input - not development velocity.

Accessibility

This has multiple dimensions to it, ranging from the devices that can present the interface, to the affordances offered for people with disabilities.

Reliability

The system does what it is intended and expected to do, in a consistent manner over a significant period of time. 

Elasticity / Scalability / Capacity

How well the system can cope when it becomes popular enough to attract a lot of users.

Likewise, how well it can scale back down to a level that is suitable when there is less demand - and less need for potentially expensive resources to be available.

Adaptability / Flexibility

It's not necessarily always the case, but given a range of possible technologies to choose from, each will often carry an associated time or money cost when changes need to be applied - so it is worth considering how easily the system can be adapted.

Not all roads lead to Profitability

In a commercial product, these are intended to combine to lead to profitability.

Not all products will consider these as being equally high priority, so you may find it a valuable exercise to get your team together and agree on a relative ranking. That way you can focus on what is important for your business to succeed, given the challenges and opportunities in the current environment.

Visualise Priority Ranking

I'd even go so far as to suggest having a visual representation of the value rankings so that there can be little doubt what to prioritise when making changes - in the days of office working this might be something like a poster on the wall or an A4 printout in the top corner of the whiteboard where the team has their stand-up meetings.


Sunday 3 July 2022

Running Java with Preview Features in the Cloud - Part One

Introduction

I've been catching up on some features that have been added in recent versions of Java. The 6 month release cadence of new versions of Java is great, but can lead to a build up of new things to learn about.

The support for pattern matching in switch statements - JEP 406 - is particularly appealing, but for now it is still only available as a preview feature, meaning that we need to explicitly enable preview features at both compile time and run time.
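
To illustrate, here is a minimal sketch of the kind of type-based switch that JEP 406 enables - the class and method names are just for this example, and the commands in the comment show where --enable-preview comes in:

// Compile and run with preview features enabled, for example:
//   javac --release 17 --enable-preview PreviewSwitch.java
//   java --enable-preview PreviewSwitch
public class PreviewSwitch {

    // The switch can now match on the type of the value and bind it to a variable.
    static String describe(Object obj) {
        return switch (obj) {
            case Integer i -> "an Integer with value " + i;
            case String s  -> "a String of length " + s.length();
            default        -> "something else entirely";
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(42));       // an Integer with value 42
        System.out.println(describe("hello"));  // a String of length 5
    }
}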

A shallow view of the main cloud providers

A lot of online applications these days will run in some sort of cloud runtime environment. Here are some examples from the main cloud providers.

According to what the documentation currently specifies, AWS Lambda's pre-packaged Java environments only support versions 8 and 11 unless you bring your own Docker container. Similarly, Azure Functions only offer versions 8 and 11. This leaves us to consider Google Cloud Functions which supports and recommends Java 17.

What can we try out?

As far as I can tell, the Google Cloud Function way of running Java doesn't allow us to control command line arguments to the Java runtime, so we cannot simply specify --enable-preview that way.

This leaves us to try out customizing AWS Lambda to:

  • set up a Docker container including the Java 17 runtime
  • set up a wrapper script to pass --enable-preview as a command line parameter, to make the Lambda initialize with the functionality that we want (a rough sketch of a handler that relies on this follows below).
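
As a rough idea of the kind of handler we would ultimately want to run - the class name and input type here are hypothetical, and this assumes the aws-lambda-java-core dependency is on the classpath - something along these lines would only load once the runtime is started with --enable-preview:

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

// Hypothetical handler - the input type and messages are placeholders; the point is
// that the switch below uses the preview feature, so the Java runtime inside the
// container must be launched with --enable-preview for this class to load and run.
public class PreviewFeatureHandler implements RequestHandler<Object, String> {

    @Override
    public String handleRequest(Object input, Context context) {
        return switch (input) {
            case String s  -> "Received text of length " + s.length();
            case Integer i -> "Received number " + i;
            default        -> "Received something else: " + input;
        };
    }
}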

Tuesday 21 June 2022

Speeding up Software Builds for Continuous Integration

Downloading the Internet

Can you remember the last time you started out on a clean development environment and ran the build of some software using Maven or Gradle for dependency management? It takes ages to download all of the necessary third party libraries from one or more remote repositories, leading to expressions like, "Just waiting for Maven to download the Internet".

Once your development environment has been used for building a few projects, the range of dependencies that needs to be downloaded for other builds reduces, as the previously referenced ones will already be cached locally on your computer's hard drive.

What happens on the Continuous Integration environment?

Now consider what goes on when Jenkins or your other preferred Continuous Integration server comes to build your software. If it doesn't have a local copy of the libraries that have been referenced then it is going to pay the cost of that slow "download the Internet" process every single time that it comes to check out your latest changes and run a build.

What are the main costs involved here?

  • Developer time waiting on the build to complete before moving on to the next change
  • Data transfer charges for sourcing from external repositories

Cutting down costs - saving time

What options do we have available for reducing these costs?

  1. Run a local artifact repository manager, acting as a pass-through cache
  2. Pre-download the most common artifacts into a build container image

Option 1 would involve the selection and setup of an appropriate artifact repository manager such as Nexus or Artifactory. There's a reasonable chance that if your organisation writes its own reusable libraries then this will already be in place to support distribution of those artifacts, so it may just be a matter of re-configuring the setup to support mirroring of third party libraries from external repositories.

Option 2 may seem a bit counter-intuitive, as it would go against the current trend of trying to minimise container sizes, and to be generally useful it would need to contain a broader range of artifacts than any one project's build would require.

Keep it local

For both options the performance improvement comes down to locality of reference. The builds should be able to obtain most, if not all, dependencies without having to go beyond the organisation's private build environment's network - whether that be a Virtual Private Cloud or a data centre.

With this type of setup in place builds should be able to spend less time on initial setup, and be more focussed on compilation, running tests, and ultimately making the new known good version of the code available for use.

If you want to understand the potential time savings on offer here, just try temporarily moving the content of your local development environment's build cache away and see how long a build takes. For a typical Java microservice I would not be at all surprised if the build time doubles or even triples for having to obtain the build plugin libraries, the application's direct dependencies, and all of the transitive dependencies.

Monday 20 June 2022

Docker SBOM - Software Bill Of Materials

In an earlier post on this blog I was curious about comparing Docker images to try to track down the differences that might be causing performance problems. Since then I have had a play with the sbom Docker command for listing out what is included in the image.

Following the documentation at: https://docs.docker.com/engine/sbom/

Below is an example of the output of a run of a locally built app:

> docker sbom hello-world-alpine-jlink:latest

 

Syft v0.43.0
 ✔ Loaded image            
 ✔ Parsed image            
 ✔ Cataloged packages      [16 packages]

NAME                    VERSION       TYPE         
alpine-baselayout       3.2.0-r20     apk           
alpine-baselayout-data  3.2.0-r20     apk           
alpine-keys             2.4-r1        apk           
apk-tools               2.12.9-r3     apk           
busybox                 1.35.0-r13    apk           
ca-certificates-bundle  20211220-r0   apk           
docker-comparison       1.0-SNAPSHOT  java-archive  
jrt-fs                  11.0.15       java-archive  
libc-utils              0.7.2-r3      apk           
libcrypto1.1            1.1.1o-r0     apk           
libssl1.1               1.1.1o-r0     apk           
musl                    1.2.3-r0      apk           
musl-utils              1.2.3-r0      apk           
scanelf                 1.3.4-r0      apk           
ssl_client              1.35.0-r13    apk           
zlib                    1.2.12-r1     apk   

 

This is a much more detailed listing of the components that are included in the Docker image than we would get from looking at the Dockerfile or image history, so I would recommend it as a way of checking what you are including in an image. The main feature request that I have is to separate the artifacts by type, though in this trivial example that is simple enough to do by just looking at the listing.


Tuesday 14 June 2022

The Importance of Segmenting Infrastructure

Kafka for Logging

I was recently poking around in the source code of a few technologies that I have been using for a few years when I came across KafkaLog4jAppender. It enables you to use Kafka as a place to capture application logs. The thing that caught my eye was the latest commit associated with that particular class, "KafkaLog4jAppender deadlocks when idempotence is enabled".

In the context of Kafka, idempotence is intended to enable the system to avoid producing duplicate records when a producer may need to retry sending events due to some - hopefully - intermittent connectivity problem between the producer and the receiving broker.

The unfortunate situation that arises here is that the Kafka client code itself uses Log4j, so it can result in the application being blocked from sending its logs via a Kafka topic because the Kafka client Producer gets deadlocked waiting on transaction state.
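
For reference, on an ordinary Kafka producer, idempotence is just a configuration property. Here is a minimal sketch, with placeholder broker address and topic name:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class IdempotentProducerExample {

    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker address - substitute your own bootstrap servers.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Idempotence: the broker de-duplicates retried sends so a retry
        // does not result in a duplicate record on the topic.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("example-topic", "key", "value"));
        }
    }
}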

Kafka For Metrics - But Not For Kafka Metrics

This reminded me of a similar scenario where an organisation might choose to use Kafka as their mechanism for sending out notifications of metrics for their microservices and associated infrastructure. If Kafka happens to be part of the infrastructure that you are interested in being able to monitor, then you need to keep those resources isolated from the metrics Kafka - otherwise you run the risk of an incident impacting Kafka which prevents the metrics from being transmitted.

Keeping Things Separated

A real world example of keeping infrastructure isolated from itself can be seen in the way Confluent Cloud handles audit logs. I found it a little confusing at first, as the organisation that I was working for at the time only had Kafka clusters in a single region, but the audit logs were on completely separate infrastructure in another region and even another cloud provider.

Sometimes You're Using A Service Indirectly

A slightly different - but no less significant - example of the need for isolating resources can arise when a particular type of infrastructure is being used for different types of workload. Rather than having a "big bang" release of changes to all of the systems, a phased rollout approach can be taken. One of my earliest involvements with using AWS came shortly after their 2015 DynamoDB outage, which had a ripple-out impact on a range of other AWS services because behind the scenes those other services were themselves utilising DynamoDB.

It's my understanding that AWS subsequently moved to isolating their internal services' DynamoDB resource from general consumers' DynamoDB infrastructure - but don't quote me on that.

Friday 10 June 2022

Docker Images - Size matters, But So Does Performance

Introduction

I recently went through the exercise of re-building a Docker image based on what was supposed to be a stable, well-known application codebase. Along the way I observed an unexpected performance issue.

The application contained within the Docker image was just a Java command line utility for parsing some YAML files to provision Kafka resources on our hosted development clusters. The code had not been changed for several months, so this was supposed to just be a matter of setting up a local copy of the Docker image instead of pulling down a trusted third party's image from Dockerhub.

The application was bundled within a Docker container whose Dockerfile was alongside the code, so it should have been a simple matter of using that to produce the image, pushing it to our own repo, and then pulling that down for our runtime use.

It's the same, so why's it different?

We had been running with the existing third party Docker image for several months, so there was a well established history of how long each stage of the deployment pipeline should typically take to run.

When the new Docker image ran it took noticeably longer to complete each stage. I don't have the exact figures in front of me, but I can recall that it was on the order of a double-digit percentage slower - so a six minute build might now be taking longer than seven minutes.

Examining the Docker images

The third party's build process for the original Docker image wasn't available for examination, so to compare the Docker images we needed to use something like

> docker history --no-trunc <full image name>

From this I was quickly able to establish that there were a couple of significant differences between the application's specified Dockerfile and the Dockerfile that would have been used for building the faster running established version:

  • The base image
    • CentOS Linux versus Alpine Linux
       
  • The Java runtime
    • Full Java SDK versus jlink with specific modules

Getting back up to speed

Since the purpose of this work was to be a lift and shift of the existing setup, I adjusted the Dockerfile to use CentOS Linux as its base image and a full JDK instead of the clever jlink-minimised Java runtime environment.

At this point we were where we wanted to be as our baseline for migrating off the third party Docker image. Our image has the same base OS and Java runtime and performs close enough to the same - without taking a double-digit percentage longer than our starting point.

What was the issue?

While I was working on this particular setup there was a pressing deadline, so I was not free to play around with tuning the setup and isolating whether the issue was due to the OS or the jlink runtime (or something else).

Based on what I have seen mentioned online, I suspect that there may have been some aspect of the application that involved heavy use of system calls that do not run efficiently against Alpine's musl library. For now that is just a theory, and not something that I have managed to reproduce on a simplified locally built application.

If the runtime environment had involved inputs from external systems I would have been more motivated to try to keep us on Alpine, as it tends to include fewer services and libraries that might carry CVEs representing potential security vulnerabilities.

 

Monday 11 April 2022

Expiring CA Certificates - How not to get caught out

I never thought it would happen to me. I was careful, I prepared well in advance, I even had multiple environments to test things out in...

I got caught out by clutter. I had updated the correct file in the development environment, but updated a file with the same name in a slightly different location in production.

A brief check of the system with the new certificate in place seemed fine - the certificate didn't look like it was due to expire on the known expiry date.

That's the problem with an expiring CA certificate - it's not front and centre showing up as something you need to be concerned about. The chain of trust is a bit less visible, you have to click through to see the details.

In the heat of the moment, while troubleshooting what might have gone wrong with the setup, I even repeated the mistake of copying the file to the wrong location.


Monday 11 October 2021

My History With Open Source

From CPAN to Design Patterns

Throughout my career I've benefited greatly from being able to utilise open source software that other developers have produced and made freely available.

Some of my earliest commercial project work benefitted from libraries made available for Perl via the Comprehensive Perl Archive Network (CPAN). It sometimes felt like our company had a huge advantage over organisations that used VB Script for developing ASP pages, as they seemed to be tied into the world of closed source and needing to pay to use libraries that other organisations had developed as a licensed product for sale.

In the early two thousands I was continuing my university studies as a part time student while working as a software developer. One of the distributed systems courses gave me some exposure to JBoss and Tomcat, which made me question why we were paying to use commercial application servers for some of our clients' projects in my day job.

Aside from the common day to day helper libraries such as JUnit and log4j, Tomcat was probably the first major open source Java system that I brought into my work, proving out that we didn't need EJBs and all of their standards for getting some data out of a database and onto some web pages. At around the same time we were probably dabbling with Apache JMeter as a mechanism to validate that this new kid on the block (well, our block at least) was going to cope with what we anticipated was going to be thrown at it.

Although we didn't use any particular library for it, I would also consider design patterns as an example of shared knowledge that really helped us to achieve our scalability and performance goals - such as safely caching data that was read frequently but only updated a few times each day. If you went skiing in New Zealand in the early 2000s then I can almost guarantee that you checked snow reports using code that I developed.

Giving Something Back

Open source licenses can be a legal minefield for the owners and the users of products and libraries.

Working in large corporations often involves policies and even clauses in employment contracts - along the lines of "If you develop on company time, or on company hardware then anything produced as a result is the property of the company" and / or "The use of any open source software is expressly forbidden unless it has been formally approved by the XYZ committee".

Even smaller companies need to be aware of the differences between GPL, MIT, Apache and other variations of licenses before building a product up.

So far my contributions to open source projects have mainly been limited to minor improvements to documentation, and a couple of small bug fixes for some smaller projects. Correcting typos and improving grammar can be a small way of helping out - provided that it isn't pedantic or debatable whether the new phrasing is better. So far I have had all of my contributions accepted with gratitude, as the original developers sometimes have English as a second language or just slipped up a little in the rush of getting something out and released.

Personally, I also find that by contributing to explaining how something works I can improve my ability to understand and recall that information later on. So, as well as being a good way to make an initial contribution to an open source community, consider that by improving your understanding you will also be moving some way towards being able to contribute to the code as well.


Sunday 10 October 2021

People Skills In A Software Development Team

Introduction

In my opinion the old stereotype of expert developers being left on their own to solve the big problems isn't the way that robust software should be developed.

As often as not, stories about hero developers involve people who are arrogant and / or jaded about their experiences, and not open to collaborating. That doesn't make for a nice workplace, or for sustainable product development.

If you Google "no assholes" you will probably be presented with a long list of articles and blog posts describing how organisations have come to the realisation that it is in their interests not to hire or tolerate individuals who are not team players - even if they are the best in their area of expertise. (NB: In this context I don't mean "team player" in the sense of being a pushover who just does what they are told).

This is yet another post of me trying to remind myself about what I've been doing to work well within a team - a nice refresher before I go into job interviews with their HR situational questions.

Empathy Is Key

In the heat of the moment it can be tempting to assume that a problem is somebody else's fault, or that another team doesn't care as much as we do. Most of the time that isn't really the case. Without knowing the wider context around when, why and how something came to be we really shouldn't assume the worst.

Sometimes the root of the problem will be me - like the time that I made a quick one line change to a block of code that I had been working on and when that change was deployed the service started to randomly stop connecting to a service that it was interacting with. I hadn't looked at the context around when that line of code was being called, so didn't notice that I was preventing connections from being closed. 🤦

Psychological Safety

I used to switch off when I first heard the term, "Psychological Safety" as it seemed like something leading to sitting around a campfire and singing Kumbaya.

Much later in my career I paid a little bit more attention and realised that it is a very simple concept about how comfortable colleagues feel working with each other. I see it as being at ease with speaking your mind, knowing that nobody on the team will belittle your ideas, that you will be listened to and understood, and that "there are no stupid questions".

One of the personal highlights of my career was a spontaneous mini pub crawl that arose from having to be out of the office while movers relocated our equipment. The lunch and conversations were quite fun, but the thing that has really stuck in my mind happened as we were walking between pubs: I mentioned to one of the senior members of another team that I felt like I was fitting in well after only a few weeks in the company, and his response was, "We like you, you ask intelligent questions".

Care, Respect, and Courtesy

I think this was a school motto, but I can't trace it back as that was over thirty years ago and some schools seem to chop and change their motto - and it's not really that important to know the origin story.

Anyway, let's consider how each term can be applied to the working environment.

I use the term "careful" a lot more than the term "care", mainly because I'm used to talking about solving technical problems and how we need to ensure that we don't lose data or make our systems unavailable for our users. That is caring about the people who interact with the software that we're developing.

Alongside that, life tends to go much more smoothly when we are considerate to our colleagues, whether that means informing them in advance of upcoming changes, or checking in with them on Slack to understand how they are getting on with situations that arise in or out of the work setting. Not just a, "Get well soon" response if they have to take a sick day.

Respect and courtesy often go together, such as listening to what a person has to say without interrupting, and, if there is something to challenge, being polite enough to address the idea rather than the person who raised it. On my first full-on scrum team some of my German colleagues raised these sorts of manners as part of the team's agreed ways of working, and at the time I wasn't sure whether that needed to be said. However, I noticed when it was missing from a team that I worked in later in my career, and moved to correct that.

Obligation To Dissent

Obligation to dissent was another unusual term that I came across in a team ways of working discussion. In that context it meant that if you had a strong opinion or objection to something that the team was going to do then it was your responsibility to raise your concern and have it addressed.

This approach could be quite destructive on its own, and should be paired with something like "..., but be willing to move forward with the team's agreed approach".

Wednesday 6 October 2021

My Operations History - A Journey Towards Appreciating DevOps And Infrastructure As Code

Introduction

These days I consider myself to mainly be a developer, but with a solid background in getting services and apps operational.

This post is intended to focus on some of the work that I have done so far in my career that I would consider as being more focussed on the operations side of technology-driven projects.  It can be read as background to my appreciation of the DevOps approach to developing and deploying applications.

If you came to read about my medical history then that is also covered here: I've never had an operation.

University days

Personal hardware

Back in the 90s PCs were still the dominant home computer, and as my brother was into electronics and electrical engineering I went down the path of assembling my first proper computers from components. The level of component that I'm referring to here is consumer-accessible ready-made modules rather than individual chips and circuit boards.  I never took up a soldering iron in anger.

So, when I had enough of my own money (being a poor student, money was not abundant), I selected a case, hard drive, motherboard, CPU, memory, video card, and sound card and went about combining them to form the basis of a working computer.

Back then the Internet wasn't common in every home, so I didn't need a modem or network card to start with.  The display was a CRT monitor that I had received as a hand-me-down when a member of the family had upgraded their system.

Unix, shell scripting and cgi-bin (server side "dynamic" web)

Around the same time I was learning some C programming and various useful utilities on Solaris at university both as part of my coursework, as well as out of general curiosity of how to get things done.

I was very interested in how the web worked, and managed to set up some cgi scripts that would run under my home directory on one of the computer science department's servers, though I think this didn't get much beyond calling a random number generator to determine which image to show from a sub-directory on the server.

Startup Systems Administration

Setting up Workstations

Being one of the first employees at a growing company meant that I was able to get involved in configuring the workstations of the new employees.  Back then that involved installing the OS (Windows NT 4, later Windows 2000) and a few core applications: virus scanner, current version of Java SDK, Perl, IIS, MS Office.  We were quite a small scale operation of fewer than a dozen employees for most of the time, so there was never any great need to automate this work.

Configuring Linux Servers

During the day to day work I would pair up with a colleague when he needed to update some of our critical infrastructure software for the systems that we hosted on our servers.  Sendmail and BIND seemed to need updates relatively often - probably every month or every other month.  Back then we had a few customisations in our installation so an upgrade involved downloading the latest source code and building it from configure scripts and makefiles with specific options and configuration files applied.

Later on I needed to apply the same experience to build up a PostgreSQL server with the PostGIS extension compiled in, for use on a property auction website - enabling the site to present an up to date visual representation of the sold / available status of each property on the map.  The build of the database server was a bit last minute, as we realised that a flat file representation of the data wasn't going to work properly on Linux (and I already had my reservations about that approach being suitable for multiple concurrent updates).

Another significant piece of work involved upgrading and migrating our internally hosted IMAP email server to a more powerful server, so I went ahead and applied some locking down of access rights to sensitive company folders at around the same time - not because anyone in the company wasn't trustworthy, but just as a general best practice for responsible business data handling.  There were a few gotchas when setting up new email folder permissions after that, but the data migration and version upgrade went seamlessly.

Configuring Windows Servers

Historically we had hosted several client websites using Microsoft's Internet Information Services - which acted as the runtime environment for Active Server Pages (commonly referred to as "ASP pages").  I took care of setting up IIS and locking down various optional features based on security recommendations.

I used to regularly monitor slashdot, securityfocus.com and various other sites to keep up to date with the latest issues and potential issues to be addressed as proactively as possible.  I also routinely monitored the content being uploaded by our clients, such as ensuring that they weren't slipping into outdated practices for making database credentials available for ASP pages that happened to involve database queries.

Database Access Rights

To minimise the potential risk of having compromised database credentials exploited, all of our databases had table level permissions locked down to the absolute minimum rights required by each client application.  We didn't quite split responsibilities up to have reads handled separately from deletes or writes back then as we were still developing monoliths - which was fine for the scale that we were operating at back then.

Continuous Integration - GoCD plugin

While I was working at Springer (later known as Springer Nature) I gained some hands on experience with two technologies that worked well together: Thoughtworks' GoCD, and Pivotal's Cloud Foundry.

When a competition was announced by Thoughtworks to bolster up the range of plugins available for GoCD, I decided to have a go at producing some open source software that would enable teams that used the combination of GoCD and Cloud Foundry to gain a small additional capability for automating the detection of updates in Cloud Foundry.

My plugin won first prize in the competition, but soon afterwards the somewhat clunky GoCD plugins API was given a significant overhaul meaning that my plugin would not work in later versions of GoCD.

Microservices - Infrastructure As Code

Deploying Microservices

Microservices often need some supporting infrastructure beyond their unit of deployment, such as an object store (e.g. S3), a database, or a persistent shared cache.  All of the major cloud providers offer these types of services and have interfaces that allow us to create them programmatically.  My more recent operations experience has been strongly oriented towards automation of provisioning and configuring such infrastructure components.

The more mature organisations that I have worked in have had systems such as Ansible and Terraform in place to allow developers to specify the expectations of the available supporting services and have any updates be provisioned and applied automatically, sometimes as part of the release process for a new version of a service.

Feature Toggles, Not Feature Branches

When it comes to significant changes to infrastructure as part of a release I've found that it is better to aim for a "roll forward" strategy in the event of any unexpected side-effects rather than expecting to be able to roll back.  This can involve something as simple as toggling the new feature off to give the team enough time to properly diagnose what has gone wrong.  The alternative might involve removal of the newly created infrastructure, which could hide the issue and delay resolution.
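
As a hand-wavy sketch of the idea - the toggle name and stores here are entirely made up, and real toggle state would come from a config service rather than a hard-coded map - the code path to the newly provisioned infrastructure stays behind a flag that can be switched off without another deployment:

import java.util.Map;

public class FeatureToggleExample {

    interface ReportStore { void save(String report); }

    // Toggle state would normally come from a config service or environment,
    // not a hard-coded map - this just keeps the sketch self-contained.
    private static final Map<String, Boolean> TOGGLES = Map.of("use-new-object-store", false);

    static void save(String report, ReportStore legacyStore, ReportStore newObjectStore) {
        // Roll forward: if the new infrastructure misbehaves, flip the toggle off
        // and traffic returns to the legacy path while the team diagnoses the issue.
        ReportStore store = TOGGLES.getOrDefault("use-new-object-store", false)
                ? newObjectStore
                : legacyStore;
        store.save(report);
    }

    public static void main(String[] args) {
        save("monthly-report",
                r -> System.out.println("legacy store: " + r),
                r -> System.out.println("new object store: " + r));
    }
}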

Where To From Here?

At this point in time (October 2021) I'm at a potential fork in the road career-wise: I could continue to be a developer of services for end users and service integrations, or switch over to join a team more focused on enabling developers - sometimes referred to as "platform engineering" - or I could move even further towards the metal and get involved in developing the tools and services that underpin platforms.


Wednesday 22 September 2021

Sounie is back in New Zealand...

I'm just getting set up with my new ".nz" domain name.

For now my other blogs are still set up with my .london domain:

General technology blog

Microservices blog