Friday, 30 May 2025

Severance and FedRAMP - keep them separate

Not long before starting my career break to help out with family, I had a social catch up with some work colleagues via Zoom, and the conversation looped into what TV shows people had been watching lately... (do we still refer to things as TV?). Severance got a mention, so a few days later I got drawn into starting an Apple TV subscription to find out what it was all about.

This post is going to be short and sweet - that may limit it to an inside joke, or a "Were you thinking what I was thinking?" - as I don't want to offer up any spoilers about Severance, or FedRAMP.

I just hope nobody takes inspiration for life to imitate art by having the Severance approach to work/life balance applied to meet a level of FedRAMP isolation.

Thursday, 15 May 2025

Applying AI to software development can be like following SatNav

Trying out a different navigation system

A month or so ago I upgraded to a car that has a SatNav system included, so I have been trying to use that instead of the Maps app on my phone. My experiences with it so far have generally been good, but it is far from flawless - a bit like Artificial Intelligence (AI) in software development.

As context, my previous vehicle was not too old to include SatNav; it just hadn't been set up with English language support or New Zealand maps - one of the downsides of having a second-hand vehicle that originated in Japan.

Flawed or incomplete information

Driving around central Christchurch can be a bit challenging at times as various roadworks are underway, leaving streets closed off or narrowed down to a single lane. It could be reasonable to expect that a basic navigation system might not have up-to-the-minute awareness of those closures and restrictions. However, something that I did not expect to encounter was the navigation system advising me to expect a roundabout where no roundabout exists. This hasn't just been a one-off anomaly, so I am having to glance down at the map from time to time to understand whether I should be preparing to turn at these traffic lights or further along.

As I become more accustomed to the prompts coming from the SatNav, I expect to develop a better feel for how far ahead - in time and distance - the guidance relates to.

Routing with unclear preferences

I'm not sure why, but on a recent long distance drive the guidance system wanted me to turn off the main highway to go inland to some unfamiliar roads. Normally I would be curious enough to take a little detour and see a different part of the countryside, but on this occasion I had a kitten as a passenger in the back seat so I wanted to stay on the familiar and direct route.

There may have been some marginal difference in total road distance, but I don't know if the inland roads would have been of the same sealed quality as State Highway 1. The only time that I have needed to detour inland around that part of the country before involved gravel roads, where my comfortable driving speed is much lower.

Lower involvement results in lower recall

If I am not actively engaged in the navigation process then I am less likely to remember details of the route for future journeys. If I don't make an active decision to turn off at a particular place, I am less likely to absorb that information and have it available for use in the future.

Generating code with AI is like using SatNav

When it comes to software development, I believe that we should be treating artificial intelligence systems with a suitable amount of caution and awareness of the potential limitations and flaws.

I have used AI to generate small one-off utility apps, as well as to produce snippets of code for production systems that process millions of data records every day. Just like we have checks and balances in place to test and review code produced by human developers, I would not allow fully AI-generated code to drive my production systems... yet.

Between the chair and the keyboard

The person driving the AI still needs to stay aware of whether what is being produced will be fit for purpose - meeting the "ity" considerations, such as:

  • functionality 
  • scalability
  • security
  • stability
  • durability
  • flexibility
  • ...

Ethics, and compliance with laws and standards, are also aspects that will continue to require people to be involved and held accountable.


Tuesday, 29 April 2025

Restricting concurrent updates

Introduction

Just jotting down some thoughts about what might be involved in addressing an issue that I faced on my last project.

The problematic situation

Multiple sources of updates arrive concurrently and are picked up for inclusion in an aggregated representation of the data.
There are multiple worker processes, each with multiple worker threads, that pick up changes and apply them without any awareness or consideration of what other work is underway.

Potential solution approaches

Debouncing of updates

Using Redis as a store for tracking the identifiers of records that are currently being processed has already been applied successfully to reduce race conditions for updates of some attributes, so the pattern could be applied more broadly.
The debouncing approach can be thought of as a type of transaction lock, restricting access to the record that is required to contain the full representation of state.
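
To make that concrete, the core of the pattern is an atomic "set if absent" with an expiry, so that a crashed worker cannot hold a lock forever. Below is a minimal sketch using the Jedis client - the key prefix, value, and timeout are assumptions for illustration, not the project's actual settings:

import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class RecordLock {

    private final Jedis jedis;

    public RecordLock(Jedis jedis) {
        this.jedis = jedis;
    }

    // Try to claim a record for processing; returns false if another
    // worker currently holds it.
    public boolean tryAcquire(String recordId) {
        // NX = only set if the key does not already exist.
        // EX = expire after 30 seconds, so a crashed worker
        //      cannot block the record indefinitely.
        String result = jedis.set("lock:record:" + recordId, "processing",
                SetParams.setParams().nx().ex(30));
        return "OK".equals(result);
    }

    public void release(String recordId) {
        jedis.del("lock:record:" + recordId);
    }
}

A production version would want to verify ownership before releasing (for example, by storing a unique token and deleting via a Lua script), but this shows the debouncing idea.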

Partitioning workload processing

This would most likely involve switching to a different message processing technology, to enable isolation of workers and a single in-flight update per identifier at a time.
If Kafka were applied here, we would need more awareness and consideration of how keys balance across partitions, to ensure that we preserve scalability of throughput.
To benefit from the change, processing would probably also need to switch to a single thread per partition, achieving the goal of eliminating concurrent updates to records that share the same identifier.
In my opinion this would be more trouble than it is worth, but the sketch below shows the keyed publishing side of the idea.
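
For illustration only - using the record identifier as the message key is what makes Kafka route every update for a given record to the same partition. The topic name, types, and serializer choices here are assumptions, not the project's actual configuration:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class UpdatePublisher {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The record identifier is the key, so all updates for
            // "record-123" land on one partition, and a single consumer
            // thread per partition sees them strictly in order.
            producer.send(new ProducerRecord<>("record-updates", "record-123", "{ ...change payload... }"));
        }
    }
}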

Monday, 7 April 2025

A self-imposed career break to spend more time with family

It's 2025 and I am back to blogging on my personal site, so what has been happening?

For the past few years I have been posting some of my ponderings on another site - a blog that is internal to Atlassian, where I have been working as a senior developer.

That brings me around to why I am going to resume posting here. I have decided to dedicate my time to helping out with a senior family member, stepping completely away from work commitments for a while.

Personal progress

On my first fresh non-work day I have found myself unblocked from making progress on something that I had been contemplating for a couple of months - I have purchased a more modern car. A 2020 hybrid seems like a nice step up from a 2007 regular petrol car.

Family health progress

Without going into any detail, the family member who had been facing some challenging medical needs has made a remarkable comeback. We are taking things one day at a time.

Things to do

Don't expect much technical output from me for a while, as there is still a need for me to help out with gardening and household maintenance.

Not having any Slack notifications for a month or three seems quite appealing.

Tuesday, 27 December 2022

2022 - A year in review

Just a look back over the last 12 months.

January

I moved back to Christchurch to live, after having spent a few months further south since moving back from London.

Work was mainly around balancing other people's understanding and expectations around our use of Kafka.

February

I decided that it would be worthwhile to have a year's subscription for streaming Sky Sports, as some rugby matches that I would want to watch would be on at times when venues wouldn't be open.

Having moved to Christchurch to be close to an office, I now found myself working from home as Covid restrictions came back into effect across New Zealand.

March

Got back into some actual coding at work - as opposed to mainly reviewing pull requests for configuration changes for Kafka topics.  This became urgent, as the command line interface tool that our provisioning system was dependent on had been marked for deprecation.

April 

Had my first direct experience with Covid-19.  I only went for a test because a friend had mentioned that a runny nose was his first symptom.

May

Managed to roll my ankle as I was leaving the house for the evening.  I thought about going back inside and resting it up with ice etc. - but then decided to try walking off the pain and carried on to a pub quiz.  My team won the pub quiz, and I got an Uber home.

A couple of days later my ankle swelled up so much that it was too painful to walk on.  This lasted a few weeks.

June

Heard from a recruiter who was working as a local sourcer for Atlassian, now that they are fully open to remote working.

Had a family member come to visit for a few days.  On the second day they seemed a bit ill so I ordered some more Covid tests - I was okay, but they tested positive and needed to isolate for a week or so. 

July

A few stages of interviews with Atlassian.

Went down south for a weekend, including watching The All Blacks versus Ireland at Forsyth Barr Stadium in Dunedin.

Attended a comedy show by Rhys Darby - he's the comedian / actor who played the character Murray from the Flight of the Conchords television series.

August

Final stage of interviews with Atlassian.

Received and accepted offer to join Atlassian as a Senior Software Engineer.

New laptop arrived - glad it was a Mac, as I wasn't sure whether that had been something that I had asked about during the interview process.

September

Properly started the new job.

Purchased a decent office chair, and standing desk - all within budget for being expensed back.  A welcome improvement over sitting at the kitchen table.

October

More learning about Atlassian systems, and familiarising myself with the services that my team is responsible for.

November

Learning about another existing service that will be moving across to my team for further development and maintenance.

December

Went to Sydney to meet up with the rest of my work teammates - several of whom also had to travel across from other parts of Australia.

Enjoyed my first experience of an escape room, which was the team building exercise that we chose.

Went to Otautahi Smoke - an afternoon and evening of live music, BBQ food and beers in Hagley Park.

Wednesday, 13 July 2022

Designing systems - The "ity"s That Limit or Enable Profitability

Introduction

This started off as a little aide-mémoire to get my head into the right space while preparing for an interview. It's not an exhaustive list, and it twists terminology that has been used to represent other things (see Velocity, below), so don't treat it as a text book reference to work from.

Most of the listed points can be associated back to so called "non-functional requirements" - NFRs. I don't like that particular terminology, so alternatively we might consider them as dimensions of the quality of the system.

Usability

"If you build it, they will come" should come with a provisor, "... but if it's awkward to use they'll soon go away, and might not come back."

Security

All of the aspects that combine to protect data from being seen or manipulated by anyone other than the intended recipient or sender, and also assuring users that the data has originated from the intended source.

Velocity

Here I'm cheating a bit by trying to come up with a term to represent the speed at which a system can respond to user input - not development velocity.

Accessibility

This has multiple dimensions to it, ranging from the devices that can present the interface, to the affordances offered for people with disabilities.

Reliability

The system does what it is intended and expected to do, in a consistent manner over a significant period of time. 

Elasticity / Scalability / Capacity

How well the system can cope when it becomes popular enough to attract a lot of users.

Likewise, how well it can scale back down to a level that is suitable when there is less demand - and less need for potentially expensive resources to be available.

Adaptability / Flexibility

It's not necessarily always the case, but given a range of possible technologies to choose from, each will often have an associated time or money cost for applying changes - and that cost shapes how readily the system can adapt.

Not all roads lead to Profitability

In a commercial product, these are intended to combine to lead to profitability.

Not all products will consider these as being equally high priority, so you may find it a valuable exercise to get your team together and agree on a relative ranking, so that you can focus on what is important for your business to succeed with the challenges and opportunities in the current environment.

Visualise Priority Ranking

I'd even go so far as to suggest having a visual representation of the value rankings so that there can be little doubt about what to prioritise when making changes - in the days of office working this might be something like a poster on the wall, or an A4 printout in the top corner of the whiteboard where the team has their stand-up meetings.


Sunday, 3 July 2022

Running Java with Preview Features in the Cloud - Part One

Introduction

I've been catching up on some features that have been added in recent versions of Java. The 6 month release cadence of new versions of Java is great, but can lead to a build up of new things to learn about.

The support for pattern matching in switch statements - JEP 406 - is particularly appealing, but for now it is still only available as a preview feature, meaning that we need to explicitly enable preview at both compile time and run time.
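
As a taste of why it appeals, here is roughly what a pattern-matching switch looks like on Java 17. The shape types are made up for the example:

public class Shapes {

    sealed interface Shape permits Circle, Square {}
    record Circle(double radius) implements Shape {}
    record Square(double side) implements Shape {}

    static double area(Shape shape) {
        // Pattern matching for switch (preview in Java 17 - JEP 406):
        // each case both tests the type and binds a typed variable,
        // and the sealed hierarchy makes the switch exhaustive.
        return switch (shape) {
            case Circle c -> Math.PI * c.radius() * c.radius();
            case Square s -> s.side() * s.side();
        };
    }

    public static void main(String[] args) {
        System.out.println(area(new Circle(2.0)));
    }
}

Both the compile and the run need the preview flag:

> javac --release 17 --enable-preview Shapes.java
> java --enable-preview Shapes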

A shallow view of the main cloud providers

A lot of online applications these days will run in some sort of cloud runtime environment. So what Java versions do the main cloud providers currently offer?

According to what the documentation currently specifies, AWS Lambda's pre-packaged Java environments only support versions 8 and 11 unless you bring your own Docker container. Similarly, Azure Functions only offers versions 8 and 11. This leaves us to consider Google Cloud Functions, which supports and recommends Java 17.

What can we try out?

As far as I can tell, the Google Cloud Functions way of running Java doesn't allow us to control command line arguments to the Java runtime, so we cannot simply specify --enable-preview that way.

This leaves us to try out customizing AWS Lambda to:

  • set up a Docker container including the Java 17 runtime
  • set up a wrapper script to pass --enable-preview as a command line parameter to make the lambda initialize with the functionality that we want.

Tuesday, 21 June 2022

Speeding up Software Builds for Continuous Integration

Downloading the Internet

Can you remember the last time you started out on a clean development environment and ran the build of some software using Maven or Gradle for dependency management? It takes ages to download all of the necessary third party libraries from one or more remote repositories, leading to expressions like, "Just waiting for Maven to download the Internet".

Once your development environment has been used for building a few projects, the range of dependencies that need to be downloaded for other builds reduces, as the previously referenced ones will now be cached and found locally on your computer's hard drive.

What happens on the Continuous Integration environment?

Now consider what goes on when Jenkins or your other preferred Continuous Integration server comes to build your software. If it doesn't have a local copy of the libraries that have been referenced then it is going to pay the cost of that slow "download the Internet" process every single time that it comes to check out your latest changes and run a build.

What are the main costs involved here?

  • Developer time waiting on the build to complete before moving on to the next change
  • Data transfer charges for sourcing from external repositories

Cutting down costs - saving time

What options do we have available for reducing these costs?

  1. Localise the artifact repository, acting as a pass-through cache
  2. Pre-download the most common artifacts in a build container image

Option 1 would involve the selection and setup of an appropriate artifact repository manager such as Nexus or Artifactory. There's a reasonable chance that if your organisation writes its own reusable libraries then this will already be in place for supporting the distribution of those artifacts, so it may just be a matter of re-configuring the setup to mirror third party libraries from external repositories.

Option 2 may seem a bit counter-intuitive, as it would go against the current trend of trying to minimise container sizes, and to be generally useful it would need to contain a broader range of artifacts than any one project's build would require.

Keep it local

For both options the performance improvement comes down to locality of reference. The builds should be able to obtain most, if not all, dependencies without having to go beyond the organisation's private build environment's network - whether that be a Virtual Private Cloud or a data centre.

With this type of setup in place builds should be able to spend less time on initial setup, and be more focussed on compilation, running tests, and ultimately making the new known good version of the code available for use.

If you want to understand the potential time savings on offer here, just try temporarily moving the content of your local development environment's build cache away and see how long a build takes. For a typical Java microservice I would not be at all surprised if the build time doubles or even triples for having to obtain the build plugin libraries, the application's direct dependencies, and all of the transitive dependencies.
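
For example, assuming a typical Maven setup on macOS or Linux (Gradle users would do the equivalent with ~/.gradle/caches), a cold-cache build can be simulated like this:

> mv ~/.m2/repository ~/.m2/repository.bak
> mvn clean package
> rm -rf ~/.m2/repository && mv ~/.m2/repository.bak ~/.m2/repository

The first build after the move has to re-download everything; restoring the backup afterwards puts the cache back the way it was.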

Monday, 20 June 2022

Docker SBOM - Software Bill Of Materials

In an earlier post on this blog I was curious about comparing Docker images to try to track down the differences that might be causing performance problems. Since then I have had a play with the sbom Docker command for listing out what is included in the image.

Following the documentation at: https://docs.docker.com/engine/sbom/

Below is an example of the output of a run of a locally built app:

> docker sbom hello-world-alpine-jlink:latest

 

Syft v0.43.0
 ✔ Loaded image            
 ✔ Parsed image            
 ✔ Cataloged packages      [16 packages]

NAME                    VERSION       TYPE         
alpine-baselayout       3.2.0-r20     apk           
alpine-baselayout-data  3.2.0-r20     apk           
alpine-keys             2.4-r1        apk           
apk-tools               2.12.9-r3     apk           
busybox                 1.35.0-r13    apk           
ca-certificates-bundle  20211220-r0   apk           
docker-comparison       1.0-SNAPSHOT  java-archive  
jrt-fs                  11.0.15       java-archive  
libc-utils              0.7.2-r3      apk           
libcrypto1.1            1.1.1o-r0     apk           
libssl1.1               1.1.1o-r0     apk           
musl                    1.2.3-r0      apk           
musl-utils              1.2.3-r0      apk           
scanelf                 1.3.4-r0      apk           
ssl_client              1.35.0-r13    apk           
zlib                    1.2.12-r1     apk   

 

This is a much more detailed listing of the components that are included in the Docker image than we would get from looking at the Dockerfile or image history, so I would recommend it as a way of checking what you are including in an image. The main feature request that I have is to separate the artifacts by type, though in this trivial example that is simple enough to do by just looking at the listing.
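
In the meantime, piping the output through standard command line tools gets most of the way there - for example, to list just the Java artifacts:

> docker sbom hello-world-alpine-jlink:latest | grep java-archive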


Tuesday, 14 June 2022

The Importance of Segmenting Infrastructure

Kafka for Logging

I was recently poking around in the source code of a few technologies that I have been using for a few years when I came across KafkaLog4jAppender. It enables you to use Kafka as a place to capture application logs. The thing that caught my eye was the latest commit associated with that particular class, "KafkaLog4jAppender deadlocks when idempotence is enabled".

In the context of Kafka, idempotence is intended to enable the system to avoid producing duplicate records when a producer may need to retry sending events due to some - hopefully - intermittent connectivity problem between the producer and the receiving broker.

The unfortunate situation that arises here is that the Kafka client code itself uses Log4j, so the application can end up blocked from sending its logs via a Kafka topic because the Kafka client producer gets deadlocked waiting on transaction state.
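
For context, idempotence is just a producer-side configuration, and my understanding of the fix is that it boils down to making sure a producer used for shipping logs opts out of it. A hedged sketch of that idea - not the appender's actual code:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

public class LogProducerFactory {

    static KafkaProducer<String, String> createLogProducer(String bootstrapServers) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // Explicitly opt out of idempotence for the logging producer, so it
        // never waits on idempotent/transactional state while the Kafka
        // client itself is trying to log - avoiding the deadlock above.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "false");
        return new KafkaProducer<>(props);
    }
}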

Kafka For Metrics - But Not For Kafka Metrics

This reminded me of a similar scenario where an organisation might choose to use Kafka as their mechanism for sending out notifications of metrics for their microservices and associated infrastructure. If Kafka happens to be part of the infrastructure that you are interested in being able to monitor, then you need to keep those resources isolated from the metrics Kafka - otherwise you run the risk of an incident impacting Kafka which prevents the metrics from being transmitted.

Keeping Things Separated

A real world example of keeping infrastructure isolated from itself can be seen in the way Confluent Cloud handles audit logs. I found it a little confusing at first, as the organisation that I was working for at the time only had Kafka clusters in a single region, but the audit logs were on completely separate infrastructure in another region and even another cloud provider.

Sometimes You're Using A Service Indirectly

A slightly different - but no less significant - example of the need for isolating resources can arise when a particular type of infrastructure is being used for different types of workload. Rather than having a "big bang" release of changes to all of the systems, a phased rollout approach can be taken. One of my earliest involvements with using AWS came shortly after their 2015 DynamoDB outage, which had a ripple-out impact on a range of other AWS services because behind the scenes those other services were themselves utilising DynamoDB.

It's my understanding that AWS subsequently moved to isolating their internal services' DynamoDB resource from general consumers' DynamoDB infrastructure - but don't quote me on that.

Friday, 10 June 2022

Docker Images - Size matters, But So Does Performance

Introduction

I recently went through the exercise of re-building a Docker image based on what was supposed to be a stable, well-known application codebase. Along the way I observed an unexpected performance issue.

The application contained within the Docker image was just a Java command line utility for parsing some YAML files to provision Kafka resources on our hosted development clusters. The code had not been changed for several months, so this was supposed to just be a matter of setting up a local copy of the Docker image instead of pulling down a trusted third party's image from Dockerhub.

The application was bundled within a Docker container whose Dockerfile was alongside the code, so it should have been a simple matter of using that to produce the image, pushing it to our own repo, and then pulling that down for our runtime use.

It's the same, so why's it different?

We had been running with the existing third party Docker image for several months, so there was a well established history of how long each stage of the deployment pipeline should typically take to run.

When the new Docker image ran it took noticeably longer to complete each stage. I don't have the exact figures in front of me, but I can recall that it was on the order of a double-digit percentage slower - so a six minute build might now be taking longer than seven minutes.

Examining the Docker images

The third party's build process for the original Docker image wasn't available for examination, so to compare the Docker images we need to use something like

> docker history --no-trunc <full image name>

From this I was quickly able to establish that there were a couple of significant differences between the application's specified Dockerfile and the Dockerfile that would have been used for building the faster running established version:

  • The base image
    • CentOS Linux versus Alpine Linux
       
  • The Java runtime
    • Full Java SDK versus jlink with specific modules

Getting back up to speed

Since the purpose of this setup was to be a lift and shift of the existing setup, I adjusted the Dockerfile to involve CentOS Linux as its base image and adjusted it to use a full JDK instead of the clever jlink minimised Java runtime environment.

At this point we were where we wanted to be as our baseline for migrating off the third party Docker image. Our image has the same base OS and Java runtime, and performs close enough to the same - without taking a double-digit percentage longer than our starting point.

What was the issue?

While I was working on this particular setup there was a pressing deadline, so I was not free to play around with tuning the setup and isolating whether the issue was due to the OS or the jlink runtime (or something else).

Based on what I have seen mentioned online, I suspect that there may have been some aspect of the application that involved heavy use of system calls that do not run efficiently against Alpine's musl library. For now that is just a theory, and not something that I have managed to reproduce on a simplified locally built application.

If the runtime environment had involved inputs from external systems I would have been more motivated to try to keep us on Alpine, as it tends to ship with fewer services and libraries, and therefore fewer CVEs representing potential security vulnerabilities.


Monday, 11 April 2022

Expiring CA Certificates - How not to get caught out

I never thought it would happen to me. I was careful, I prepared well in advance, I even had multiple environments to test things out in...

I got caught out by clutter. I had updated the correct file in the development environment, but updated a file with the same name in a slightly different location in production.

A brief check of the system with the new certificate in place seemed fine - the certificate didn't look like it was due to expire on the known expiry date.

That's the problem with an expiring CA certificate - it's not front and centre showing up as something you need to be concerned about. The chain of trust is a bit less visible, you have to click through to see the details.

In the heat of the moment, troubleshooting what might have gone wrong with the setup, I even repeated the mistake of copying the file to the wrong location.
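
If I were doing it again, I would verify things from the command line, against the exact file that the production service loads, rather than eyeballing the details in a UI. For example, assuming the certificate protects a TLS service (the path and host here are placeholders):

> openssl x509 -noout -subject -issuer -enddate -in <path the service actually loads>
> openssl s_client -connect <host>:443 -showcerts </dev/null

The first command prints the expiry of a specific certificate file; the second shows what is actually being served, so a mismatch between the two points straight at a wrong-file problem.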


Monday, 11 October 2021

My History With Open Source

From CPAN to Design Patterns

Throughout my career I've benefited greatly from being able to utilise open source software that other developers have produced and made freely available.

Some of my earliest commercial project work benefitted from libraries made available for Perl via the Comprehensive Perl Archive Network (CPAN). It sometimes felt like our company had a huge advantage over organisations that used VBScript for developing ASP pages, as they seemed to be tied into the world of closed source, needing to pay to use libraries that other organisations had developed as licensed products for sale.

In the early two thousands I was continuing my university studies as a part time student while working as a software developer. One of the distributed systems courses gave me some exposure to JBoss and Tomcat, which made me question why we were paying to use commercial application servers for some of our clients' projects in my day job.

Aside from the common day to day helper libraries such as JUnit and log4j, Tomcat was probably the first major open source Java system that I brought into my work, proving out that we didn't need EJBs and all of their standards for getting some data out of a database and onto some web pages. At around the same time we were probably dabbling with Apache JMeter as a mechanism to validate that this new kid on the block (well, our block at least) was going to cope with what we anticipated was going to be thrown at it.

Although we didn't use any particular library for it, I would also consider design patterns as an example of shared knowledge that really helped us to achieve our scalability and performance goals - safely caching data that was being read frequently, but only updated a few times each day. If you went skiing in New Zealand in the early 2000s then I can almost guarantee that you checked snow reports using code that I developed.

Giving Something Back

Open source licenses can be a legal minefield for the owners and the users of products and libraries.

Working in large corporations often involves policies and even clauses in employment contracts - along the lines of "If you develop on company time, or on company hardware then anything produced as a result is the property of the company" and / or "The use of any open source software is expressly forbidden unless it has been formally approved by the XYZ committee".

Even smaller companies need to be aware of the differences between GPL, MIT, Apache and other variations of licenses before building a product up.

So far my contributions to open source projects have mainly been limited to minor improvements to the documentation, and a couple of minor bug fixes for some smaller projects. Correcting typos and improving grammar can be a small way of helping out - provided that it isn't pedantic, or debatable whether the new phrasing is better. So far I have had all of my contributions accepted with gratitude, as the original developers sometimes have English as a second language, or just slipped up a little in the rush of getting something out and released.

Personally, I also find that by contributing to explaining how something works I can improve my ability to understand and recall that information later on. So, as well as being a good way to make an initial contribution to an open source community, consider that by improving your understanding you will also be moving some way towards being able to contribute to the code as well.