Upserts with version awareness in a relational database - Part One: PostgreSQL

Context

A while back I was working on a product that utilised Elasticsearch as a data store. Its external versioning gave us a particularly useful guarantee: a write carrying a lower version would never overwrite a persisted record that had already seen a higher version number. So, when multiple updates relating to the same id came through, our data store would ignore out-of-date representations of the record's state.
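That guarantee can be sketched in plain Java, independent of any particular data store: a write only lands if its version is strictly higher than whatever is already held. This is a hypothetical in-memory model for illustration, not the Elasticsearch mechanism itself.

```java
import java.util.HashMap;
import java.util.Map;

public class VersionAwareStore {

    record Event(String name, long version) {}

    private final Map<String, Event> store = new HashMap<>();

    // Apply the write only when no record exists yet, or the incoming
    // version is strictly higher than the persisted one.
    boolean upsert(String id, String name, long version) {
        Event existing = store.get(id);
        if (existing != null && existing.version() >= version) {
            return false; // out-of-date representation, ignored
        }
        store.put(id, new Event(name, version));
        return true;
    }

    public static void main(String[] args) {
        VersionAwareStore s = new VersionAwareStore();
        System.out.println(s.upsert("a1", "created", 1)); // true
        System.out.println(s.upsert("a1", "updated", 3)); // true
        System.out.println(s.upsert("a1", "stale", 2));   // false: lower version ignored
    }
}
```

The rest of this post is about pushing that comparison down into the database so it happens atomically rather than in application code.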

For various reasons we were interested in moving away from Elasticsearch as the data store, so in this post I am going to share my experience of implementing version-aware upserts with relational databases.

How can we get version safe writes in a relational database?

SQL offers insert and update as the well-established, standard data manipulation statements.

Insert will create a row if one does not already exist.

Update will modify row(s) that already exist, but will not create any new row.

Different database vendors offer different options for appropriately applying insert and update in a single operation. In this post I will share what I have found as a way of making this work with PostgreSQL. 

PostgreSQL

As our starting point, we can set up a table that has three attributes, that can be created as follows: 

CREATE TABLE event (id UUID PRIMARY KEY, 
name varchar(255) NOT NULL,
version bigint NOT NULL DEFAULT 0)

It is not a very realistic schema, as a name is not something that would typically be expected to change, so don't read any real-world business purpose into it.

The following expression represents a JDBC statement that will attempt to insert data but, in the event of a conflict, will update the existing record, and only when the existing record's version is lower than the version being supplied. (JDBC is the standard Java API for interacting with relational databases.)

Each '?' character represents a placeholder for a value to be passed in, so in this situation we would be passing in the following six values:

id, name, version, name, version, version

For our simple table that specifies the id column as primary key, the database will reject attempts to insert when a record already exists having the same id.
INSERT INTO event (id, name, version) VALUES (?, ?, ?)
ON CONFLICT (id) DO UPDATE SET name = ?, version = ?
WHERE event.version < ?
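Wired up through JDBC, that statement can be executed with a PreparedStatement. The sketch below is a minimal illustration: the method name and the idea of passing in an existing Connection are my assumptions, not part of any particular codebase. With no live database available here, the main method only sanity-checks that the SQL carries the six placeholders being bound.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.UUID;

public class VersionAwareUpsert {

    private static final String UPSERT_SQL =
        "INSERT INTO event (id, name, version) VALUES (?, ?, ?) " +
        "ON CONFLICT (id) DO UPDATE SET name = ?, version = ? " +
        "WHERE event.version < ?";

    // Returns the number of rows affected: 1 when the insert or the
    // conditional update was applied, 0 when an equal-or-newer version
    // was already persisted.
    static int upsert(Connection connection, UUID id, String name, long version)
            throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(UPSERT_SQL)) {
            statement.setObject(1, id);
            statement.setString(2, name);
            statement.setLong(3, version);
            statement.setString(4, name);
            statement.setLong(5, version);
            statement.setLong(6, version);
            return statement.executeUpdate();
        }
    }

    public static void main(String[] args) {
        // No database here, so just confirm the placeholder count matches
        // the six values bound above.
        long placeholders = UPSERT_SQL.chars().filter(c -> c == '?').count();
        System.out.println(placeholders);
    }
}
```

A return value of 0 from executeUpdate is how the application learns that its write was ignored as out of date.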

Having the logic applied in a single statement sent to the database for processing should be more performant than splitting it across application code that uses exception handling to achieve the same functionality, like the following:

try {
    // No version awareness required here, as the insert will fail on a duplicate id
    runInsert(id, name, version);
} catch (SQLException exception) {
    if (isDuplicateKey(exception)) {
        // Within runUpdate, the version check would be applied
        runUpdate(id, name, version);
    } else {
        // Do some other exception handling here
    }
}

This application code would also incur network latency between the insert and update calls. That would probably only become problematic when multiple updates arrive at around the same time, creating a race condition that is sensitive to the accurate current value of version, which leads us to considering transactions and isolation levels.

Transaction isolation levels

Whenever there is a possibility of a race condition we need a firm understanding of how the database will behave when multiple transactions could be running at the same time and referring to the same row(s).

In this situation we particularly need to consider whether operations running in parallel could have an out-of-date view of the current state when a context switch occurs.

I tried to set up some test code that I expected would reveal a race condition picking up an uncommitted value, by holding a transaction open for an artificially long period before rolling it back. After failing to see the expected issue occur, I went back to the documentation and found that PostgreSQL does not actually support the "read uncommitted" transaction isolation level.

"In PostgreSQL READ UNCOMMITTED is treated as READ COMMITTED."  

Transaction levels documentation 

So, we cannot trigger a situation where a transaction could read a value that a non-committed transaction has set.
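From JDBC you can still request that level; PostgreSQL simply behaves as READ COMMITTED. A minimal sketch follows, assuming a Connection has been obtained from a driver; the method name is mine. Without a live database, the main method just shows the standard java.sql.Connection constants involved.

```java
import java.sql.Connection;
import java.sql.SQLException;

public class IsolationDemo {

    // Request READ UNCOMMITTED; the PostgreSQL driver accepts this,
    // but the server treats it as READ COMMITTED.
    static void requestReadUncommitted(Connection connection) throws SQLException {
        connection.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
    }

    public static void main(String[] args) {
        // JDBC models isolation levels as int constants on java.sql.Connection.
        System.out.println(Connection.TRANSACTION_READ_UNCOMMITTED);
        System.out.println(Connection.TRANSACTION_READ_COMMITTED);
    }
}
```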

In part two I will share how MariaDB offers a different approach to achieving the equivalent version aware upsert functionality.
