
Upserts with version awareness in a relational database - Part One: PostgreSQL

Context

A while back I was working on a product that utilised Elasticsearch as a data store. Its external versioning feature gave us a particularly useful guarantee: a write carrying a lower version would never overwrite a persisted record that had already seen a higher version number. So, in the event of multiple updates arriving for the same id, our data store would ignore out-of-date representations of the record's state.

For various reasons we wanted to move away from Elasticsearch as the data store, so in this post I am going to share my experience of implementing version-aware upserts with relational databases.

How can we get version safe writes in a relational database?

SQL offers insert and update as the well-established, standard data manipulation statements.

Insert will create a row if one does not already exist.

Update will update existing row(s), but will not create any new row.

Different database vendors offer different options for applying insert and update in a single operation. In this post I will share what I have found to work with PostgreSQL.

PostgreSQL

As our starting point, we can set up a table that has three attributes, that can be created as follows: 

CREATE TABLE event (
    id UUID PRIMARY KEY,
    name varchar(255) NOT NULL,
    version bigint NOT NULL DEFAULT 0
);

It is not a very realistic schema, as name isn't something that would typically be expected to change, so don't look for any real-world business purpose in it.

The following JDBC statement attempts to insert data but, in the event of a conflict, updates the existing record only if that record's version is lower than the version being supplied. (JDBC is the standard Java API for interacting with relational databases.)

Each '?' character represents a placeholder for a value to be passed in, so in this situation we would be passing in six values, in order:

id, name, version (for the insert), then name, version (for the update), and finally version again (for the WHERE clause's comparison)

For our simple table, which specifies the id column as primary key, the database will reject attempts to insert when a record with the same id already exists.
INSERT INTO event (id, name, version) VALUES (?, ?, ?)
ON CONFLICT (id) DO UPDATE SET name = ?, version = ?
WHERE event.version < ?
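As a sketch of how this statement could be issued from Java, binding the six placeholder values in the order described above (the class and method names here are illustrative, not from the original project):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.UUID;

public class EventUpsert {

    // The upsert statement from above; '?' placeholders are bound in order.
    public static final String UPSERT_SQL =
            "INSERT INTO event (id, name, version) VALUES (?, ?, ?) "
            + "ON CONFLICT (id) DO UPDATE SET name = ?, version = ? "
            + "WHERE event.version < ?";

    // Returns the number of rows affected: 1 when the insert or update
    // happened, 0 when the existing record already had an equal or
    // higher version and the write was ignored.
    public static int upsert(Connection connection, UUID id, String name,
                             long version) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(UPSERT_SQL)) {
            statement.setObject(1, id);     // INSERT id
            statement.setString(2, name);   // INSERT name
            statement.setLong(3, version);  // INSERT version
            statement.setString(4, name);   // UPDATE name
            statement.setLong(5, version);  // UPDATE version
            statement.setLong(6, version);  // WHERE version comparison
            return statement.executeUpdate();
        }
    }
}
```

The returned update count gives the caller a cheap way to tell whether a stale write was silently discarded.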

Applying the logic in a single statement sent to the database for processing should be more performant than splitting it across application code that uses exception handling to achieve the same functionality, as in the following pseudocode:

try {
    // No version awareness required, as this will fail on a duplicate id
    runInsert(id, name, version);
} catch (SQLException exception) {
    if (isDuplicateKey(exception)) {
        // Within runUpdate, would have version check
        runUpdate(id, name, version);
    } else {
        // Do some other exception handling here
    }
}
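The isDuplicateKey check from the pseudocode can be implemented by inspecting the exception's SQLState: PostgreSQL reports a unique-constraint violation with SQLState 23505. A minimal sketch (the method name matches the pseudocode; the class name is illustrative):

```java
import java.sql.SQLException;

public class DuplicateKeyCheck {

    // SQLState 23505 is "unique_violation" in PostgreSQL's error code listing.
    private static final String UNIQUE_VIOLATION = "23505";

    public static boolean isDuplicateKey(SQLException exception) {
        return UNIQUE_VIOLATION.equals(exception.getSQLState());
    }
}
```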

This application code would involve some network latency between the insert and update calls. That would probably only show up as problematic when multiple updates arrive at around the same time, leading to a race condition that is sensitive to the accurate current value of version - which leads us to considering transactions and isolation levels.

Transaction isolation levels

Whenever there is a possibility of a race condition we need to have a firm understanding of how the database will behave when multiple transactions could be running at the same time and referring to the same row(s).

In this situation we would particularly need to consider whether operations happening in parallel could see an out-of-date view of the current state while a context switch is occurring.

I tried to set up some test code that could be expected to reveal a race condition picking up an uncommitted value, by leaving a transaction open for an artificially long period before rolling it back. After failing to see the expected issue occur, I went back to the documentation and found that PostgreSQL doesn't actually allow the READ UNCOMMITTED transaction isolation level to take effect.

"In PostgreSQL READ UNCOMMITTED is treated as READ COMMITTED."  

Transaction levels documentation 

So, we cannot trigger a situation where one transaction reads a value that another, uncommitted, transaction has written.
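This is easy to confirm from JDBC: you can request READ UNCOMMITTED on a connection, but PostgreSQL will behave as READ COMMITTED regardless. A sketch, assuming a connection obtained elsewhere:

```java
import java.sql.Connection;
import java.sql.SQLException;

public class IsolationDemo {

    // Requests READ UNCOMMITTED; on PostgreSQL this is treated as
    // READ COMMITTED, so dirty reads are never possible. The driver
    // accepts the request without error.
    public static void requestReadUncommitted(Connection connection)
            throws SQLException {
        connection.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
    }
}
```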

In part two I will share how MariaDB offers a different approach to achieving the equivalent version aware upsert functionality.
