Context
A while back I was working on a product that utilised Elasticsearch as a data store. The use of external version enabled us to have a particularly useful guarantee that a write involving a lower version would never overwrite a persisted record that had already seen a higher version nunmber. So, in the event of multiple updates coming through that related to the same id, our data store would ignore out of date representations of the record's state.
For various reasons it would be interesting to move away from Elasticsearch as the data store, so in this post I am going to share my experiences with implementing version aware upserts with relational databases.
How can we get version safe writes in a relational database?
SQL offers insert and update as the well established, standard data manipulation statements.
Insert will creeate a row if one does not exist already.
Update will update row(s) if they already exist, but will not create any new row.
Different database vendors offer different options for appropriately applying insert and update in a single operation. In this post I will share what I have found as a way of making this work with PostgreSQL.
PostgreSQL
As our starting point, we can set up a table that has three attributes, that can be created as follows:
CREATE TABLE event (id UUID PRIMARY KEY,
name varchar(255) NOT NULL,
version bigint NOT NULL DEFAULT 0)
It is not a very realistic representation, as name isn't something that should typically be expected to change, so don't expect to focus on any real world business purpose from this.
The following expression represents a JDBC statement that will attempt to insert data, but in the event of conflict update the existing record - only if the existing record's version is lower than the version being supplied. (JDBC is a Java API for interacting with databases from Java).
Each '?' character represents a placeholder for a value to be passed in, so in this situation we would be passing in the following:
id, name, version, name, version
For our simple table that specifies the id column as primary key, the database will reject attempts to insert when a record already exists having the same id.INSERT INTO event (id, name, version) VALUES (?, ?, ?) ON CONFLICT (id) DO UPDATE SET name = ?, version = ? " +
WHERE event.version < ?
Having the logic applied in a single statement that is sent to the database for processing should be more performant than splitting the logic up in an application where exception handling could attempt to achieve the same functionality, like the following pseudocode:
try {
// No version awareness required, as will fail on duplicate id
runInsert(id, name, version);
} catch (SQLException exception)
{
if (isDuplicateKey(exception)) {
// Within runUpdate, would have version check
runUpdate(id, name, version);
} else {
// Do some other exception handling here
}
}
This application code would involve some network latency between the insert and update calls. This would probably only show up as problematic in situations when multiple updates are arriving in at around the same time, leading to a race condition that is sensitive to the accurate current representation of the version value - which leads us to considering transactions and isolation levels.
Transaction isolation levels
Whenever there is a possibility of a race condition we need to have a firm understanding of how the database will behave when multiple tranactions could be running at the same time and referring to the same row(s).
In this situation we would particularly need to consider whether the operations that are happening in parallel could have an out of date view of the current state as a context switch is occuring.
I tried to set up some test code that could be expected to reveal when a race condition picked up an uncommitted value by having a transaction left open for an artificially longer period of time before having it rolled back. After failing to see the expected issue occur I went back to read some documentation and found that PostgreSQL doesn't actually allow the possibility of "read uncommitted" transaction isolation level.
"In PostgreSQL READ UNCOMMITTED is treated as READ COMMITTED."
Transaction levels documentation
So, we cannot trigger a situation where a transaction could read a value that a non-committed transaction has set.
In part two I will share how MariaDB offers a different approach to achieving the equivalent version aware upsert functionality.
No comments:
Post a Comment