
Upserts with version awareness in DynamoDB

Introduction

Elasticsearch has a concurrency control capability that lets a write carry a version value, which determines whether the update is applied or discarded as stale in comparison to the existing data.

As part of considering a migration away from Elasticsearch as a data store, I was interested in how other databases could achieve the same type of version-aware upsert capability.

In some earlier posts on this blog I have shared how version-aware upserting can be done with PostgreSQL, MariaDB and TitanDB.

This post shares how the same capability can be achieved with AWS's DynamoDB, as an example of a non-relational database.

What does the DynamoDB API offer?

Insert, or update 

DynamoDB has putItem for creating an item in a DynamoDB table, and updateItem for updating an existing item.

At first glance, we might expect to need some combination of putItem and updateItem, resembling how the relational databases had to detect a conflict and fall back to attempting the second type of operation.

It turns out that we can just use putItem, as the API documentation states:

"If an item with the same key already exists in the table, it is replaced with the new item."

So, that takes care of the "insert, otherwise update" aspect of the implementation, but what about version awareness?

Conditional writing

The putItem API offers us the option of specifying some conditional logic that includes the ability to compare existing data against the data being sent.

As this does not involve the updateItem functionality, I decided it would be inappropriate to have "updating" in the heading. Here we are either creating a new item or overwriting an existing one, whereas an updateItem call would modify an existing item in place.

If we have a table called event, containing an id and a version, then we can make a call like the following:

String idAsString = "event-123";
String version = "456";

Map<String, AttributeValue> eventDataUpdating = Map.of(
        "id", AttributeValue.builder().s(idAsString).build(),
        "version", AttributeValue.builder().n(version).build());

dbClient.putItem(PutItemRequest.builder()
        .tableName("event")
        .item(eventDataUpdating)
        .conditionExpression("attribute_not_exists(id) OR (version < :version)")
        .expressionAttributeValues(
                Map.of(":version", AttributeValue.builder().n(version).build()))
        .returnValuesOnConditionCheckFailure(ReturnValuesOnConditionCheckFailure.ALL_OLD)
        .build());

In that code, conditionExpression and expressionAttributeValues combine to express the two situations that determine whether the content should be written into the event table:

  • attribute_not_exists covers the insert case, as there is no existing record with the specified id;
  • version < :version covers the case where an item already exists but has a lower version value than the one now being provided.
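Those two cases can be sketched as a plain Java predicate, to make the decision logic of the condition expression concrete. This is a hypothetical helper for illustration only, not part of the AWS SDK: a null existing version stands in for attribute_not_exists(id).

```java
public class ConditionCheck {

    // Mirrors "attribute_not_exists(id) OR (version < :version)":
    // write proceeds when no item exists, or the stored version is lower.
    static boolean shouldWrite(Integer existingVersion, int incomingVersion) {
        return existingVersion == null            // attribute_not_exists(id)
            || existingVersion < incomingVersion; // version < :version
    }

    public static void main(String[] args) {
        System.out.println(shouldWrite(null, 456)); // no existing item: insert
        System.out.println(shouldWrite(123, 456));  // stale stored version: overwrite
        System.out.println(shouldWrite(789, 456));  // newer stored version: reject
    }
}
```

When the condition evaluates to false on the server side, the putItem call throws a ConditionalCheckFailedException, which is how the caller learns that the write was discarded as stale.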

Try it out

I've been experimenting in Java, using the LocalStack Docker container as a standalone environment for interacting with DynamoDB, so you can grab the code and try running it for yourself.

At the time of this post, it is just a single class that:

  • creates the table
  • writes an initial low version
  • sets up 100 randomly ordered version values
  • spins up virtual threads that each pick up one of the version values and concurrently attempt to apply the update using the condition check
  • prints out when a conflict has prevented an attempted update (as expected) 
  • verifies that when the dust has settled we ultimately end up with the highest version being written
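The shape of that experiment can be sketched in plain Java, swapping the DynamoDB conditional put for an in-memory compareAndSet loop. This is a hypothetical stand-in for the SDK call, not the actual test class: the retry loop plays the role of the server-side condition evaluation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class VersionRace {

    // In-memory stand-in for the conditional put: apply the write only if
    // no version is stored yet, or the stored version is lower.
    static void conditionalPut(AtomicReference<Integer> stored, int version) {
        while (true) {
            Integer current = stored.get();
            if (current != null && current >= version) {
                return; // condition check failed: the existing version wins
            }
            if (stored.compareAndSet(current, version)) {
                return; // write applied
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicReference<Integer> stored = new AtomicReference<>(0); // initial low version
        List<Integer> versions = new ArrayList<>();
        for (int v = 1; v <= 100; v++) {
            versions.add(v);
        }
        Collections.shuffle(versions); // randomly ordered version values

        List<Thread> threads = new ArrayList<>();
        for (int version : versions) {
            threads.add(Thread.ofVirtual().start(() -> conditionalPut(stored, version)));
        }
        for (Thread t : threads) {
            t.join();
        }

        System.out.println(stored.get()); // prints 100: the highest version wins
    }
}
```

Whatever order the threads run in, only writes carrying a higher version than the stored one get through, so the final state is the maximum version; the real experiment relies on DynamoDB's condition expression to enforce the same invariant.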

Code in GitHub  

(You'll need Java 21 or later, Maven, and Docker.)

Disclaimer

So far I have only scratched the surface of how to achieve the desired functionality.

I would not recommend applying this approach without also diving deep into the documentation for further layers of potential limitations, and for situations where eventual consistency may make it less appropriate than it appears.

Follow-on Curiosity

Is this how global tables address conflict?

I wonder if DynamoDB global tables apply similar logic when writes to the same item occur concurrently in different regions.

From the documentation about multi-region strong consistency, "Conditional write operations always evaluate the condition expression against the latest version of an item. Updates always operate against the latest version of an item."

I suppose the ReplicatedWriteConflictException could simply be a mapping from when a ConditionalCheckFailedException is encountered.
