Thursday, 31 July 2025

Time to make an exception to the LTS version policy?

Virtual Threads Improved 

I'm starting to dip my thoughts back into technology, in preparation for getting back into interviewing and getting back to working.

This week I had a listen to the latest episode of Josh Long's "A Bootiful Podcast" and was surprised to hear a strong push for going with non-long-term-support (LTS) versions of Java.

A few days later I was watching a presentation about how Netflix is using Java in 2025 (a JavaOne presentation on YouTube), and sure enough Java 24 is being applied there too, in order to benefit from improvements to the use of virtual threads.

JEP 491 "Synchronize Virtual Threads without Pinning" seems to be enough of a driver for some organisations to push forward outside of the Long Term Support versions.
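To make the pinning issue concrete, here is my own minimal sketch (class and method names are mine, assuming Java 21+ for the virtual thread API): a virtual thread blocking inside a synchronized block, the exact pattern that could pin its carrier thread prior to JEP 491 landing in Java 24.

```java
public class PinningDemo {
    private static final Object LOCK = new Object();

    // Start a virtual thread that blocks while holding a monitor.
    // Before JEP 491 (delivered in Java 24) a blocking call inside
    // `synchronized` could pin the virtual thread to its carrier
    // platform thread; from Java 24 onwards it can unmount instead.
    static boolean blockInsideSynchronized() {
        Thread vt = Thread.ofVirtual().start(() -> {
            synchronized (LOCK) {
                try {
                    Thread.sleep(50); // blocking call while holding the lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        try {
            vt.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return vt.isVirtual();
    }

    public static void main(String[] args) {
        System.out.println("completed on virtual thread: " + blockInsideSynchronized());
    }
}
```

The code runs the same on Java 21 and Java 24; the difference JEP 491 makes is in the scheduler's behaviour under the hood, not in the programming model.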

Non LTS in prod?

I've seen companies apply a simple policy of only permitting LTS versions in production, so now I'm curious about whether the broader Java community is moving away from that approach.

Potential Migration Limitation

For the types of systems that I have worked on in the past, I can see a potential blocker to slipping out of the LTS versions - cloud providers such as AWS, with services like AWS Lambda, are still only supplying LTS versions of the Java runtime in their bundled Java offering.

So if you have workloads running on a non-LTS Java in Docker or directly on an EC2 instance, you wouldn't be able to smoothly transition them across to AWS Lambda.

Just a matter of time

Java 25 is scheduled to be the next LTS version, so if you are prepared to wait a few months then sticking with the "LTS only" policy could also work okay.

Saturday, 26 July 2025

Upserts with version awareness in a relational database - Part Three: TiDB

Insert or Update, like MariaDB - but different

One of the earlier posts in this series covered how to implement an upsert with MariaDB or MySQL. TiDB is described as being "highly compatible" with the MySQL protocol, but my simple table definition surfaced one of the differences.

The table definition that worked for MariaDB was:

CREATE TABLE event (id UUID PRIMARY KEY, 
name varchar(255) NOT NULL,
version bigint NOT NULL DEFAULT 0) 

TiDB does not support UUID as a built in data type, so we have to adapt the setup to involve a less direct representation. Initially I went with a varchar text representation, but then found a suggestion to apply a binary type instead, leaving a table creation statement as follows.

CREATE TABLE event (id BINARY(16) PRIMARY KEY,
name varchar(255) NOT NULL,
version bigint NOT NULL DEFAULT 0)

The corresponding statement for inserting data also involved some adjustment, to enable the value passed in to be converted to the correct type.

INSERT INTO event (id, name, version) VALUES (UUID_TO_BIN(?), ?, ?)
ON DUPLICATE KEY UPDATE
name = IF(version < VALUES(version), VALUES(name), name),
version = IF(version < VALUES(version), VALUES(version), version)

Likewise, when retrieving data back out from the database we can also apply a conversion to simplify the handling as a String.

SELECT BIN_TO_UUID(id) AS uuidId, name, version FROM event
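For completeness, the UUID_TO_BIN / BIN_TO_UUID mapping can also be done in application code when binding the BINARY(16) parameter directly - a minimal sketch (class and helper names are my own), assuming the default non-swapped byte ordering:

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class UuidBytes {
    // Convert a UUID to the 16-byte form suitable for a BINARY(16) column,
    // most significant bits first (matching UUID_TO_BIN without the swap flag)
    static byte[] toBytes(UUID uuid) {
        ByteBuffer buffer = ByteBuffer.allocate(16);
        buffer.putLong(uuid.getMostSignificantBits());
        buffer.putLong(uuid.getLeastSignificantBits());
        return buffer.array();
    }

    // Reverse the mapping when reading the column back out
    static UUID fromBytes(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        return new UUID(buffer.getLong(), buffer.getLong());
    }

    public static void main(String[] args) {
        UUID original = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
        UUID roundTripped = fromBytes(toBytes(original));
        System.out.println(original.equals(roundTripped));
    }
}
```

Doing the conversion in the application would let the SQL stay identical to the MariaDB version, at the cost of a little extra code on the Java side.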

Different isolation levels

Another difference between TiDB and the MariaDB / MySQL implementations is that TiDB does not support the "read uncommitted" isolation level. If you try to specify it you will trigger an error.

https://docs.pingcap.com/tidb/stable/transaction-isolation-levels/

Show me some code...

The code involved for this exploration can be found in one of my public repositories on GitHub. 

https://github.com/Sounie/tidb-with-jdbc 

Thursday, 17 July 2025

Is it time to switch back to access logs for measuring site use / popularity?

What proportion of browsing is invisible?

I still use Google Analytics to see whether my blog posts are being seen, but I am curious about whether access logs would give a more accurate indication - given the prevalence of browsers and browser plugins that actively resist tracking by JavaScript etc.

Back in the early 2000s the company that I was working for at the time offered website hosting as a service, where a reasonable amount of the billing was tied to the volume of traffic being served by each site. Back then we didn't have microservices scattered across multiple ephemeral virtual servers, so generating a monthly report was generally a straightforward matter of having software configured to point at a particular directory and running a scheduled job to parse the contents of access log files.

Since I have recently come to the realisation that the Blogger platform that my blog posts are hosted and served out from is not particularly useful when it comes to being indexed by Google, it may be time to weigh up options for an alternative hosting setup.

To satisfy my curiosity, access to logs may be on the list of criteria for a new hosting setup. I might even switch to a lightweight static HTTP host and get back to the olden days of directly editing HTML files.

Wednesday, 16 July 2025

My Blogger site's SEO has tanked, now I know why

Google indexes mobile-first, while Blogger has an outdated approach of redirecting mobile clients

Google Search Console is reporting that the URLs on my blog are not being indexed due to:

 Failed: Redirect error

The crawler involved is Googlebot Smartphone, so it seems to be tripping over Blogger's old-fashioned handling of mobile clients, which redirects the client to the URL with a querystring parameter appended. As far as the indexer is concerned, the URL with the parameter won't be a match for the canonical URL, so it should not be indexed as representative of the content for the canonical URL.

In the past it may not have been much of an issue, as the desktop crawler would eventually come along to index the site without encountering the redirection, so the URL requested would match the canonical URL and it would be a simple one-to-one mapping for safely indexing the content.

Ironically the Blogger platform is owned by Google, so I would have hoped that they would be able to coordinate for their users' content to be included across their platforms. 

What have we learnt?

Google isn't as complete as other search engines 

These days Google is not indexing content that it used to, so it will be a less complete representation than other search engines can offer. I can find my Blog posts by searching on Bing, but not on Google.

Blogger.com hasn't kept up with Google's changes to indexing strategy

There are several site support pages online suggesting that the redirect situation is not a problem and can simply be ignored. I believe that guidance is out of date, as over time Google has been placing much greater weight on indexing the mobile view of website content.

How could this be resolved?

Responsive design 

The Blogger platform includes some templates that are designed to be responsive, so that when a mobile client views the page it is presented with a suitable layout. This should mean that there is no longer any need to involve redirects for mobile clients when blogs specify such a layout.

Thursday, 10 July 2025

Upserts with version awareness in a relational database - Part Two: MariaDB / MySQL

Insert, or Conditional Update 

I started off trying out MariaDB as a comparison to how PostgreSQL supports upserts with version awareness.  MariaDB started off as a fork of MySQL, so it makes sense that the same approach is also applicable to MySQL. 

From part one of this blog post series, we have a simple table definition:

CREATE TABLE event (id UUID PRIMARY KEY, 
name varchar(255) NOT NULL,
version bigint NOT NULL DEFAULT 0)

Unlike PostgreSQL, we don't have:

ON CONFLICT

as a clause to fall back from insert to update.

Instead there is:

ON DUPLICATE KEY UPDATE

that can be applied as follows (once again, the syntax is JDBC, where each '?' character represents a value that is passed in at runtime):

INSERT INTO event (id, name, version) VALUES (?, ?, ?)
ON DUPLICATE KEY UPDATE
name = (IF(version < VALUES(version), VALUES(name), name)),
version = (IF(version < VALUES(version), VALUES(version), version))

Stylistically it doesn't seem like a scalable approach that I would want to apply for a table that has more attributes, as we have to specify the version clause on each individual property. I would prefer it if we could specify a where clause to guard the update, but that is not available. 
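The guard condition itself is simple to state outside of SQL - here is a sketch of the equivalent merge decision in plain Java (record and method names are mine, purely illustrative):

```java
public class VersionMerge {
    record Event(String name, long version) {}

    // Mirrors the SQL guard IF(version < VALUES(version), ...):
    // keep the existing row unless the incoming version is strictly higher
    static Event merge(Event existing, Event incoming) {
        return incoming.version() > existing.version() ? incoming : existing;
    }

    public static void main(String[] args) {
        Event existing = new Event("old-name", 5);
        System.out.println(merge(existing, new Event("new-name", 6)).name());   // higher version wins
        System.out.println(merge(existing, new Event("stale-name", 4)).name()); // stale version ignored
    }
}
```

Expressed this way it is one decision; in the SQL we have to repeat that decision per column, which is the scalability complaint above.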

An anomaly in rows affected? 

Normally when we use executeUpdate on a JDBC Statement we receive back a response that is the count of rows added or altered by the statement. In this situation there is different behaviour, as documented on the MariaDB website. Basically, an insert is represented by the value 1 and an update is represented by the value 2. (Sidenote: this is something ChatGPT does not seem to be aware of, as it takes a literal approach to the actual number of rows affected.)

I thought that the rows affected values would be useful for being able to keep track of the relative frequencies of inserts versus updates, until I noticed something odd when running some tests...

It seems that when the update is only setting the existing values then the rows affected value that will be returned is 1, making that situation indistinguishable from the insert.

Update

Based on this documentation it seems that we have an explanation for why the update can return rows affected as being 1, "existing row is set to its current values, and the CLIENT_FOUND_ROWS is set".

If I change my JDBC Connection setup to specify the non-default value for useAffectedRows then we get a 1 when an insert occurs, a 2 when an update changes the existing record, and a 0 when the update is setting the values to be the same as the existing state.

Properties properties = new Properties();
properties.put("user", DB_USER);
properties.put("password", PASSWORD);

properties.put("useAffectedRows", "true");

Connection connection = DriverManager.getConnection(jdbcUrl, properties);
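With useAffectedRows enabled, the executeUpdate result could then be interpreted along these lines - a sketch using illustrative names of my own, based on the behaviour described above:

```java
public class AffectedRows {
    // Interpret the executeUpdate() result for INSERT ... ON DUPLICATE KEY UPDATE
    // when the connection is created with useAffectedRows=true:
    // 1 = new row inserted, 2 = existing row changed, 0 = update was a no-op
    static String classify(int affectedRows) {
        return switch (affectedRows) {
            case 1 -> "inserted";
            case 2 -> "updated";
            case 0 -> "no-op (row already held these values)";
            default -> "unexpected: " + affectedRows;
        };
    }

    public static void main(String[] args) {
        System.out.println(classify(1));
        System.out.println(classify(2));
        System.out.println(classify(0));
    }
}
```

That classification would be enough to track relative frequencies of inserts versus meaningful updates, which was the original motivation.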

Disclaimer

The approach shared in this post is not something that I am offering up as a recommended or optimal way to get conditional logic applied when upserting into MariaDB or MySQL.

It's more a case of me thinking aloud, because thinking is allowed.  

Tuesday, 8 July 2025

Upserts with version awareness in a relational database - Part One: PostgreSQL

Context

A while back I was working on a product that utilised Elasticsearch as a data store. The use of external versioning enabled us to have a particularly useful guarantee: a write involving a lower version would never overwrite a persisted record that had already seen a higher version number. So, in the event of multiple updates coming through that related to the same id, our data store would ignore out-of-date representations of the record's state.

For various reasons it would be interesting to move away from Elasticsearch as the data store, so in this post I am going to share my experiences with implementing version aware upserts with relational databases.

How can we get version safe writes in a relational database?

SQL offers insert and update as the well established, standard data manipulation statements.

Insert will create a row if one does not exist already.

Update will update row(s) if they already exist, but will not create any new row. 

Different database vendors offer different options for appropriately applying insert and update in a single operation. In this post I will share what I have found as a way of making this work with PostgreSQL. 

PostgreSQL

As our starting point, we can set up a table that has three attributes, that can be created as follows: 

CREATE TABLE event (id UUID PRIMARY KEY, 
name varchar(255) NOT NULL,
version bigint NOT NULL DEFAULT 0)

It is not a very realistic representation, as name isn't something that should typically be expected to change, so don't read any real-world business purpose into this.

The following expression represents a JDBC statement that will attempt to insert data, but in the event of conflict will update the existing record - only if the existing record's version is lower than the version being supplied. (JDBC is the standard Java API for interacting with relational databases.)

Each '?' character represents a placeholder for a value to be passed in, so in this situation we would be passing in the following:

id, name, version, name, version

For our simple table that specifies the id column as primary key, the database will reject attempts to insert when a record already exists having the same id.

INSERT INTO event (id, name, version) VALUES (?, ?, ?)
ON CONFLICT (id) DO UPDATE SET name = ?, version = ?
WHERE event.version < ?
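As an aside, PostgreSQL also provides the EXCLUDED pseudo-row for referring to the values that were proposed for insertion, which would avoid repeating the placeholders - a variant along these lines, which I have not exercised in this exploration:

```sql
-- Each bound value is referenced once; EXCLUDED carries it into the UPDATE
INSERT INTO event (id, name, version) VALUES (?, ?, ?)
ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name, version = EXCLUDED.version
WHERE event.version < EXCLUDED.version
```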

Having the logic applied in a single statement that is sent to the database for processing should be more performant than splitting the logic up in an application where exception handling could attempt to achieve the same functionality, like the following pseudocode:

try {
  // No version awareness required, as insert will fail on duplicate id
  runInsert(id, name, version);
} catch (SQLException exception) {
  if (isDuplicateKey(exception)) {
    // Within runUpdate, would have version check
    runUpdate(id, name, version);
  } else {
    // Do some other exception handling here
  }
}

This application code would involve some network latency between the insert and update calls. This would probably only show up as problematic when multiple updates are arriving at around the same time, leading to a race condition that is sensitive to the accurate current representation of the version value - which leads us to considering transactions and isolation levels.

Transaction isolation levels

Whenever there is a possibility of a race condition we need to have a firm understanding of how the database will behave when multiple transactions could be running at the same time and referring to the same row(s).

In this situation we would particularly need to consider whether the operations that are happening in parallel could have an out-of-date view of the current state as a context switch is occurring.

I tried to set up some test code that could be expected to reveal a race condition picking up an uncommitted value, by having a transaction left open for an artificially long period of time before rolling it back. After failing to see the expected issue occur, I went back to read some documentation and found that PostgreSQL doesn't actually allow the "read uncommitted" transaction isolation level.

"In PostgreSQL READ UNCOMMITTED is treated as READ COMMITTED."  

Transaction levels documentation 

So, we cannot trigger a situation where a transaction could read a value that a non-committed transaction has set.

In part two I will share how MariaDB offers a different approach to achieving the equivalent version aware upsert functionality.
