
Designing APIs for use by AI, and others

Introduction 

I recently listened to a podcast episode that offered an introduction to some of the core concepts of retrieval-augmented generation (RAG) for artificial intelligence systems. One of the many points covered was the opportunity to prompt the system to inform the user if it could not determine a correct answer. In this post I will share some real-world experience of how attention to detail in API design can help or hinder this capability.

Example of an unintended limitation 

"A chain is only as strong as its weakest link".

Databases and search engines serve different use cases, and so come with different expectations about how they should behave when the system is not fully operational.

Most databases are based on a concept of indexes and records, where it is only appropriate to present back a query result if the authoritative data store has successfully been contacted and produced a result.

Search engines can be a little bit looser, on the basis that producing some results will be more useful than presenting back no results at all.

Partial results, unknown

When using a search system such as ElasticSearch, the API can provide context around whether the search was able to run over a completely representative range of nodes in the cluster. In a situation where one or more nodes was unable to produce a timely response, the search response can include an indication that the result is only partially complete.
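The raw ElasticSearch response makes this detectable: every search response includes a `_shards` section (with `total`, `successful`, and `failed` counts) and a `timed_out` flag. A minimal sketch of a completeness check - the `is_partial` helper and the sample response are illustrative, not taken from the original service:

```python
def is_partial(search_response: dict) -> bool:
    """Return True if an ElasticSearch search response may be incomplete.

    ElasticSearch reports per-shard outcomes in the `_shards` section of
    every search response, and sets `timed_out` when the request hit its
    time limit before all shards replied.
    """
    shards = search_response.get("_shards", {})
    return (
        search_response.get("timed_out", False)
        or shards.get("failed", 0) > 0
        or shards.get("successful", 0) < shards.get("total", 0)
    )


# A response where one of three shards failed to reply in time:
response = {
    "timed_out": False,
    "_shards": {"total": 3, "successful": 2, "skipped": 0, "failed": 1},
    "hits": {"total": {"value": 40, "relation": "eq"}, "hits": []},
}
print(is_partial(response))  # True: results only cover 2 of 3 shards
```

An API layer that consumes these responses can use a check like this to decide what to tell its own callers, rather than silently discarding the signal.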

The problem that we faced arose when an API layer sat between my service and the ElasticSearch implementation and abstracted away the partial-results indicator, effectively hiding the possibility of the data being incomplete.

To compound the issue, the structure of some of the data involved included a nested list, which ElasticSearch can index separately from the core document. That meant that in addition to not finding a match for a given ID, we could also face a situation where a document was found but was missing a subset of its data.
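For context, this is what such a structure looks like at the mapping level. With `"type": "nested"`, ElasticSearch stores each list element as a separate hidden Lucene document alongside its parent, which is what makes it possible for a query to return the parent while some of its children are unavailable. The entity here (an order with line items) is a hypothetical stand-in for the real data:

```python
# Sketch of an index mapping with a nested list. The "orders"/"items"
# field names are illustrative assumptions, not from the original system.
mapping = {
    "mappings": {
        "properties": {
            "order_id": {"type": "keyword"},
            # Each element of "items" is indexed as its own hidden
            # Lucene document, separate from the parent order.
            "items": {
                "type": "nested",
                "properties": {
                    "sku": {"type": "keyword"},
                    "quantity": {"type": "integer"},
                },
            },
        }
    }
}
```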

An unusual CAP theorem trade-off

As a consumer of an API, I want to have some awareness of how trustworthy the data will be.

If I get back a representation of the state of an entity, I'd like to know whether it is potentially incomplete, so that I have the option of retrying, or of presenting the consumer with that context so that they can make informed decisions around how to utilise the information.

In the ElasticSearch situation, the possibility of a nested list having some items missing could result in seeing the entity in a state that it had never actually been in - i.e. it's not a case of eventual consistency, where we would merely be seeing a slightly out-of-date representation of the data.

Sidenote - When can it happen?

In my limited experience, the partial results situation was only seen when the ElasticSearch cluster was under unusually high load, such as when an additional nested structure was introduced without appropriate corresponding indexing configuration.

Summary

In a world of microservices and everything-as-a-service, we have a responsibility to detect when edge cases are encountered, and to minimise the possibility of unintentionally disrupting systems that rely on the data that we are making available.

As I see it, in the scenario described in this post there were two main options to choose from:

1. Propagate the possibility of partial results in the API response, along with suitable caveat information in the documentation

2. Treat partial results as the system being temporarily unavailable, removing the risk of consumers of the data missing the more nuanced implementation detail related to the nested structure
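The two options above can be sketched as alternative behaviours in the intermediate API layer. Everything here is illustrative - the helper names, the exception type, and the response shapes are assumptions, not the actual service's API:

```python
from typing import Any


class ServiceTemporarilyUnavailable(Exception):
    """Option 2: surface partial results as a temporary outage."""


def _is_partial(resp: dict) -> bool:
    # ElasticSearch reports per-shard outcomes in every search response.
    shards = resp.get("_shards", {})
    return (
        resp.get("timed_out", False)
        or shards.get("failed", 0) > 0
        or shards.get("successful", 0) < shards.get("total", 0)
    )


def lookup_option_1(resp: dict) -> dict[str, Any]:
    """Option 1: propagate a `partial` caveat alongside the data."""
    return {
        "data": [hit["_source"] for hit in resp["hits"]["hits"]],
        "partial": _is_partial(resp),
    }


def lookup_option_2(resp: dict) -> dict[str, Any]:
    """Option 2: refuse to serve possibly-incomplete data."""
    if _is_partial(resp):
        raise ServiceTemporarilyUnavailable("search results were partial")
    return {"data": [hit["_source"] for hit in resp["hits"]["hits"]]}
```

Option 1 pushes the judgement call out to each consumer; option 2 keeps the contract simple at the cost of availability, which is the unusual CAP-style trade-off described above.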

   
