Saturday, 28 March 2026

The confident implementation that was fundamentally flawed

Applying AI to improve scalability

I was evaluating the implementation of a process that parses multiple files and uploads the resulting generated file as an object to an S3 bucket.

The main consideration was scalability: the volume of incoming data was about to increase significantly, and we did not want to scale up our service instances just to cope with this particular use case.

The existing process involved creating a temporary file and writing to it before uploading to S3. This was going to be problematic, as the instance size involved would not have enough disk space to handle the larger data sets that we knew were coming in the next milestone.

I looked into bypassing the temporary file generation and uploading straight to S3. This would involve adjusting the setup to use S3's multipart upload API calls instead of the single step of writing to S3.
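For context, a streaming multipart upload roughly follows a create/upload-part/complete sequence. Here is a minimal sketch assuming a boto3-style S3 client; the function name, bucket, and key are placeholders of mine, not from the actual code, and real S3 requires every part except the last to be at least 5 MiB:

```python
def stream_to_s3(s3, bucket, key, chunks):
    """Upload an iterable of byte chunks as one S3 object via the
    multipart upload API, so no temporary file is ever written."""
    mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
    parts = []
    try:
        for number, chunk in enumerate(chunks, start=1):
            part = s3.upload_part(
                Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
                PartNumber=number, Body=chunk,
            )
            # S3 needs each part's ETag echoed back on completion
            parts.append({"PartNumber": number, "ETag": part["ETag"]})
        # Only after completion does the object appear as a single whole
        s3.complete_multipart_upload(
            Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Abort so S3 does not keep billing for orphaned parts
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=mpu["UploadId"])
        raise
```

The key property is that every part is accumulated under one upload ID and stitched together at the end, rather than each write replacing the object.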

I explained the situation to the AI command line interface and watched as it summarized its approach and refactored the implementation and tests to buffer the content and upload it in multiple calls, without involving any temporary file.

On the surface everything looked good, but as I read through some of the comments the AI had included in the code, something started to look problematic...

It didn't do what I asked for

The comments mentioned that the code was not using multipart upload, even though I had specified it in the instructions.

Instead of progressively uploading the content to the destination object in S3, this AI-generated implementation wrote each buffer of data as the entire content of the destination object. Each write overwrote the previous one, so only the last chunk of data would remain at the end of processing.
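The failure mode is easy to reproduce in miniature. This is my own illustrative sketch, not the generated code: a plain dict stands in for the bucket, and a per-buffer put replaces the whole object each time.

```python
bucket = {}  # stands in for the S3 bucket

def put_object(key, body):
    # Like S3's PutObject, a put replaces the entire object
    bucket[key] = body

chunks = [b"first-", b"second-", b"third"]
for chunk in chunks:
    put_object("output.dat", chunk)  # the flawed pattern: one put per buffer

print(bucket["output.dat"])  # only b'third' survives; earlier chunks are lost
```

Any consumer of the object would silently receive a truncated file, which is why this kind of flaw needs a test rather than a smoke check.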

The importance of attention to detail

Based on my experiences, using AI for code generation is rarely a "one and done" situation.

I had to push further to have some tests generated that would surface the flaw in the implementation. In this instance that meant processing data that exceeded a single buffer size and validating that the content written to the destination was complete.

Still a need for the human in the loop

For this particular example the code generated included comments that spelled out where the implementation was not matching my expectations, so it was not difficult to detect the problem.

It takes little imagination to envisage how teams could be seduced by the perception that AI can efficiently generate software with less need for their reviewing input. Based on speculative articles already circulating online, I think we may have already seen that from companies such as AWS.

Do we have to spell it out?

In this situation I already knew what I wanted, so I could have placed more emphasis on that aspect when instructing the AI to generate the code. That would not help, though, in a situation where the API was less familiar or the scope was broader.

If AI is only capable of following strict instructions from someone who already knows enough of the details to recognize when it is done, then is it just the equivalent of outsourcing?

Will there be a cycle of "We can outsource now" and "We need to bring that back in house now"?
