Analog Moment

by James Cox-Morton (@th3james)

No Speed Without Control: A disciplined approach to LLM-augmented software development

Since late 2022 I've been experimenting with using Large Language Models (LLMs) to augment my programming. I believe there is skill and discipline required to get the best out of them, which is perhaps why I've seen many software developers who are dismissive or uninterested. I think this is mistaken - once I learnt to play to their strengths, I found LLMs let me move faster without sacrificing quality.

Initial disappointments

GitHub Copilot was the first LLM I used. I remember the first time I wrote a function signature and watched the exact code I intended to write materialise. It was mind-blowing, but as someone who is paid to write software, it also caused a degree of trepidation.

That fear was short-lived because I quickly became disillusioned with GitHub Copilot. The initial thrill of using the product comes from how much friction it removes and how quickly you can generate code. But I soon found myself debugging issues caused by divergence between what I had assumed Copilot had written and reality. Then there is the ever-present threat of hallucinations - sometimes it just completely makes things up.

But most problematically, Copilot's effortless generation makes it extremely tempting to "author" code you don't fully understand. I found this made me less rigorous about my coding and reduced my sense of accomplishment. I eventually stopped using Copilot, fearing that I was evolving systems faster than I was understanding them, and anticipating that my short-term gains would lead to long-term pains.

Despite this, I could see that AI had the potential to be a force multiplier for software development. I just needed to inject the right friction back into the process to retain rigour, control and ownership. I have found the chat interface pioneered by ChatGPT encourages a much healthier path to AI-augmented programming.

Here are the techniques I find make me a faster and more knowledgeable developer, without compromising control and mastery of the craft. But before that, we need to talk about ethics.

Responsible use

Be mindful of whether you have permission to share the code you're working on with the LLM you're using. Familiarise yourself with your preferred LLM's policies around handling of conversation history.

Your employer may have justifiably strict rules about sharing proprietary code - understand them.

Know whether the project you're contributing to or the company you work for is comfortable accepting AI generated code. Copyright law is playing catch-up with who owns AI generated content.

Some advice in this post should be acceptable regardless of circumstance, but the onus is on you to act responsibly.

Now, onto the techniques ...

Do not copy & paste LLM output - type it out yourself

This is a simple rule that makes a big difference. I find reading and then typing out LLM responses keeps me "in-the-loop" and engaging with the suggestions, rather than blindly accepting them. This extra friction gives me time to acknowledge and research parts of the code I don't understand, and eliminates the risk I'm pasting code which is different from what I assume.

Often, the generated code is only 95% of what you want. If you're copying and pasting, the temptation is to just live with that and "move fast", but if you're typing it out anyway, you may as well write the code you actually want.

Copy & paste as much as you can into your prompts

The flip side of not copying output is that I find the more context I give LLMs, the better the results. This means you want to make it easy to import relevant code into your conversations, as removing friction here will make you more likely to add useful context.

That said, re-read the "Responsible Use" section and ask yourself if you have permission to do this.

GitHub Copilot for VS Code makes importing context into a conversation easy, but this convenience comes at the cost of being explicit about what you're sharing, and I don't like VS Code or the vendor-locked ecosystem Microsoft are trying to establish.

A portable approach is to add files to your prompts in this format:

path/to/file.js 
```
...file contents...
```

Giving the LLM the whole of the file and its path allows it to understand the relationship between files you post.

This Vim mapping adds the current file to your paste buffer in this format by pressing <Space>ly ([L]lm [Y]ank):

nnoremap <Space>ly :let @+ = expand('%') . "\n```\n" . join(getline(1,'$'), "\n") . "\n```"<CR>

I also have a simple script which prints a given file in this format. On macOS I can then pipe this to my clipboard using pbcopy (you'll need something like xclip on Linux).
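
For illustration, a minimal Python sketch of such a script might look like this (a sketch, not my actual script - anything which prints the path followed by the fenced contents will do):

#!/usr/bin/env python3
# Minimal sketch: print each file's path followed by its contents in a
# fenced block, ready to paste into a prompt.
import sys
from pathlib import Path


def format_for_prompt(path: str) -> str:
    return f"{path}\n```\n{Path(path).read_text()}\n```"


if __name__ == "__main__":
    print("\n\n".join(format_for_prompt(path) for path in sys.argv[1:]))

Invoked as, say, python prompt_format.py path/to/file.js | pbcopy, it drops the formatted file straight into the clipboard.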

Use the best model you can

LLMs are not created equal and, although it's hard to quantify, in my experience there is a tangible difference in the quality of responses between models. I recommend experimenting with different models, and paying for access to better ones. At the time of writing I prefer Anthropic's Claude 3.5 Sonnet but switch to ChatGPT's o1-preview for more complex tasks. However, this is a fast-moving space and these recommendations are likely to become outdated quickly.

Before choosing a model ensure you understand their usage policy and handling of conversation data.

A problem well-stated is a problem half-solved

I highly recommend reading OpenAI's Prompt Engineering Guidelines, but the TL;DR is: be explicit about what you want.

You can get pretty decent results by blindly throwing half-formed questions at an LLM, but I find the magic happens when you ask detailed questions which include a clear definition of what you're trying to solve and what type of solution you want.

Write your prompts in a text editor

Unfortunately, the prompt text boxes in LLM UIs are usually too small for writing the longer, more detailed prompts which generate the best results. Additionally, there's nothing more galling than writing a long prompt and then accidentally refreshing your browser and losing it.

As such, for long queries I find it better to draft them in a text editor then paste them in.

An interactive rubber-duck

Rubber Duck Debugging is the process of describing the programming issue you're facing to an inanimate object (such as a rubber duck) in order to force yourself to state all your assumptions in natural language. This process is remarkably effective at surfacing the underlying issue.

Instead of talking to an inanimate duck, write a prompt to your preferred LLM. You'll get all the benefits of rubber ducking, but also get a free shot at getting the answer from the machine.

Ask stupid questions

LLMs offer a safe space to ask the most esoteric or asinine questions you'd be too embarrassed to ask an actual human and get an answer which is sometimes surprisingly thoughtful. Again, the more detail you add, the better the answer - don't worry, you're not wasting anyone's time but your own.

Code reviews are free & instantaneous now

Before proceeding, I will once again refer you back to the "Responsible Use" section.

Assuming you have permission to share your code, LLMs make it trivial to get a quick review of a change set by including a diff in your prompt. Simply pipe a git diff to your paste buffer:

# Assuming `main` is your default branch and you're on a Mac
git diff main.. | pbcopy

Then use a prompt in the form of:

Please review my changes for correctness and clarity:

<optional change description>

<the git diff>

This is such a cheap way to get feedback I find it a total no-brainer. Working as a solo developer this means actually getting reviews, but if you work on a team this is a nice way to catch silly mistakes and save other reviewers time.

Seek different perspectives

The above is a simple template for generic feedback, but I find reviews more engaging when I ask the LLM to add a bit of character, so it's worth experimenting with prompt addenda that give the reviewer a persona.

Finding balance

With these techniques, I feel I have found balance. It's still me authoring the code; I still feel ownership of the results. But I now have someone with limitless patience to "riff" on ideas with, catch mistakes, and take some of the drudgery out. I hope these ideas help you find your own balance, and let you ship better software.

Don't Mock What You Don't Own - a moderately-contrived story

A piece of automated testing guidance I find people often struggle with is Don't Mock What You Don't Own. The basic idea is that you should never use mocking to replace a third-party interface in your codebase.

Instead, you should define your own wrapper APIs around third-party interfaces, the design of which is driven by your application's requirements, rather than being dictated by the shape of the third-party interface.

These wrappers can then be safely mocked because you control the interface. This leads to a "hexagonal" architecture with low coupling.

The problem is this sounds like vague academic nonsense, which requires extra code to be written with the only immediate benefit being that your system's internals are easier to test. It's hard to blame most engineers (who consider themselves finely-tuned bullshit detectors) for concluding that the Test Driven Development (TDD) acolytes have lost their minds in pursuit of a clean test suite.

In this post I'm going to give a concrete example of how this approach affects a system's design and demonstrate how it helps a system adapt to change. Hopefully by the end you'll see why this pattern is worth considering even if you're not practising TDD.

A worked example with third-party mocking

Imagine we're building a web application which displays data fetched from an imaginary third-party git hosting provider named "repo-host.com".

Our application has two end-points, repo_view and other_view, both of which query the repo-host.com API and render a template using the retrieved data.

def repo_view(repo_id):
    response = http.get(f"repo-host.com/api/{repo_id}")
    repo = response.as_json()
    return render_template("""
        <h1>{repo['name']}</h1>...
        <span>{repo['commits'].length}</span>...
    """, {repo: repo})

def other_view(repo_id):
    response = http.get(f"repo-host.com/api/{repo_id}")
    repo = response.as_json()
    return render_template("""
        ...<span>{repo['commits'].length}</span>...
    """, {repo: repo})

A URL change

One day we receive a message from repo-host.com developer outreach telling us that unforeseen architectural challenges force a URL change. repo-host.com/api/:repo_id will soon be repo-host.com/api/v1/:repo_id. You grumble something about non-RESTful URLs but agree to update your application.

In doing so, you notice that you've got some repetition in your code, which means you must update two lines which reference the URL in question. "Don't Repeat Yourself!" you cry, as you refactor the common code into a get_repo function.

# Refactor common fetch code into a function
def get_repo(repo_id):
    response = http.get(f"repo-host.com/api/v1/{repo_id}")
    return response.as_json()

# Views are now "DRY"
def repo_view(repo_id):
    repo = get_repo(repo_id)
    return render_template("""
        <h1>{repo['name']}</h1>...
        <span>{repo['commits'].length}</span>..
    """, {repo: repo})

def other_view(repo_id):
    repo = get_repo(repo_id)
    return render_template("""
        ...<span>{repo['commits'].length}</span>...
    """, {repo: repo})

The response body changes

repo-host.com have realised that returning a list of all commits with every request is expensive and are applying rate limiting to /api/v1. To continue querying at the rate we need, they encourage us to use the rate-unlimited simple=true parameter, which instead returns a commit_count attribute.

As you come to make this change, you realise this is a bit annoying, as you have two templates expecting a commits array attribute, which both need to be updated.

def get_repo(repo_id):
    # UPDATED
    response = http.get(f"repo-host.com/api/v1/{repo_id}?simple=true")
    return response.as_json()

def repo_view(repo_id):
    repo = get_repo(repo_id)
    # UPDATED
    return render_template("""
        <h1>{repo['name']}</h1>...
        ...<span>{repo['commit_count']}</span>...
    """, {repo: repo})

def other_view(repo_id):
    repo = get_repo(repo_id)
    # UPDATED
    return render_template("""
        ...<span>{repo['commit_count']}</span>...
    """, {repo: repo})

Still, not the end of the world. You update your code and test mocks, run your CI and deploy.

Runtime exceptions in production

As soon as your release hits production, you start seeing runtime exceptions. Turns out, someone else on the team had seen your helpful get_repo function and integrated it into their own view.

# Someone else's view
def repo_commits_view(repo_id):
    repo = get_repo(repo_id)
    return render_template("""
        {% for commit in repo['commits'] %}
        {% commit['sha'] %} {% commit['message'] %}...
        {% end %}  
    """, {repo: repo})

This code depended on the commits array attribute in the return value which we just replaced with commit_count. Why didn't the CI catch this?! Were there no tests?!

There were tests, but unfortunately they mocked the HTTP response, which isn't owned by our system:

def test_repo_commits_view():
    with mock.patch("http.get") as fake_get:
        fake_get.return_value.as_json.return_value = {
            "commits": [...]
        }
        test_client.get("/repo/commits")
        ...

You roll the deploy back, and set to work adding a "simple" boolean parameter to your get_repo function. But before you can finish, the phone starts ringing again.

"Yeah, so HTTP isn't working out for us"

repo-host.com have decided that gRPC is the wave of the future, and are deprecating their HTTP REST API. Your heart sinks as you realise exactly how many parts of the system are calling get_repo and therefore expect a JSON key-value object, which you'll now need to re-write to match the gRPC values.

What went wrong?

This is, admittedly, a highly contrived example. I hope most third parties would provide a more stable API than repo-host.com.

Additionally, many of the problems here wouldn't have made it to production with extensive integrated testing. However, integrated tests have their own set of problems, and one of the goals of TDD is to arrive at designs which can be verified with as few integrated tests as possible.

The common root of our issues is that we've allowed our design to be affected by what's available rather than what we need. We've invited a data structure we don't own (the repo-host.com REST API response body) deep into our application, to the point where even our template layer's code is informed by it. The whole of our system is now "coupled" to this structure, and as soon as it changes, we have to change the entire system with it.

Take 2

This is why "Don't mock what you don't own" and Discovery Testing put a focus on describing the dependent layers of your system in terms of the interfaces you want to exist, rather than being guided by what's available.

Combined with YAGNI, you end up with smaller interfaces to third-party services which are tightly coupled to your domain model, and loosely coupled to the third-party.

Let's rewind to the start of our system, and imagine the sort of design I'd expect to arrive at following those design principles.

We start with our views, imagining the get_repo function we want to exist.

def repo_view(repo_id):
    repo = get_repo(repo_id)
    return render_template("""
      <h1>{repo.name}</h1>...
      <span>{repo.commit_count}</span>..
    """, {repo: repo})

def other_view(repo_id):
    repo = get_repo(repo_id)
    return render_template("""
        ...<span>{repo.commit_count}</span>...
    """, {repo: repo})

Because we're focussed on what we want, not what's available, the return value from our imaginary get_repo function is a simple object rather than a JSON dictionary, and we only reference commit_count, rather than doing commits.length on an array we otherwise don't use.

Now, we implement our imagined get_repo function, mapping the third-party interface into the first-party interface we just designed.

def get_repo(repo_id):
    response = http.get(f"repo-host.com/api/{repo_id}")
    return Repo.build_from_response(
        response.as_json()
    )

# This class models only attributes we need
class Repo:
    name: str
    commit_count: int

    @classmethod
    def build_from_response(cls, response_body):
        return cls(
            name=response_body['name'],
            commit_count=len(response_body['commits']),
        )

The first thing you'll notice is: this is a lot more lines of code! This is a valid concern - more lines means it takes longer to write and creates more space for bugs to hide in.

So what's the upside?

All translation between repo-host.com and our internal system is now encapsulated by the get_repo function. We "own" the entirety of the get_repo interface, including the return type, meaning that, according to "Don't mock what you don't own", this function is now fair game for mocking.

As such, our view tests can look like this:

def test_repo_view():
    with mock.patch("get_repo") as fake_get_repo:
        fake_get_repo.return_value = Repo(name="hello", commit_count=0)
        ...

If we wanted to mock out our previous get_repo implementation we had to specify an arbitrary JSON object as our return value. With our new implementation, we can specify an instance of our new Repo type.

This smaller, simpler return type makes for an easier to read test. Additionally, because Repo is a concrete class, we can be confident that our return value has the same fields as those in the production system.

As for testing the get_repo function itself, as this calls the third-party repo-host.com interface over HTTP, we cannot safely mock its internals and must rely on integration testing. In these situations I would typically reach for a tool like VCR.py. Thankfully, because the responsibilities of get_repo are very specific and limited, we should not need many integrated tests to have sufficient confidence.
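
As a rough sketch of such a test (the cassette path and assertions are illustrative, not taken from a real project), it might look like:

import vcr


# VCR.py records the real repo-host.com response to a cassette file on the
# first run, then replays it on subsequent runs.
@vcr.use_cassette("tests/cassettes/get_repo.yaml")
def test_get_repo_builds_a_repo_from_the_api_response():
    repo = get_repo("some-repo-id")

    assert isinstance(repo, Repo)
    assert repo.name
    assert repo.commit_count >= 0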

Replaying the changes

So, what happens as we work through those same sets of required changes to the system?

First up, changing the URL to add /v1

def get_repo(repo_id):
    # One line change - update the URL
    response = http.get(f"repo-host.com/api/v1/{repo_id}")

Building get_repo_commits_view

Next, a step we didn't see happen with the first design - someone else building a repo_commits_view. Previously, they saw our get_repo function, observed that the commits attribute was an array and built their view around it.

With our new implementation, there's no opportunity for that accidental coupling, as our 1st-party Repo class only models the attributes we use: name and commit_count.

While the temptation still exists to be guided by what's available and extend get_repo to add the commits to the Repo class, let's again imagine the interface we want to exist.

def repo_commits_view(repo_id):
    # we don't reach for `get_repo`; we imagine the function we want to exist
    commits = get_repo_commits(repo_id)
    return render_template("""
        {% for commit in commits %}
            {% commit.sha %} {% commit.message %}...
        {% end %}
    """, {repo: repo}) 

Now, we must code our imagined get_repo_commits function into existence:

def get_repo_commits(repo_id):
    # Duplicated code
    response_body = http.get(f"repo-host.com/api/{repo_id}").as_json()
    return [
        Commit.build_from_response(res) for res in response_body['commits']
    ]

class Commit:
    message: str
    sha: str

    @classmethod
    def build_from_response(cls, response_body):
        return cls(
            message=response_body['message'],
            sha=response_body['sha']
        )

Again, we end up with a function (get_repo_commits) where we own the entire interface, and therefore can mock it safely when testing other functions. The function itself must be integration tested as it communicates with a third-party.

Don't Repeat Yourself again

In building this, we copied and pasted the repo-host.com HTTP GET call from get_repo. Once our integrated tests are passing, we decide to refactor the common http.get code into a shared repo_host_fetch_repo function:

def get_repo(repo_id):
    response_body = repo_host_fetch_repo(repo_id)
    ...
    
def get_repo_commits(repo_id):
    response_body = repo_host_fetch_repo(repo_id)
    ...

def repo_host_fetch_repo(repo_id):
    return http.get(f"repo-host.com/api/{repo_id}").as_json()

We're pretty happy that we've eliminated the duplication. We might be tempted to re-write our integrated get_repo and get_repo_commits tests as isolated tests which mock out repo_host_fetch_repo, but since the return type is raw JSON from a third party, it fails the "Don't mock what you don't own" test. As such, the integrated tests stay.

Rate limiting

Now let's introduce the rate-limiting change which makes fetching commits expensive. This time our refactor is safer, as it's clear from the code which functions depend on commits. As an example, let's do the most naive thing possible and just add simple=true to repo_host_fetch_repo.

def repo_host_fetch_repo(repo_id):
    return http.get(f"repo-host.com/api/v1/{repo_id}?simple=true").as_json()

Immediately, our integrated tests for get_repo_commits start failing, as simple=true means repo_host_fetch_repo no-longer returns the commits array. This reveals that our optimistic, duplication-reducing refactor was folly. We decide to unroll our repo_host_fetch_repo function.

Here are the complete required changes:

def get_repo(repo_id):
    # unroll `repo_host_fetch_repo`,
    # specify `simple=true`, we don't need commits
    response_body = http.get(f"repo-host.com/api/v1/{repo_id}?simple=true").as_json()
    ...

class Repo:
    ...
    def build_from_response(cls, response_body):
        return cls( ... 
            # Use `commit_count` rather than `len(response_body['commits'])`
            commit_count=response_body['commit_count'],
        )

def get_repo_commits(repo_id):
    # unroll `repo_host_fetch_repo`,
    # don't specify `simple=true`, we need commits
    response_body = http.get(f"repo-host.com/api/v1/{repo_id}").as_json()

That's it - three lines modified and no need to update any tests. The rest of the system is isolated from the change and requires no updates. Despite only writing integrated tests for get_repo and get_repo_commits, the tests caught our integration mistakes and prevented us from shipping broken code to production. Had we not done our premature duplication-reducing refactor, we wouldn't have needed to touch get_repo_commits either.

HTTP -> gRPC

Switching from integrating with HTTP to gRPC sounds like a pretty big change, but when you're practising "Don't mock what you don't own", and designing interfaces based on your domain model's needs, it's actually not that crazy.

What would we have to change? Well, our integrated tests for our boundary functions get_repo and get_repo_commits would have to be completely re-written, and we'd have to make them pass. But that might be it. The rest of the system is written in terms of what we want, and once we've translated the third-party responses into our first-party representations at the boundary, there may well be no reason for the rest of the system to change.
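
To make that concrete, here's a hypothetical sketch of the new boundary (the gRPC stub and message names are imagined - repo-host.com doesn't exist, after all): only get_repo changes, while Repo and every caller stay untouched.

import grpc

# Imagined modules, generated from repo-host.com's published .proto files
from repo_host_pb2 import GetRepoRequest
from repo_host_pb2_grpc import RepoHostStub


def get_repo(repo_id):
    with grpc.insecure_channel("repo-host.com:50051") as channel:
        stub = RepoHostStub(channel)
        repo_response = stub.GetRepo(GetRepoRequest(repo_id=repo_id))
    # Still mapping the third-party response into our first-party type
    return Repo(name=repo_response.name, commit_count=repo_response.commit_count)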

Was it worth it?

The gains here may seem small, but they scale as your application grows.

I'm not here to pretend that creating bespoke wrappers for your third-party interfaces isn't more work up-front. In the early stages of your application you are certain to write more code, much of which may seem like unnecessary boiler-plate. Do not follow this pattern if you're building something in a 48 hour hack weekend.

However, like many TDD practices, this pattern helps you write code with low coupling, which at its core means modifications to your application require fewer parts of your system to change.

This means your delivery cadence is more likely to be stable, without dramatic spikes for unexpected new requirements. Upgrading to newer versions of libraries is easier, so you can use the latest versions of tools and apply critical security updates quickly. Because your system is easier to adapt, you can say "Yes" to big changes ("gRPC? No problem") or pounce upon opportunities (What if we cached the whole of repo-host.com in a database?) that a more tightly coupled system might preclude.

And lastly, it lays out a framework for deciding where to use integrated tests (which are expensive to write and maintain) while providing safe spaces to use isolated unit tests with mocks without worrying about compromising your test suite's ability to catch errors.

The shabby, productive glory of personal shell scripting

As professional engineers, we're taught to solve our problems robustly and generically.

We're discouraged from making assumptions about the environments our code will run in, as this can build coupling into the system which becomes intractable if those assumptions don't hold. We think hard about edge-cases where our string-parsing RegExp might not work. We don't couple our behaviour directly to our communication interfaces, as we may want to offer a different interface to that same behaviour later. We don't assume that we'll have access to certain libraries, so we use dependency management systems to declare and install our requirements.

The phrase "works on my machine" is viewed as the mark of an amateur.

This mindset is sensible in uncertain environments, where requirements can change, features are speculative, we can't anticipate the contexts in which our code will run, and where we have customers to whom we owe a good experience. And we've all been in codebases which are hard to maintain, extend and debug because they lack the qualities which come from these principles.

But it's not always much fun being a professional. Solving our problems generically takes longer. It requires us to think carefully about edge cases we'd rather not deal with. And sometimes, as an engineer, you just want to build things.

Personal Shell Scripting

Like many nerds, I maintain my dotfiles on GitHub, to allow me to version and sync my system configuration across my computers.

One of my long-running frustrations has been that different machines often have different libraries and tools installed, so I ended up with lines of config which I would comment out on machines where they weren't relevant. This was annoying whenever I set up a new machine, and whenever I changed my config I had to be careful not to commit those locally commented-out lines.

The straw that finally broke the camel's back was when I got an Apple Silicon Mac and discovered that Homebrew's paths were different. Now I also needed to conditionally modify my PATH specification depending on my architecture. Finally, not solving the problem stopped being the lazy option. So I thought to myself: what's the easiest fix I can think of?

I use the delightful fish shell and had been looking for an excuse to learn a bit more about scripting with it. Some quick Googling later, I had a simple function which string matches on the result of a call to uname:

function on_apple_silicon
  set -l system_arch_name (uname -m)
  if string match -q "x86_64" $system_arch_name 
    return 1
  else
    return 0
  end
end

Immediately, my "professional" programmer brain kicked in. There's probably a way to collapse lines 3-7 into a one-liner. Perhaps there are edge cases where the string match fails. What if I'm on a 32-bit system?

And then I realised: who cares? This code is for me and me only, and it works on every machine I own, so time to move on to the next problem. Now I can call this function to conditionally set my homebrew path.

Drunk on my new-found productivity, born from ignoring the need to do things "properly", I set about solving another issue: I didn't want to install my direnv hooks on systems where direnv isn't installed. This time I didn't even need to do any Googling:

function is_installed -a executable_name
  set -l installed (which $executable_name)
  string length -q -- $installed
end

if is_installed direnv
  direnv hook fish | source
end

Is this the best way to see if a program is installed? I haven't even bothered to think about it since writing the function and finding it worked. And this is how the is_installed function shall stay until it doesn't work on one of my machines. Worst case scenario is that I annoy future me, and that guy is the worst.

Since making this selfish realisation, I've had much more fun with fish scripts. Liberated from the pressures of perfection, I've written loads of little ones and gradually replaced my notes library of shell snippets with custom functions. You can get a lot done when you truly embrace YAGNI.

Action matters

Professionalism was actually stopping me from solving my problems - I'd been working around the issue of different config on different machines for years, but I'd not solved it because I mentally shrank from the effort required to make it work "correctly". In the end, it turned out a six-line function was good enough.

As someone who tries to have a bias towards action, this was a bit of a scary realisation. Building things and using them in anger is the ultimate feedback mechanism and it's a reminder that there is a balance to be struck between coding rigorously and learning by shipping. I won't go as far as to declare that you should Write Shitty Code, but as someone who takes pride in writing high-quality code, having these personal scripts as a space to explore a shabbier style of programming was a reminder of some of the benefits of the other side of that balance.

Now I'll try to catch myself if I'm procrastinating on solving a problem because I can't think of an elegant approach. Are there any little annoyances you haven't fixed because you couldn't get over the hump of solving it properly?

If you are so inclined, you can check my little collection of scripts here. They come with no warranty and they have no documentation, but guess what?

"Works on my machine"

The Lindy blog engine

Ever since reading Nassim Nicholas Taleb's Antifragile I've been fascinated by The Lindy Effect, and how it relates to software. From Wikipedia:

The Lindy effect is a theorized phenomenon by which the future life expectancy of some non-perishable things, like a technology or an idea, is proportional to their current age. Thus, the Lindy effect proposes the longer a period something has survived to exist or be used in the present, it is also likely to have a longer remaining life expectancy.

Put another way, if something has been used for 40 years, the Lindy effect suggests it'll still be around in another 40. There are a number of ways to apply this framework to software engineering - for example, improving your knowledge of SQL is likely to be a better investment than learning the latest JavaScript framework, as Lindy suggests the knowledge has a longer shelf life. Of course, Lindy shouldn't be the sole factor in making a decision like this, but it is often a powerful indicator which is simple to apply, especially in uncertain domains.

The Analog Moment blog engine

This blog has always been powered by a bespoke blog engine. Throughout its life, it has been through numerous technological shifts, and it has typically been a programming playground where I indulged technologies and patterns I wanted to learn but couldn't justify in a professional context. Over the years, the blog has been ported from CoffeeScript to ES6 to TypeScript, and from Capistrano to Docker to Heroku.

The running joke is that I've spent far more time rewriting the engine than I have blogging. I've been happy to justify this, as I enjoy the learning experience I get from the experimentation, but I finally decided I want to do less blog engine development and more actual blogging. However, one nagging opportunity for procrastination remained.

Code rot

Maintaining this blog incurred development costs other than those imposed by my self-inflicted re-writes. Namely, the security rot of deployed code and the corresponding breakage caused by upgrades. Node.js versions go out of date, operating systems and Docker images need updating, and that's before we say anything about NPM package breakage - keeping a deployed service secure and up-to-date was surprisingly demanding, even with an extensive test suite.

So I thought to myself - can the Lindy effect give me a framework for building a blog engine that doesn't rot?

The Lindy re-write

The basic idea was to resist all urges to embrace hot new technologies, and instead prefer choosing older technologies which have remained relevant, with the theory being this would reduce the probability of the underlying technology either changing dramatically or becoming unmaintained.

Static rather than dynamic

An early realisation was that a big way to avoid exposure to technologies that might require maintenance was to reduce the amount of runtime code. Analog Moment had always used an express.js server to pull posts from a Redis data store at runtime, rendering pages on the fly. However, the amount of content on the site and the frequency of new posts mean that rendering every page upfront in a single build step is a viable option. This rendering could produce a static directory of HTML files, which then just needs hosting somewhere.

express.js was originally released in 2010, giving it a Lindy lifespan of 11 additional years. Serving static HTML files goes back to 1993, giving it a Lindy lifespan that would almost see me through till retirement.

Static HTML also has the benefit of simplifying my deployment requirements - numerous hosts offer static HTML serving, so I need not worry about being locked into a vendor and having to maintain extensive vendor-specific deployment code.

Site generation frameworks

There are lots of off-the-shelf static site generators, but which does best on the Lindy test? Gatsby (2015), Hugo (2013) and Next.js (2016) are popular but newer than express.js. Jekyll (2008) fares a little better, but still only promises 13 years of Lindy life, which is less than I'd like for something that I'd be coupling my blog posts to.

Fine, I thought, I'll build one myself - how hard can static HTML generation really be? My requirements are simple and I'm only building a tool for myself. As long as I stick to the built-ins of the language, I should be able to avoid coupling myself to technologies which are likely to require too much maintenance.

Language choice

I've been looking for an excuse to learn Go, but having first appeared in 2009 it's too young. Next I considered Ruby, a language I'm familiar with and have great fondness for. Ruby was first released in 1995, which is not bad, but can we do better?

How about C? OK, it was originally released in 1972 and remains used today, so it scores well on the Lindy test, but there's no way it's at the appropriate level of abstraction for the task at hand.

I eventually settled on Python. It's boring, but it remains popular and it dates back to 1991, promising 30 more years of Lindy goodness!

Post format

The posts for Analog Moment are all written in Markdown, which dates back to 2004. Not quite as good as the early-90s tools we've already chosen, but probably an acceptable choice as it's what I'm already using and it remains ubiquitous.

But what of the additional metadata that needs storing about posts (slugs, timestamps, title)? How should I package those up with the posts? The pre-existing implementation used JSON, which was first RFC'ed in 2006. YAML is slightly older, first released in 2001, but it requires installing a third-party package to use in Python, which didn't seem worth it to attach three fields to some Markdown.

In the end, I decided to define my own template format to avoid coupling myself to anything. The slug is read from the filename, then each file looks like this:

title: Nice post
timestamp: 2021-09-07T19:53:44+00:00
\body:

post content goes here
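
Parsing this format needs nothing beyond the standard library. A minimal sketch (not the engine's actual code, and assuming the \body: delimiter above is literal) might look like:

from datetime import datetime
from pathlib import Path

BODY_MARKER = "\\body:"  # the delimiter shown in the format above


def parse_post(path):
    header, _, body = Path(path).read_text().partition(BODY_MARKER)
    metadata = dict(line.split(": ", 1) for line in header.strip().splitlines())
    return {
        "slug": Path(path).stem,  # the slug is read from the filename
        "title": metadata["title"],
        "timestamp": datetime.fromisoformat(metadata["timestamp"]),
        "body": body.strip(),
    }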

Finally though, I've got to actually convert that markdown to HTML, and this ended up being the weakest part of the stack from a Lindy perspective. I initially used python-markdown (2008, possibly earlier) but switched to commonmark.py (2015) as a renderer. I switched because commonmark.py is designed to conform to the popular commonmark variant of Markdown, which seems like the pragmatic bet for long-term maintainability. It also had type annotations, which python-markdown did not.

To avoid deeply coupling my code to any particular library, I implemented a small wrapper class around the library API to make switching markdown renderers a drop-in replacement (the validity of this approach was proven when the switch was trivial).
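
The wrapper itself is tiny - something along these lines (a sketch, not the engine's actual class):

import commonmark


class MarkdownRenderer:
    """The only place in the codebase that knows which Markdown library is used."""

    def render(self, markdown_text: str) -> str:
        return commonmark.commonmark(markdown_text)

Swapping renderers again would mean changing this one class and nothing else.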

Supporting libraries

Things fell apart a little with correctness-enforcing tools. I chose pytest (2009) as a test runner. I probably could have gotten away without this, but I couldn't resist the convenience.

And I broke the rules pretty badly for linters, using flake8 (2010), mypy (2012) and black (2018!). However, this felt justifiable given that none of these tools actually provides any of the functionality of the blog engine; they only support the quality and correctness of the code. If any of them were to become unmaintained or stop working, the blog engine would remain functional (although I'd likely seek a replacement).

CI and deployment

I broke the rules again here. Because the core output of my tool is a directory of HTML and therefore extremely portable, I chose to treat the coupling to my deployment as disposable. In the end I chose Google's Firebase hosting as it's astonishingly simple to deploy to and provides a CDN. However, I anticipate few challenges if I'm forced to migrate to a different host, as support for hosting static HTML is widespread and unlikely to become unsupported any time soon.

I'm using GitHub Actions as my CI runner, which again I'm treating as disposable - all the CI does is run the test and linting scripts, then trigger the deployment. The whole pipeline is less than 100 lines of code.

Non-Lindy strategies against code rot

In attempting to build a blog that would be easy to support in the long term, I also made some decisions that weren't informed by Lindy.

Tests? Tests.

I've long been a Test Driven Development (TDD) fan, and I'd seen the benefit of having a comprehensive test suite while evolving the previous version of the blog engine, so an extensive test suite was a no-brainer. The re-write has both fully-integrated tests and fully-isolated unit tests.

Type checking

I chose to enforce type annotations in the Python code with mypy's "strict" mode. Type annotations were only specified in PEP 484 in 2014 and are completely optional in the language. Additionally, how community adoption for annotations will fare in the long-term remains an unknown, so this is a slightly speculative bet on my part.

Building rather than "buying"

In response to not finding tools that meet my Lindy test, my default answer has been "can I build this myself?". Now, arguably this is anti-Lindy, as I'm forgoing older, tested code in favour of non-existent code.

However, when writing the code myself, I am writing only for myself. Pairing TDD with YAGNI, I write only enough code to meet my own requirements, and the code which I write is fully tested and meets my linting standards. I am not attempting to generalise anything for anyone else's use cases. This means fewer lines of code to maintain.

The downside is that I'm getting nothing for free. If I want a sitemap, an RSS feed or a search feature, I'm going to have to build it myself. If I'd chosen to use a static site generator, these features would probably have all been built in.

Conclusion

I finished the re-write at the tail end of 2021 and chances are the page you're reading was rendered by that engine. You can see the source code yourself here:

https://github.com/th3james/BlogBuilder

At the time of writing I have no idea if this experiment has succeeded, and only time will tell. The engine works, and it feels small but perfectly formed. There are some functionality gaps, and maybe I'll come to resent the effort that would be required to add, for example, pagination to the archive page. I'm intrigued to see how easy it will be to jump to newer versions of Python.

But crucially, I have one less excuse for not writing more blog posts. If all goes well, I'll publish and update here using the same engine in 20 years' time.

Fixing common REST mistakes by using HTTP effectively

HTTP is a well-designed protocol which, used properly, effectively caters to the needs of developers building REST APIs. Unfortunately, despite using HTTP every day, we developers don't always take the time to understand and use what's available to us. Here we'll look at some common REST mistakes and how we can properly use HTTP to fix them.

A quick word about URLs

URL is one of those Three Letter Acronyms we use so often we forget what the words behind the acronym are telling us. Uniform Resource Locator is extremely descriptive - put another way, URLs are the address of a noun (e.g. a blog entry, a collection of products, etc.). Note that this description does not say anything about verbs, formats, or versions. However, that hasn't stopped eager developers from cramming those concepts into URLs.

Mistake #1 - Putting actions into URLs

http://example.com/post/12/create-comment

URLs address nouns, but this URL contains a verb: "create"

Solution: Use HTTP Verbs to support multiple operations on nouns

You should prefer having a single URL for each noun and support different operations through different HTTP verbs. In this case, that means making a single comments URL support both GET to list comments and POST to create new ones:

// Get all comments on post 12
GET http://example.com/post/12/comments

// Create a new comment on post 12
POST http://example.com/post/12/comments
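
As a rough illustration of how one URL can serve both verbs, here's a Python sketch using Flask (the helper functions are made up for the example):

from flask import Flask, jsonify, request

app = Flask(__name__)


# One URL for the comments collection; the verb selects the operation
@app.route("/post/<int:post_id>/comments", methods=["GET", "POST"])
def comments(post_id):
    if request.method == "POST":
        comment = create_comment(post_id, request.get_json())  # hypothetical helper
        return jsonify(comment), 201
    return jsonify(list_comments(post_id))  # hypothetical helper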

Mistake #2 - Putting formats in your URLs

https://example.com/product/blue-robot.html
https://example.com/product/blue-robot.json
https://example.com/product/blue-robot/as_yaml

Again, the issue here is that we have multiple URLs for the same noun, where the only difference is the desired representation. The format of the response (HTML, JSON, YAML) doesn't belong in the URL; it should be determined on a per-request basis.

Solution: Use the HTTP content negotiation headers to choose representation

If you want to support multiple representations of a given resource, do so by handling the Accept HTTP header and returning the Content-Type HTTP header in your response.

// Request asks for text/*

> GET http://example.com/product/blue-robot
> Accept: text/*

// Response returns body as HTML and the header:
< Content-Type: text/html

// Request asks for application/json

> GET http://example.com/product/blue-robot
> Accept: application/json

// Response returns body as JSON and header:
< Content-Type: application/json
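
In a Flask-flavoured sketch (again with made-up helpers), the negotiation might look like this:

from flask import Flask, Response, jsonify, request

app = Flask(__name__)


@app.route("/product/<slug>")
def product_view(slug):
    product = load_product(slug)  # hypothetical lookup helper
    # Pick the best representation the client will accept
    best = request.accept_mimetypes.best_match(["text/html", "application/json"])
    if best == "application/json":
        return jsonify(product.to_dict())  # sets Content-Type: application/json
    html = render_product_html(product)  # hypothetical template helper
    return Response(html, mimetype="text/html")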

Mistake #3 - Putting versions in your URLs

Here's another common example of unnecessarily creating multiple URLs for a given resource.

http://example.com/v1/products/red-car
http://example.com/v2/products/red-car

These two URLs are different "versions" of the same resource.

Solution: Use custom HTTP headers to support version negotiation

Unfortunately, this is something that HTTP doesn't specifically address, but luckily there's sufficient flexibility in HTTP for us to RESTfully handle versioning. One strategy is to support a custom Accept-Version header:

// Request asks for v2

> GET http://example.com/products/red-car
> Accept-Version: v2

// Response returns the v2 representation
< HTTP/1.1 200 OK
< Version: v2
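
A Flask-flavoured sketch of that strategy (the serialisers are made up for illustration):

from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/products/<slug>")
def product_view(slug):
    version = request.headers.get("Accept-Version", "v1")  # default to v1
    product = load_product(slug)  # hypothetical lookup helper
    body = serialize_v2(product) if version == "v2" else serialize_v1(product)
    response = jsonify(body)
    response.headers["Version"] = version  # echo back the version served
    return response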

Mistake #4 - Putting statuses in your response body

Breaking from URLs for our last tip, here's another common anti-pattern that fails to use HTTP effectively.

> GET http://example.com/products/red-car
< HTTP/1.1 200 OK
< {"status": "error", "message": "No such product"}

> POST http://example.com/posts
< HTTP/1.1 200 OK
< {"status": "error", "message": "Unauthorised"}

Both of these scenarios (content missing and authorisation failure) can be described using status codes, but instead the error messages are dumped into the response body.

Solution: Always use the most appropriate HTTP status code, add detail in the body where necessary

Always use the most descriptive HTTP status code. There's nothing to stop you from adding more detail about the error to the body of your response, but start with the correct HTTP status and try to avoid redundancy.

> GET http://example.com/products/red-car
< HTTP/1.1 404 Not Found
< {"message": "No product with slug 'red-car'"}

> POST http://example.com/posts
< HTTP/1.1 401 Unauthorized
< {"message": "Authentication not provided"}

Conclusion

Hopefully this advice helps you leverage HTTP properly to build better REST interfaces, and also serves as a reminder to take the time to properly learn the tools you think you already know just because you use them every day.