Don't Mock What You Don't Own - a moderately-contrived story
A piece of automated testing guidance I find people often struggle with is Don't Mock What You Don't Own. The basic idea is that you should never use mocking to replace a third-party interface in your codebase.
Instead, you should define your own wrapper APIs around third-party interfaces, the design of which is driven by your application's requirements, rather than being dictated by the shape of the third-party interface.
These wrappers can then be safely mocked because you control the interface. This leads to a "hexagonal" architecture with low coupling.
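To make "wrapper API" less abstract, here's a minimal sketch of the shape this takes (all names are illustrative, not from any real library): a "port" describing the interface our application wants, and an adapter that satisfies it.

```python
from dataclasses import dataclass
from typing import Protocol

# The domain object our application actually needs - not the
# provider's full payload. (Illustrative names throughout.)
@dataclass
class Repo:
    name: str
    commit_count: int

# The "port": an interface shaped by our requirements,
# not by any particular third party's API.
class RepoSource(Protocol):
    def get_repo(self, repo_id: str) -> Repo: ...

# An adapter satisfying that port. In production this would wrap
# the third-party client; in tests it can be a simple fake.
class InMemoryRepoSource:
    def __init__(self, repos: dict):
        self._repos = repos

    def get_repo(self, repo_id: str) -> Repo:
        return self._repos[repo_id]

source: RepoSource = InMemoryRepoSource({"1": Repo(name="hello", commit_count=2)})
assert source.get_repo("1").commit_count == 2
```

Because both the port and the `Repo` type are ours, swapping one adapter for another (or for a fake in tests) never ripples beyond the boundary.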
The problem is that this sounds like vague academic nonsense: it demands extra code, with the only immediate benefit being that your system's internals are easier to test. It's hard to blame most engineers (who consider themselves finely-tuned bullshit detectors) for concluding that the Test Driven Development (TDD) acolytes have lost their minds in pursuit of a clean test suite.
In this post I'm going to give a concrete example of how this approach affects a system's design and demonstrate how it helps a system adapt to change. Hopefully by the end you'll see why this pattern is worth considering even if you're not practising TDD.
A worked example with third-party mocking
Imagine we're building a web application which displays data fetched from an imaginary third-party git hosting provider named "repo-host.com".
Our application has two end-points, `repo_view` and `other_view`, both of which query the repo-host.com API and render a template using the retrieved data.
```python
def repo_view(repo_id):
    response = http.get(f"repo-host.com/api/{repo_id}")
    repo = response.as_json()
    return render_template("""
        <h1>{{ repo['name'] }}</h1>...
        <span>{{ repo['commits'] | length }}</span>...
    """, {"repo": repo})

def other_view(repo_id):
    response = http.get(f"repo-host.com/api/{repo_id}")
    repo = response.as_json()
    return render_template("""
        ...<span>{{ repo['commits'] | length }}</span>...
    """, {"repo": repo})
```
A URL change
One day we receive a message from repo-host.com developer outreach telling us that unforeseen architectural challenges force a URL change: `repo-host.com/api/:repo_id` will soon be `repo-host.com/api/v1/:repo_id`. You grumble something about non-RESTful URLs but agree to update your application.
In doing so, you notice some repetition in your code: two lines reference the URL in question, and both must be updated. "Don't Repeat Yourself!" you cry, as you refactor the common code into a `get_repo` function.
```python
# Refactor common fetch code into a function
def get_repo(repo_id):
    response = http.get(f"repo-host.com/api/v1/{repo_id}")
    return response.as_json()

# Views are now "DRY"
def repo_view(repo_id):
    repo = get_repo(repo_id)
    return render_template("""
        <h1>{{ repo['name'] }}</h1>...
        <span>{{ repo['commits'] | length }}</span>...
    """, {"repo": repo})

def other_view(repo_id):
    repo = get_repo(repo_id)
    return render_template("""
        ...<span>{{ repo['commits'] | length }}</span>...
    """, {"repo": repo})
```
The response body changes
repo-host.com have realised that returning a list of all commits with every request is expensive and are applying rate limiting to `/api/v1`. To continue querying at the rate we need, they encourage us to use the rate-unlimited `simple=true` parameter, which returns a `commit_count` attribute instead of the full commits list.

As you come to make this change, you realise it's a bit annoying: you have two templates expecting a `commits` array attribute, and both need to be updated.
```python
def get_repo(repo_id):
    # UPDATED
    response = http.get(f"repo-host.com/api/v1/{repo_id}?simple=true")
    return response.as_json()

def repo_view(repo_id):
    repo = get_repo(repo_id)
    # UPDATED
    return render_template("""
        <h1>{{ repo['name'] }}</h1>...
        ...<span>{{ repo['commit_count'] }}</span>...
    """, {"repo": repo})

def other_view(repo_id):
    repo = get_repo(repo_id)
    # UPDATED
    return render_template("""
        ...<span>{{ repo['commit_count'] }}</span>...
    """, {"repo": repo})
```
Still, not the end of the world. You update your code and test mocks, run your CI and deploy.
Runtime exceptions in production
As soon as your release hits production, you start seeing runtime exceptions. It turns out someone else on the team had seen your helpful `get_repo` function and integrated it into their own view.
```python
# Someone else's view
def repo_commits_view(repo_id):
    repo = get_repo(repo_id)
    return render_template("""
        {% for commit in repo['commits'] %}
            {{ commit['sha'] }} {{ commit['message'] }}...
        {% endfor %}
    """, {"repo": repo})
```
This code depended on the `commits` array attribute in the return value, which we just replaced with `commit_count`. Why didn't the CI catch this?! Were there no tests?!

There were tests, but unfortunately they mocked the request object, which isn't owned by our system:
```python
def test_repo_commits_view():
    with mock.patch("http.get") as fake_get:
        fake_get.return_value.as_json.return_value = {
            "commits": [...]
        }
        test_client.get("/repo/commits")
        ...
```
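To see why a test like this is worthless as a safety net, here's a self-contained demonstration (the `Response` class, `http_get` function, and `repo_commits` function are hypothetical stand-ins, not the app's real code). The test replaces the HTTP layer and pins the old response shape, so it stays green no matter what the provider actually returns.

```python
# Hypothetical stand-ins for the third-party HTTP client and provider.
class Response:
    def __init__(self, body):
        self._body = body

    def as_json(self):
        return self._body

def http_get(url):
    # The provider now returns the new `simple=true` shape...
    return Response({"name": "repo", "commit_count": 3})

def repo_commits(repo_id):
    # ...but this production code still expects the old `commits` array.
    return http_get(f"repo-host.com/api/v1/{repo_id}").as_json()["commits"]

# The isolated test swaps out `http_get` (the same effect as mock.patch)
# and pins the OLD response shape, so it passes regardless.
real_http_get = http_get
http_get = lambda url: Response({"commits": [{"sha": "abc"}]})
assert repo_commits("1") == [{"sha": "abc"}]  # the CI build is green...
http_get = real_http_get

# ...while the unmocked call path blows up in production.
try:
    repo_commits("1")
    production_ok = True
except KeyError:
    production_ok = False
assert production_ok is False
```

The mock faithfully preserves an interface that no longer exists, which is exactly the failure mode "Don't mock what you don't own" warns about.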
You roll the deploy back and set to work adding a "simple" boolean parameter to your `get_repo` function. But before you can finish, the phone starts ringing again.
"Yeah, so HTTP isn't working out for us"
repo-host.com have decided that gRPC is the wave of the future and are deprecating their HTTP REST API. Your heart sinks as you realise exactly how many parts of the system call `get_repo` and therefore expect a JSON key-value object, all of which you'll now need to rewrite to match the gRPC values.
What went wrong?
This is, admittedly, a highly contrived example. I hope most third parties would provide a more stable API than repo-host.com.
Additionally, many of the problems here wouldn't have made it to production with extensive integrated testing. However, integrated tests have their own set of problems, and one of the goals of TDD is to arrive at designs which can be verified with as few integrated tests as possible.
The common root of our issues is that we've allowed our design to be affected by what's available rather than what we need. We've invited a data structure we don't own (the repo-host.com REST API response body) deep into our application, to the point where even our template layer's code is informed by it. The whole of our system is now "coupled" to this structure, and as soon as it changes, we have to change the entire system with it.
Take 2
This is why "Don't mock what you don't own" and Discovery Testing put a focus on describing the dependent layers of your system in terms of the interfaces you want to exist, rather than being guided by what's available.
Combined with YAGNI, you end up with smaller interfaces to third-party services that are tightly coupled to your domain model and loosely coupled to the third party.
Let's rewind to the start of our system, and imagine the sort of design I'd expect to arrive at following those design principles.
We start with our views, imagining the `get_repo` function we want to exist.
```python
def repo_view(repo_id):
    repo = get_repo(repo_id)
    return render_template("""
        <h1>{{ repo.name }}</h1>...
        <span>{{ repo.commit_count }}</span>...
    """, {"repo": repo})

def other_view(repo_id):
    repo = get_repo(repo_id)
    return render_template("""
        ...<span>{{ repo.commit_count }}</span>...
    """, {"repo": repo})
```
Because we're focussed on what we want, not what's available, the return value from our imaginary `get_repo` function is a simple object rather than a JSON dictionary, and we only reference `commit_count`, rather than taking the length of a `commits` array we otherwise don't use.

Now, we implement our imagined `get_repo` function, mapping the third-party interface into the first-party interface we just designed.
```python
from dataclasses import dataclass

def get_repo(repo_id):
    response = http.get(f"repo-host.com/api/{repo_id}")
    return Repo.build_from_response(
        response.as_json()
    )

# This class models only the attributes we need
@dataclass
class Repo:
    name: str
    commit_count: int

    @classmethod
    def build_from_response(cls, response_body):
        return cls(
            name=response_body['name'],
            commit_count=len(response_body['commits']),
        )
```
The first thing you'll notice is: this is a lot more lines of code! This is a valid concern - more lines means it takes longer to write and creates more space for bugs to hide in.
So what's the upside?
All translation between repo-host.com and our internal system is now encapsulated by the `get_repo` function. We "own" the entirety of the `get_repo` interface, including the return type, meaning that, according to "Don't mock what you don't own", this function is now fair game for mocking.
As such, our view tests can look like this:
```python
def test_repo_view():
    with mock.patch("get_repo") as fake_get_repo:
        fake_get_repo.return_value = Repo(name="hello", commit_count=0)
        ...
```
If we wanted to mock out our previous `get_repo` implementation, we had to specify an arbitrary JSON object as the return value. With our new implementation, we can specify an instance of our new `Repo` type.

This smaller, simpler return type makes for an easier-to-read test. Additionally, because `Repo` is a concrete class, we can be confident that our return value has the same fields as those in the production system.
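This benefit is easy to demonstrate: because the fake return value goes through the real `Repo` constructor, a stale or misspelled field in test setup fails immediately, where a raw dict or a bare `Mock` would silently accept it. A small sketch, assuming `Repo` is implemented as a dataclass:

```python
from dataclasses import dataclass

@dataclass
class Repo:
    name: str
    commit_count: int

# Building test data through the real type catches drift at
# construction time: the old `commits` field no longer exists.
try:
    Repo(name="hello", commits=[])
    caught_by_constructor = False
except TypeError:
    caught_by_constructor = True
assert caught_by_constructor

# A raw dict, by contrast, accepts any stale shape without complaint.
stale = {"name": "hello", "commits": []}
assert "commits" in stale  # nothing stops this from reaching a test
```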
As for testing the `get_repo` function itself: as it calls the third-party repo-host.com interface over HTTP, we cannot safely mock its internals and must rely on integration testing. In these situations I would typically reach for a tool like VCR.py. Thankfully, because the responsibilities of `get_repo` are very specific and limited, we shouldn't need many integrated tests to have sufficient confidence.
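For the curious, the record-and-replay idea behind tools like VCR.py can be sketched in miniature (a hand-rolled illustration of the concept, not the library's actual API): the first request goes through a real fetch function and is recorded; subsequent requests replay the stored response, keeping the integrated test fast and deterministic.

```python
import json

# A miniature of the record/replay idea behind tools like VCR.py.
# (Hand-rolled illustration - not the library's actual API.)
class Cassette:
    def __init__(self):
        self._tape = {}

    def fetch(self, url, real_fetch):
        if url not in self._tape:
            self._tape[url] = real_fetch(url)  # "record" on first use
        return self._tape[url]                 # "replay" thereafter

network_calls = []

def real_fetch(url):
    # Stands in for a real HTTP round-trip to repo-host.com.
    network_calls.append(url)
    return json.dumps({"name": "example", "commit_count": 3})

cassette = Cassette()
first = cassette.fetch("repo-host.com/api/v1/1", real_fetch)
second = cassette.fetch("repo-host.com/api/v1/1", real_fetch)

assert first == second
assert len(network_calls) == 1  # the "network" was only hit once
```

The recorded response is real third-party output, so the test still verifies our translation layer against the provider's actual shape - unlike a hand-written mock.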
Replaying the changes
So, what happens as we work through those same sets of required changes to the system?
First up, changing the URL to add `/v1`:
```python
def get_repo(repo_id):
    # One line change - update the URL
    response = http.get(f"repo-host.com/api/v1/{repo_id}")
    ...
```
Building repo_commits_view
Next, a step we didn't see happen with the first design: someone else building a `repo_commits_view`. Previously, they saw our `get_repo` function, observed that the `commits` attribute was an array, and built their view around it.

With our new implementation, there's no opportunity for that accidental coupling, as our first-party `Repo` class only models the attributes we use: `name` and `commit_count`.

While the temptation still exists to be guided by what's available and extend `get_repo` to add the commits to the `Repo` class, let's again imagine the interface we want to exist.
```python
def repo_commits_view(repo_id):
    # rather than reaching for `get_repo`, imagine the function we want...
    commits = get_repo_commits(repo_id)
    return render_template("""
        {% for commit in commits %}
            {{ commit.sha }} {{ commit.message }}...
        {% endfor %}
    """, {"commits": commits})
```
Now, we must code our imagined `get_repo_commits` function into existence:
```python
def get_repo_commits(repo_id):
    # Duplicated code
    response_body = http.get(f"repo-host.com/api/v1/{repo_id}").as_json()
    return [
        Commit.build_from_response(res) for res in response_body['commits']
    ]

@dataclass
class Commit:
    message: str
    sha: str

    @classmethod
    def build_from_response(cls, response_body):
        return cls(
            message=response_body['message'],
            sha=response_body['sha'],
        )
```
Again, we end up with a function (`get_repo_commits`) where we own the entire interface, and can therefore mock it safely when testing other functions. The function itself must be integration tested, as it communicates with a third party.
Don't Repeat Yourself again
In building this, we copy-and-pasted the repo-host.com HTTP GET call from `get_repo`. Once our integrated tests are passing, we decide to refactor the common `http.get` code into a shared `repo_host_fetch_repo` function:
```python
def get_repo(repo_id):
    response_body = repo_host_fetch_repo(repo_id)
    ...

def get_repo_commits(repo_id):
    response_body = repo_host_fetch_repo(repo_id)
    ...

def repo_host_fetch_repo(repo_id):
    return http.get(f"repo-host.com/api/v1/{repo_id}").as_json()
```
We're pretty happy that we've eliminated the duplication. We might be tempted to rewrite our integrated `get_repo` and `get_repo_commits` tests as isolated tests which mock out `repo_host_fetch_repo`, but since the return type is raw JSON from a third party, it fails the "Don't mock what you don't own" test. As such, the integrated tests stay.
Rate limiting
Now let's introduce the rate-limiting change which makes fetching `commits` expensive. This time our refactor is safer, as it's clear from the code which functions depend on `commits`. As an example, let's do the most naive thing possible and just add `simple=true` to `repo_host_fetch_repo`:
```python
def repo_host_fetch_repo(repo_id):
    return http.get(f"repo-host.com/api/v1/{repo_id}?simple=true").as_json()
```
Immediately, our integrated tests for `get_repo_commits` start failing, as `simple=true` means `repo_host_fetch_repo` no longer returns the `commits` array. This reveals that our optimistic, duplication-reducing refactor was folly. We decide to unroll our `repo_host_fetch_repo` function.
Here are the complete required changes:
```python
def get_repo(repo_id):
    # unroll `repo_host_fetch_repo`,
    # specify `simple=true`, we don't need commits
    response_body = http.get(f"repo-host.com/api/v1/{repo_id}?simple=true").as_json()
    ...

class Repo:
    ...

    @classmethod
    def build_from_response(cls, response_body):
        return cls(
            ...
            # Use `commit_count` rather than counting `commits`
            commit_count=response_body['commit_count'],
        )

def get_repo_commits(repo_id):
    # unroll `repo_host_fetch_repo`,
    # don't specify `simple=true`, we need commits
    response_body = http.get(f"repo-host.com/api/v1/{repo_id}").as_json()
    ...
```
That's it: three lines modified and no need to update any tests. The rest of the system is isolated from the change and requires no updates. Despite only writing integrated tests for `get_repo` and `get_repo_commits`, the tests were sufficient to catch integration mistakes and prevented us from shipping broken code to production. Had we not done our premature duplication-reducing refactor, we wouldn't have needed to touch `get_repo_commits` either.
HTTP -> gRPC
Switching from integrating with HTTP to gRPC sounds like a pretty big change, but when you're practising "Don't mock what you don't own", and designing interfaces based on your domain model's needs, it's actually not that crazy.
What would we have to change? Well, our integrated tests for our boundary functions `get_repo` and `get_repo_commits` would have to be completely rewritten, and we'd have to make them pass. But that might be it. The rest of the system is written in terms of what we want, and once we've translated the third-party boundary into these first-party representations, there may well be no reason for the rest of the system to change.
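To make that concrete, here's a sketch of what the switch might look like. The stub class and its method are invented stand-ins for grpc-generated client code, not a real API. Only the boundary function changes; `Repo` and every view keep exactly the same shape.

```python
from dataclasses import dataclass

@dataclass
class Repo:
    name: str
    commit_count: int

# Stand-in for a grpc-generated client stub. In reality this would be
# produced from the provider's .proto files and called over a channel.
class FakeRepoServiceStub:
    class _Reply:
        name = "example"
        commit_count = 3

    def GetRepo(self, repo_id):
        return self._Reply()

stub = FakeRepoServiceStub()

def get_repo(repo_id):
    # Same first-party interface as before: callers never learn
    # that HTTP+JSON became gRPC under the hood.
    reply = stub.GetRepo(repo_id)
    return Repo(name=reply.name, commit_count=reply.commit_count)

repo = get_repo("123")
assert repo.name == "example"
assert repo.commit_count == 3
```

The translation from gRPC reply to `Repo` lives entirely inside `get_repo`, so the blast radius of the protocol change is one function plus its integrated tests.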
Was it worth it?
The gains here may seem small, but they scale as your application grows.
I'm not here to pretend that creating bespoke wrappers for your third-party interfaces isn't more work up-front. In the early stages of your application you are certain to write more code, much of which may seem like unnecessary boiler-plate. Do not follow this pattern if you're building something in a 48 hour hack weekend.
However, like many TDD practices, this pattern helps you write code with low coupling, which at its core means modifications to your application require fewer parts of your system to change.
This means your delivery cadence is more likely to be stable, without dramatic spikes for unexpected new requirements. Upgrading to newer versions of libraries is easier, so you can use the latest versions of tools and apply critical security updates quickly. Because your system is easier to adapt, you can say "Yes" to big changes ("gRPC? No problem") or pounce upon opportunities (What if we cached the whole of repo-host.com in a database?) that a more tightly coupled system might preclude.
And lastly, it lays out a framework for deciding where to use integrated tests (which are expensive to write and maintain) while providing safe spaces to use isolated unit tests with mocks without worrying about compromising your test suite's ability to catch errors.