Testing gives us confidence that our app continues to work as we make changes. In an ideal world our tests are fast, reliable and exercise the app as a whole. This becomes increasingly difficult once our app starts including more external dependencies like databases and vendor services. For example, how do we test an SQL query? How do we test our calls to the Netflix API?

The testing pyramid¹ provides some guidance on how to balance these competing desires. It encourages us to keep most of our logic unit testable, and to reach for the higher-level integration or end-to-end tests only in some cases, such as code involving external dependencies.

In the context of Hexagonal architecture², we can apply the testing pyramid across the layers. The Domain and Application layers contain all the business logic and are isolated, so they should be unit testable. The Infrastructure layer is typically more difficult to test as it is coupled with the external dependencies - so we may need to stand up those dependencies in order to test, e.g. run a PostgreSQL instance to test the SQL queries against.

Testing the Domain and Application

We can achieve high coverage in these areas as they are isolated from the hard-to-test infrastructure dependencies through service interfaces. With this in mind we should try to shift the complexity of our app to these layers.

An opinion on unit tests and mocks

The strict definition of unit tests requires that all dependencies be isolated, or in other words, mocked. The primary issue with mocks is that they are fragile³, as they attempt to replicate the real thing.

Consider a dependency called NumberValidator where the real implementation initially only returns true for positive numbers. When testing, we choose to mock it out, forcing it to return true when given a positive number. At some point, the implementation for NumberValidator is changed to only return true for negative numbers. In most cases, since the interface has not changed, our test with the mock continues to pass and the contributor who changed the implementation moves on without realising there is now an inconsistency. This is an overly simplified example, but hopefully it demonstrates that over time we introduce small inconsistencies between our mocks and the real implementations.
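To make the drift concrete, here is a minimal Python sketch of the scenario above. Only the name NumberValidator comes from the example; the is_valid method, the accept_order function and the order quantity are hypothetical, invented purely for illustration.

```python
from unittest.mock import Mock


class NumberValidator:
    """Real implementation AFTER the change: only negative numbers are valid."""

    def is_valid(self, n: int) -> bool:
        return n < 0


def accept_order(quantity: int, validator) -> str:
    """Hypothetical code under test that depends on the validator."""
    return "accepted" if validator.is_valid(quantity) else "rejected"


# The mock was written against the ORIGINAL behaviour (positive numbers are
# valid) and nobody updated it when the real implementation changed.
stale_mock = Mock(spec=NumberValidator)
stale_mock.is_valid.side_effect = lambda n: n > 0

with_mock = accept_order(5, stale_mock)         # the mock still says 5 is valid
with_real = accept_order(5, NumberValidator())  # the real thing now disagrees

print(with_mock, with_real)
```

The test built on the mock keeps passing, while the same call against the real implementation gives a different answer. Nothing in the interface changed, so nothing forces the mock to be revisited.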

If mocking everything is discouraged, when should we use mocks? I am in the camp that we should mock the things that would make our test slow and less reliable, primarily those external dependencies, e.g. system clock, random, databases. These dependencies are exactly what Hexagonal architecture isolates through the service interfaces in the Application layer. Instead of thinking of each method or class as a unit to be tested, we can think at the Application level, where the only dependencies are those service interfaces.
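As a rough sketch of what testing at the Application level can look like, the Python below replaces only the system clock with a test double and leaves everything else real. The Clock interface, FixedClock, TrialService and the 14-day trial rule are all assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Protocol


class Clock(Protocol):
    """Service interface owned by the Application layer (hypothetical)."""

    def now(self) -> datetime: ...


@dataclass
class FixedClock:
    """Test double: always returns the same instant."""

    instant: datetime

    def now(self) -> datetime:
        return self.instant


@dataclass
class TrialService:
    """Hypothetical application service whose only dependency is Clock."""

    clock: Clock
    trial_length: timedelta = timedelta(days=14)

    def is_trial_active(self, started_at: datetime) -> bool:
        # Active strictly before the end of the trial period.
        return self.clock.now() < started_at + self.trial_length


started = datetime(2024, 1, 1, tzinfo=timezone.utc)
on_day_13 = TrialService(FixedClock(started + timedelta(days=13)))
on_day_14 = TrialService(FixedClock(started + timedelta(days=14)))

assert on_day_13.is_trial_active(started)
assert not on_day_14.is_trial_active(started)
```

The test stays fast and deterministic because the clock is pinned, yet the business rule itself runs for real rather than being replicated in a mock.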

Testing the Infrastructure

This is where the rubber meets the road. We will have implemented the service interfaces defined in the Application layer and wired up a REST API that will run our business logic. There could be invalid SQL in the PostgreSQL service; or we could be authenticating incorrectly to an external API; or we could have misconfigured the DNS when deploying our API - how do we capture these bugs before going to production?

Validating interfaces

For validating that our service interfaces are correct there is nothing better than having integration tests against the real thing⁴. Setting up the real thing can be expensive, so we often settle for the next best option as long as it gives us confidence.

For example, if the real thing is the Stripe API, then it may be sufficient to settle for the Stripe sandbox API. In cases like databases, it can be simple to spin up and seed a PostgreSQL or Elasticsearch instance. In all these cases, we need to keep in mind this is not what we are running in production, so it may exhibit different performance characteristics.
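The shape of such an integration test might look like the Python sketch below: spin up a database, create the schema, then exercise the repository through real SQL. To keep the sketch runnable without a server it uses an in-memory SQLite database as a stand-in, which is an assumption of this example only. In the setup described above the connection would point at a PostgreSQL instance started for the test, precisely because SQL dialects differ between engines. SqlUserRepository and the users table are hypothetical.

```python
import sqlite3
from typing import Optional


class SqlUserRepository:
    """Hypothetical Infrastructure implementation of a user repository."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def add(self, email: str) -> int:
        cur = self.conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        self.conn.commit()
        return cur.lastrowid

    def find_email(self, user_id: int) -> Optional[str]:
        row = self.conn.execute(
            "SELECT email FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


def make_test_connection() -> sqlite3.Connection:
    """Stand up a fresh database with the schema (SQLite stand-in, see above)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
    )
    return conn


repo = SqlUserRepository(make_test_connection())
user_id = repo.add("ada@example.com")

# The queries ran against a real SQL engine, so invalid SQL would fail here.
assert repo.find_email(user_id) == "ada@example.com"
assert repo.find_email(user_id + 1) is None
```

Unlike a mocked repository, a typo in either query or a mismatch with the schema makes this test fail immediately.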

Testing the whole thing

There is immense confidence when we see successful tests that behave like a customer of our app, but there is also great pain when they fail and we need to debug the problem by digging through error logs to find the root cause 😭. These end-to-end tests will execute code paths through all the layers in our Hexagonal architecture, often needing to hit multiple services, each representing a point of failure and contributing to less reliable and slower tests.

At the end of the day, we need these tests. There are portions of our system that are not unit or integration testable. Think about changes to deployment configuration, DNS, infrastructure as code and upgrades to our web frameworks and libraries - these seemingly innocent changes are the most common cause of outages.

The rule of thumb I subscribe to for these tests is to treat them more like smoke tests⁵. They are a small subset of tests that run to provide early warning around the critical flows of our service. Critical flows are the ones whose failure, if not caught before release, would be considered an outage and cause panic. Instead, we want these tests to fail so that we can roll back and "keep calm and carry on"⁶. Even better, these tests can be integrated into an automated release strategy where live traffic is only directed to the new version once the tests pass (e.g. blue-green⁷).
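A minimal Python sketch of gating a release on smoke tests could look like the following. The check names, the "blue" and "green" environment labels and the choose_live_environment function are hypothetical. A real pipeline would make HTTP calls against the newly deployed environment rather than use the in-process lambdas shown here.

```python
from typing import Callable, Dict

SmokeCheck = Callable[[], bool]


def run_smoke_checks(checks: Dict[str, SmokeCheck]) -> Dict[str, bool]:
    """Run every check; a check that raises counts as a failure."""
    results: Dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


def choose_live_environment(
    current: str, candidate: str, checks: Dict[str, SmokeCheck]
) -> str:
    """Direct traffic to the candidate only if every smoke check passes."""
    results = run_smoke_checks(checks)
    # An empty set of checks proves nothing, so it never triggers a switch.
    all_passed = bool(results) and all(results.values())
    return candidate if all_passed else current


def broken_checkout() -> bool:
    raise ConnectionError("checkout endpoint unreachable")


healthy = {"login": lambda: True, "checkout": lambda: True}
unhealthy = {"login": lambda: True, "checkout": broken_checkout}

after_good_release = choose_live_environment("blue", "green", healthy)
after_bad_release = choose_live_environment("blue", "green", unhealthy)

print(after_good_release, after_bad_release)
```

When a critical flow fails, traffic simply stays on the current environment, so the failure is an early warning rather than an outage.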

Beyond testing

Quality is not 100% test coverage or having all the test types recommended in the test pyramid. It is about having practices around continually improving the confidence and stability in our application. It is perfectly fine to start off with low coverage, but when a bug occurs, we should have the tools in place to prevent it from happening again, e.g. by writing a test at some level, preferring unit tests.