Stop testing your code!

1 Mar 2024

Let’s hunt for bugs in a new way!
I posted this article internally at Xendit a little less than a year ago. It led to the generalization of what we now call “service-level testing”. Now that some dust has settled, I thought it would be interesting to share it with a wider audience.

Introduction

We sometimes hear that our source code is an important asset. I would rather argue that our code is a liability. Sure enough, what our code does for us is an asset, but if we could get the same outcome with less code, so much the better!
Truth be told, it is very uncommon to have customers who will pay you to write code. Except in very rare and specific cases, code is not what brings value to them. Instead, we use code as one of the many, many tools we have to bring value to our customers.
Nowadays, many software developers work in small teams delivering a collection of micro-services (or micro-front-ends as well, of course). Sure enough, the customer will not pay for these either, but at the very least this unit of delivery is the unit that the team is responsible for. Let’s emphasize this:

Your unit of delivery is your service, not your code.

Containerization and micro-services have radically changed the way we develop software. Our testing philosophy, however, has not adapted yet and is often “stuck” on the best practices of the monolith era.
With great power comes great AWS bills

The test-driven ossification

We’ve all seen this in our code: each Stuff class has a TestStuff unit test, and implements an IStuff interface so that it can be easily mocked when OtherStuff needs an instance of Stuff during its own testing.
The result? Code that embraces the inability of our mocking tools to mock classes (so we end up with a lot of nonsensical interfaces that bear no meaning and are only there to allow mocking), tests that spend most of their effort mocking other classes rather than testing their logic, and worst of all: tests that enforce a specific implementation by checking that the code called the mocked dependencies at the right time with the right parameters.
Therefore:

  • You have “double bookkeeping coding”: every time you make one little change, you need to change both the test and the actual code;
  • You also soon reach the point of “test-driven ossification”, because you can’t refactor your code anymore for fear of having to rewrite the tests as well.

What is the value of all this? When introduced to unit testing, we were promised that the value of these tests would be that we could later change the code and detect the regressions by re-running the tests. But none of this is even remotely true! Every time we change the code, we break a LOT of tests, and these are just indications that the tests were assuming a specific implementation, and hardly ever that we accidentally introduced a regression!
The harsh truth should be told here: all these tests only bring very little value but come with a high price tag:

  • Low value: because all your dependencies are mocked, your tests don’t prove much except that the code is doing what you implemented in the class. What are you testing exactly? That the compiler works? What exactly is the value of testing a controller class by mocking the service layer?
  • High price tag: upgrading to a new major version of a library will break all the tests because you now need to re-mock everything from this library. Similarly, refactoring your code will break all the tests. You end up with a bunch of supposedly well-tested code that nobody dares touching anymore.
Your unit tests only prove that your code does what you think it does. They don’t prove that your code does what it should be doing.

In monolithic applications, that is often the best we can come up with because setting up a new version of the software after a change is prohibitively expensive. That’s how we ended up with the unit-vs-integration testing dichotomy. But we can do so much better now.

I want better ROI for my tests

In monoliths we write unit tests for classes or functions because that is the only “unit” that we can afford to run on a developer’s machine or in a CI/CD pipeline.
But this is not true at all for micro-services. The unit to be tested is the service, not the code, and we can compile, deploy and start a new build in mere seconds on any mid-range laptop. So why are we not doing this?
Start thinking of the service as the “unit” to be tested (or as the “system under test”). Your service needs to serve an API endpoint that creates a new Stuff? Call that API and check that it returns what you expect. Additionally, call another “get” endpoint to check that it got stored properly. Your service receives a Kafka message to update a Stuff in the database? Send this Kafka message to your service, wait for it to be consumed, and then call the “get” endpoint to check that the Stuff got updated as expected.
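As an illustration, here is a minimal sketch of what such a test could look like with Jest and Node’s built-in fetch. The /stuff endpoints, the port and the payload are made up for the example; the point is that the test only talks to the service’s public API:

```typescript
// stuff.service.test.ts — a hypothetical service-level test.
// Assumes the service under test is already running on localhost:3000
// and exposes POST /stuff and GET /stuff/:id (names are illustrative).
const BASE_URL = "http://localhost:3000";

describe("Stuff API", () => {
  it("creates a Stuff and makes it retrievable", async () => {
    // Exercise the real call chain: HTTP stack, controller, service layer, DB.
    const createRes = await fetch(`${BASE_URL}/stuff`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ name: "my first stuff" }),
    });
    expect(createRes.status).toBe(201);
    const created = await createRes.json();

    // Verify observable behaviour through another endpoint,
    // not by peeking into the database.
    const getRes = await fetch(`${BASE_URL}/stuff/${created.id}`);
    expect(getRes.status).toBe(200);
    const fetched = await getRes.json();
    expect(fetched.name).toBe("my first stuff");
  });
});
```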
As much as possible, treat the database as an implementation detail. In a perfect universe, you would be able to change the underlying DB technology without having to change any of your tests. In theory you could even write your tests in a different programming language than the rest of your service; it shouldn’t matter.
In other words:

Stop testing your code! Start testing your service instead!

Of course we don’t live in a perfect universe and there will always be some need for “good old” class-level or function-level unit tests, and it will always be necessary to mess with the underlying DB every now and then during the test execution. But if most of your coverage comes from “unit-as-the-service” tests, you end up in a very good place indeed:

  • You can fearlessly refactor your code: the tests shouldn’t be affected and contrary to the “mock-based unit tests”, they actually prove that the service still does what it is supposed to do because they test what the service does, not how it does it;
  • You can upgrade dependencies, including your major database version, your runtime OS, your Go runtime, or your Node.js version or whatnot: because all of these are actually part of your testing, you know that if your tests are “green”, it is much less likely to break after deployment;
  • You then realize that Test-Driven Development is not this impossible, unrealistic “in a vacuum” theoretical concept: you’ll soon be thinking about the tests you will write at the time you read the PO’s new user story, well before you even start writing your code. And if you feel like it, you could even try writing the test before the code “for real”.

Objection!

Your honor! There are not enough memes in this blog post!
I got introduced to this micro-service testing approach a few years back. My initial reaction was: “this will never work”. Here is a list of the objections I came up with and how they turned out in practice:

Tests will be too slow to run

The heavy dependencies are typically already running before the tests start: your DB, Kafka, SQS etc. servers would be running as Docker containers that you started once and for all via docker compose. These are typically not an issue. Start them in the morning and stop them in the evening. During CI/CD, these services tend to start quite quickly from Buddy/Jenkins/whatever, and this needs to be done only once per pipeline execution.
These tests will run a bit slower than “pure” mock-based tests, of course:

  • You must run the tests in sequence; they can’t be parallelized because they all access the same DB, etc.;
  • You typically need to start your main HTTP server, Kafka listeners etc. during your test suite setup (i.e. in your beforeAll in Jest — see the sketch below). While it should take only a couple of seconds, that is still a couple of seconds more than mock-based tests.
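To make that setup cost concrete, here is a rough sketch of a suite-level bootstrap. The startServer/startKafkaConsumer helpers are hypothetical placeholders for whatever start/stop functions your own service exposes:

```typescript
// suite setup sketch — the imported helpers are hypothetical; substitute
// whatever bootstrap functions your own service exposes.
import { startServer, stopServer } from "./server";
import { startKafkaConsumer, stopKafkaConsumer } from "./kafka";

beforeAll(async () => {
  // Pay the "couple of seconds" once per suite, not once per test.
  await startServer({ port: 3000 });
  await startKafkaConsumer();
});

afterAll(async () => {
  // Tear everything down so the test runner can exit cleanly.
  await stopKafkaConsumer();
  await stopServer();
});
```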

In practice this never felt like an issue: running a single test from the IDE would take 4 to 5 seconds and would tell me that my code actually works “for real”: remember, this single test would test the full call chain from my HTTP stack to my controller to my service to my DB logic etc. This would typically be several tests in a mock-based testing approach. I can wait 4 seconds for that!
When running a full suite, the difference is barely visible because all this initialization is only done once per suite, and not once per test.

It is too much effort

Yes and no. It is much more effort upfront: you need to put a proper testing framework in place in your application from the start. You need to have access to proper testing utilities such as “wait for this Kafka message to be fully consumed”. These utilities should be rolled out in common libraries so that you don’t have to re-implement them in every service. As for the rest, it becomes a habit: it is slow at first, then becomes part of your muscle memory.
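As an example of such a utility, a generic polling helper usually goes a long way. This is a sketch, not a battle-tested library:

```typescript
// waitFor.ts — a naive polling helper for asynchronous expectations,
// e.g. "wait until the Kafka message has been fully consumed".
export async function waitFor(
  condition: () => Promise<boolean>,
  { timeoutMs = 5000, intervalMs = 100 } = {}
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await condition()) {
      return;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Condition not met within ${timeoutMs}ms`);
}

// Example usage in a test (the endpoint is illustrative):
// await waitFor(async () => {
//   const res = await fetch("http://localhost:3000/stuff/42");
//   return res.status === 200;
// });
```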
Once you have this upfront effort behind you, however, it requires dramatically less effort: you can focus on your code, change the methods, the class structures or whatnot as many times as you like without breaking your tests. You can easily reproduce field issues by writing a new test (i.e. “the customer called the API with these parameters” kind of scenarios). You can implement new features that touch existing code without fear of breaking something existing. You can easily keep your dependencies up to date, etc. You get to enjoy writing and maintaining code again!

Tests will be fragile and fail at random

No, but yes, but then no.
At first it looks easy because there is no reason why calling an API endpoint would be affected by any race condition: that’s a single straightforward call-chain down to the DB and back, nothing should go wrong.
But then all the asynchronous stuff pops up and you need to write tests dealing with Kafka or SQS (or both!) logic and everything becomes very complex all of a sudden:

  • How long should I wait until a Kafka message is processed? What should I wait for anyway?
  • How could I ever test complex logic such as receiving a Kafka message that is forwarded to SQS, only for the SQS message to be consumed, which in turn triggers a webhook call?
  • Some DB stuff got added by the previous test and that broke the next test!
  • I receive some messages that were queued by the previous test and this breaks the next test!

My typical approach is:

  • Ensure your environment is as clean as possible before each test starts: clean the DB by truncating all tables, empty the queues whenever possible (possible for SQS, more difficult for Kafka), etc.;
  • Ensure you know exactly what your service is supposed to do, and test for this. If your service is supposed to receive one Kafka message and produce two SQS messages, don’t stop your test until both SQS messages are received;
  • Focus on your service’s requirements and interfaces. In the Kafka to SQS relaying example used above, the real test should be: “when I receive a Kafka message, then a webhook should be called”: the SQS “hop” is an implementation detail that your test shouldn’t care about;
  • Ensure that you have proper cleanup for any resource you allocate: your test suite’s setup logic will start your HTTP server, which may connect to the DB. Make sure you have another function/method to disconnect from the DB that you call from the test suite’s teardown logic (see the sketch after this list)!
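Putting the “clean environment” and “proper cleanup” points together, here is a rough sketch of what that could look like in Jest. The db module, the purge helper and the table names are hypothetical stand-ins for whatever database client and queues your service actually uses:

```typescript
// cleanup sketch — `db` and `purgeSqsQueues` are hypothetical stand-ins
// for your actual database client and queue helpers.
import { db } from "./db";
import { purgeSqsQueues } from "./test-helpers";

beforeEach(async () => {
  // Start every test from a known-clean state.
  await db.query("TRUNCATE TABLE stuff, stuff_events CASCADE");
  await purgeSqsQueues();
});

afterAll(async () => {
  // Release every resource the suite setup allocated,
  // otherwise the test runner will hang on open connections.
  await db.disconnect();
});
```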

You often read that tests should be written with the same care and rigor as the rest of the code. This couldn’t be more true here.

You can’t run every single dependency on your laptop

Indeed, and Docker will only take you so far. If your service needs to call another downstream service in your infrastructure, then you’ll have no choice but to mock this service. My suggestion is to mock it “as a service”, i.e. start a mock HTTP server during your test initialization. In other words, do not mock your service’s HTTP client stack: you want to test this too!
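One way to do that is to stand up a real, local HTTP server with Node’s built-in http module, so the service under test goes through its actual HTTP client stack. The route, port and payload below are invented for the example:

```typescript
// downstream-mock.ts — a real local HTTP server standing in for a downstream
// service. Route, port and payload are illustrative.
import http from "node:http";

export function startDownstreamMock(port = 4000): Promise<http.Server> {
  const server = http.createServer((req, res) => {
    if (req.method === "GET" && req.url === "/prices/latest") {
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ price: 42 }));
      return;
    }
    res.writeHead(404);
    res.end();
  });
  return new Promise((resolve) => server.listen(port, () => resolve(server)));
}

// In the suite setup, point the service under test at http://localhost:4000
// (e.g. via an environment variable) and close the server in the teardown.
```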

Does it even work in practice?

Yes it does. I’ve seen it working in Java with Spring Boot, in Go, in Node. It is not a theoretical oddity, it is an approach that really works for real code and real customers.

Final thoughts about QA and Integration Tests

The test approach described above will “feel like” integration tests to software developers. But they are not. We are still only testing a single “unit”, the service, in total isolation, with all the externalities (the other services) mocked.
To echo the note from before:

These new “next-gen” unit tests only prove that your service does what you think it does. They don’t prove that your service does what it should be doing.

You will still need time and resources to focus on testing what really matters in the end: whether you have produced something that provides value to your customers. This focus on the end-to-end user journey, as opposed to the technical details of a specific service, class or function, is where you may want a team of dedicated QA specialists.
