Stop requiring only one assertion per unit test: Multiple assertions are fine
Assertion Roulette doesn’t mean that multiple assertions are bad.
When I coach teams or individual developers in test-driven development (TDD) or unit testing, I frequently encounter a particular notion: Multiple assertions are bad. A test must have only one assertion.
That idea is rarely helpful.
Let’s examine a realistic code example and subsequently try to understand the origins of the notion.
Outside-in TDD
Consider a REST API that enables you to make and cancel restaurant reservations. First, an HTTP POST request makes a reservation:
POST /restaurants/1/reservations?sig=epi301tdlc57d0HwLCz[...] HTTP/1.1
Content-Type: application/json
{
"at": "2023-09-22 18:47",
"name": "Teri Bell",
"email": "terrible@example.org",
"quantity": 1
}
HTTP/1.1 201 Created
Content-Type: application/json; charset=utf-8
Location: /restaurants/1/reservations/971167d4c79441b78fe70cc702[...]
{
"id": "971167d4c79441b78fe70cc702d3e1f6",
"at": "2023-09-22T18:47:00.0000000",
"email": "terrible@example.org",
"name": "Teri Bell",
"quantity": 1
}
Notice that in proper REST fashion, the response returns the location of the created reservation in the Location header.
If you change your mind, you can cancel the reservation with a DELETE request:
DELETE /restaurants/1/reservations/971167d4c79441b78fe70cc702[...] HTTP/1.1
HTTP/1.1 200 OK
Imagine that this is the desired interaction. Using outside-in TDD you write the following test:
[Theory]
[InlineData(884, 18, 47, "c@example.net", "Nick Klimenko", 2)]
[InlineData(902, 18, 50, "emot@example.gov", "Emma Otting", 5)]
public async Task DeleteReservation(
int days, int hours, int minutes,
string email, string name, int quantity)
{
using var api = new LegacyApi();
var at = DateTime.Today.AddDays(days).At(hours, minutes)
.ToIso8601DateTimeString();
var dto = Create.ReservationDto(at, email, name, quantity);
var postResp = await api.PostReservation(dto);
Uri address = FindReservationAddress(postResp);
var deleteResp = await api.CreateClient().DeleteAsync(address);
Assert.True(
deleteResp.IsSuccessStatusCode,
$"Actual status code: {deleteResp.StatusCode}.");
}
This example is in C# using xUnit.net because we need some language and framework to show realistic code. The point of the article, however, applies across languages and frameworks. The code examples in this article are based on the sample code base that accompanies my book Code That Fits in Your Head.
In order to pass this test, you can implement the server-side code like this:
[HttpDelete("restaurants/{restaurantId}/reservations/{id}")]
public void Delete(int restaurantId, string id)
{
}
While clearly a no-op, this implementation passes all tests. The newly-written test asserts that the HTTP response returns a status code in the 200 (success) range. This is part of the API’s REST protocol, so this response is important. You want to keep this assertion around as a regression test. If the API ever begins to return a status code in the 400 or 500 range, it would be a breaking change.
So far, so good. TDD is an incremental process. One test doesn’t drive a full feature.
Since all tests are passing, you can commit the changes to source control and proceed to the next iteration.
Strengthening the postconditions
You should be able to check that the resource is truly gone by making a GET request:
GET /restaurants/1/reservations/971167d4c79441b78fe70cc702[...] HTTP/1.1
HTTP/1.1 404 Not Found
This, however, is not the behavior of the current implementation of Delete, which does nothing. It seems that you’re going to need another test.
Or do you?
One option is to copy the existing test and change the assertion phase to perform the above GET request to check that the response status is 404:
[Theory]
[InlineData(884, 18, 47, "c@example.net", "Nick Klimenko", 2)]
[InlineData(902, 18, 50, "emot@example.gov", "Emma Otting", 5)]
public async Task DeleteReservationActuallyDeletes(
int days, int hours, int minutes,
string email, string name, int quantity)
{
using var api = new LegacyApi();
var at = DateTime.Today.AddDays(days).At(hours, minutes)
.ToIso8601DateTimeString();
var dto = Create.ReservationDto(at, email, name, quantity);
var postResp = await api.PostReservation(dto);
Uri address = FindReservationAddress(postResp);
var deleteResp = await api.CreateClient().DeleteAsync(address);
var getResp = await api.CreateClient().GetAsync(address);
Assert.Equal(HttpStatusCode.NotFound, getResp.StatusCode);
}
This does, indeed, prompt you to properly implement the server-side Delete method.
Is this, however, a good idea? Is the test code easy to maintain?
Test code is code too, and you have to maintain it. Copy and paste is problematic in test code for the same reasons that it can be a problem in production code. If you later have to change something, you have to identify all the places that you have to edit. It’s easy to miss one, which can lead to bugs. This is true for test code as well.
One action, more assertions
Instead of copy-and-pasting the first test, why not instead strengthen the postconditions of the first test case?
Just add the new assertion after the first assertion:
[Theory]
[InlineData(884, 18, 47, "c@example.net", "Nick Klimenko", 2)]
[InlineData(902, 18, 50, "emot@example.gov", "Emma Otting", 5)]
public async Task DeleteReservation(
int days, int hours, int minutes,
string email, string name, int quantity)
{
using var api = new LegacyApi();
var at = DateTime.Today.AddDays(days).At(hours, minutes)
.ToIso8601DateTimeString();
var dto = Create.ReservationDto(at, email, name, quantity);
var postResp = await api.PostReservation(dto);
Uri address = FindReservationAddress(postResp);
var deleteResp = await api.CreateClient().DeleteAsync(address);
Assert.True(
deleteResp.IsSuccessStatusCode,
$"Actual status code: {deleteResp.StatusCode}.");
var getResp = await api.CreateClient().GetAsync(address);
Assert.Equal(HttpStatusCode.NotFound, getResp.StatusCode);
}
This means that you only have a single test method to maintain instead of two duplicated methods that are almost identical.
But, some of the people I’ve coached might say, this test has two assertions!
Indeed. So what? It’s one single test case: Cancelling a reservation.
While cancelling a reservation is a single action, we care about multiple outcomes:
- The status code after a successful DELETE request should be in the 200 range.
- The reservation resource should be gone.
Developing the system further, we might add more behaviors that we care about. Perhaps the system should also send an email about the cancellation. We should assert that as well. It’s still the same test case, though: Successfully cancelling a reservation.
There’s nothing wrong with multiple assertions in a single test. The above example illustrates the benefits. A single test case can have multiple outcomes that should all be verified.
Origins of the single assertion notion
Where does the only one assertion per test notion come from? I don’t know, but I can guess.
The excellent book xUnit Test Patterns describes a test smell named Assertion Roulette. It describes situations where it may be difficult to determine exactly which assertion caused a test failure.
It looks to me as though the only one assertion per test ‘rule’ stems from a misreading of the Assertion Roulette description. (I may even have contributed to that myself. I don’t remember that I have, but to be honest I’ve produced so much content about unit testing over the decades that I don’t want to assume myself free of guilt.)
xUnit Test Patterns describes two causes of Assertion Roulette:
- Eager Test: A single test verifies too much functionality.
- Missing Assertion Message
You have an Eager Test when you’re trying to exercise more than one test case. You may be trying to simulate a ‘session’ where a client performs many steps in order to achieve a goal. As Gerard Meszaros writes regarding the test smell, this is appropriate for manual tests, but rarely for automated tests. It’s not the number of assertions that cause problems, but that the test does too much.
The other cause occurs when the assertions are sufficiently similar that you can’t tell which one failed, and they have no assertion messages.
That’s not the case with the above example. If the Assert.True assertion fails, the assertion message will tell you:
Actual status code: NotFound.
Expected: True
Actual: False
Likewise, if the Assert.Equal assertion fails, that too will be clear:
Assert.Equal() Failure
Expected: NotFound
Actual: OK
There’s no ambiguity.
One assertion per test
Now that you understand that multiple assertions per test are fine, you may be inclined to have a ball adding assertions like there’s no tomorrow.
Usually, however, there’s a germ of truth in a persistent notion like the one test, one assertion ‘rule’. Use good judgement.
If you consider what an automated test is, it’s basically a predicate. It’s a statement that we expect a particular outcome. We then compare the actual outcome to the expected outcome to see if they are equal. Thus, in essence, the ideal assertion is this:
Assert.Equal(expected, actual);
I can’t always attain that ideal, but whenever I can, I feel deep satisfaction. Sometimes, expected and actual are primitive values like integers or strings, but they might also be complex values that represent the subset of program state that the test cares about. As long as the objects have structural equality, such an assertion is meaningful.
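To illustrate the idea in Python (the article’s examples are C#, but the point is language-agnostic), here is a minimal sketch with a hypothetical Reservation value type and a toy cancel function — both invented for this example, not part of the article’s code base:

```python
from dataclasses import dataclass

# Frozen dataclasses compare field by field (structural equality),
# so a single assertion can verify a whole chunk of program state.
@dataclass(frozen=True)
class Reservation:
    at: str
    email: str
    name: str
    quantity: int

def cancel(reservations, reservation):
    """Toy system under test: remove a matching reservation."""
    return [r for r in reservations if r != reservation]

r = Reservation("2023-09-22T18:47", "terrible@example.org", "Teri Bell", 1)
expected = []
actual = cancel([r], Reservation("2023-09-22T18:47", "terrible@example.org", "Teri Bell", 1))
assert expected == actual  # one assertion over a complex value
```

Because the two Reservation values are structurally equal, the single `assert expected == actual` covers everything the test cares about.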
At other times I can’t quite find a way to express the verification step as succinctly as that. If I have to add another assertion or two, I’ll do that.
Conclusion
There’s this notion that you’re only allowed to write one assertion per unit test. It probably originates from real concerns about badly-factored test code, but over the years the nuanced test smell Assertion Roulette has become garbled into a simpler, but less helpful ‘rule’.
That ‘rule’ often gets in the way of maintainable test code. Programmers following the ‘rule’ resort to gratuitous copying and pasting instead of adding another assertion to an existing test.
If adding a relevant assertion to an existing test is the best way forward, don’t let a misunderstood ‘rule’ stop you.
Tags: testing, unit tests
18 Comments
IIRC (but could very well be wrong) initially in test frameworks there were only assertions that threw exceptions. Plus it may also have been the case that assertions didn’t take descriptive comments that were part of any resulting fail message in the log.
The first meant that in any given test run you would get limited information: only the first failed assertion would be reported, and nothing after it would run. AND IT WAS THE HABIT of programmers to do a lot of setup in individual unit tests (fixture setup notwithstanding) – e.g., first read a record, _assert_ if the read failed, then call some method and assert if _that_ failed, where the _second_ assert was the one you were really interested in.
Then too, without messages and with multiple assertions that pass followed with one that failed you had to pay close attention to the log and the _line number_ listed there for _where_ the failure happened. Especially if you had multiple assertions that “looked the same”, e.g., multiple tests `Assert(result != null)` after a bunch of calls to getters.
So “no multiple assertions in tests” was something designed to force you to use test _names_ – the only way of annotating what a particular test was testing that showed up in the log – to be the _specific pointer_ to what it was you were testing that failed.
Newer (“modern”) frameworks allow both assertions that fail the test but let it keep running (frequently spelled “Expect”) and per-assertion descriptions. Using both of these (plus using proper structuring techniques – e.g., either setup methods in fixtures OR just as good but not often done for some reason by people writing unit tests: just abstracting setup into a method) alleviates the problems noticed by users of those earlier frameworks.
IIRC.
This was also the first thought that came to my mind: even today, most test frameworks do not continue a test past the first failed assertion, and this is especially annoying when you are asserting multiple values:
```
assert 123 == error_code
assert 'BADFOO' == error_name
```
In situations such as this, I’m in the habit of combining all the information into a tuple in order to get the results of all the comparisons:
```
assert (123, 'BADFOO') == (error_code, error_name)
```
(BTW, w.r.t. your last example of distinguishing assertions in the log, where you show two _very similar_ messages and then claim that “there’s no ambiguity” – why not just add a proper message as an argument to those assertions which explains what your test for `NotFound` or whatever _is actually testing_, semantically, in terms the user (i.e., the developer trying to troubleshoot some regression in the build) will actually understand?
_That’s_ where you gain readability/understandability for your tests, that’s where the tests become time-saving supplements to the dev when troubleshooting (instead of frustrating time sinks), and that’s where a _very little extra_ time spent writing the test pays off.
IMO.)
(Wish there was an _edit_ ability for these comments, oh well.)
Better example for the multiple assertions in a test: Call a method passing in an object, on return from the method call _several_ methods and/or getters of the parameter object to ensure (from the outside) it has the correct state. Ignoring whether this is the right way to write O-O code, much less the right way to write the test itself, “one assertion per test” means that _each_ of those tests of state of that same object needs its own test. And, like I said above, I think the reason is that early frameworks threw an exception on every assert – no alternative to that – so if you wrote a test in this way each test run would only show _one_ reason the state was wrong (and the same one each time too, until you fixed it and got the next one)
I think this article is confusing. The problem with this rule is that it intends to say that each test should verify a single business requirement, which can be thought of as a single business-level assertion. It does mean that physically, depending on your testing framework, you will sometimes have more than one assert statement. The rule about a single conceptual assertion, or single business-level assertion, IMHO should stand. This article seems to again conflate assert statements with asserting business requirements, and throws out the baby with the bathwater by universally suggesting that multiple assertions are always fine (with a vague caveat about “using judgement”).
The problem of extensive duplication is usually solved by having parametrized tests, not by multiplying unrelated assertions.
Interesting blog post and I like a lot of the messages and links that you share, but I disagree with using GET returning 404 being your source of truth that the entry has been deleted. In my opinion, the unit under test here is the DeleteReservation function, not the combination of `GetReservation` and `DeleteReservation`. To quote from the article – “It’s one single test case: Cancelling a reservation”, but clearly it is actually `Trying to get a reservation after cancelling should fail`. To me, what you have is a `session` (i.e. multiple steps), which is still a very valid test, but I disagree that this is a unit test.
Instead, following black-box API unit testing, I would have one test for `DeleteReservation` that validates that the delete call returns the correct response, and then have another test for `GetReservation after deleted` that would set up the data and then call `GetReservation`. The problem with this approach (and black-box unit testing in general) is that tests cannot be fully independent, because if `DeleteReservation` is broken, the setup for `GetReservation after deleted` will fail.
Alternatively, following white-box API unit testing, I would have one test for `DeleteReservation` that validates that the delete call removes from the DB and returns the correct response. This is a good example of where multiple assertions per unit test are valid, since it is not practically possible to combine validation of the DB deletion call with the HTTP response. I would recommend using something like Kotest’s soft assertions (https://kotest.io/docs/assertions/soft-assertions.html), which allow combining multiple assertions into a single assertion. Note that semantically we have a single assertion (DeleteReservation should do what it is expected to do), it is just a compound assertion – `DeleteReservation` should `remove the reservation from the DB and return the correct response`.
Just some thoughts!
I’ve been waiting for someone to say this for awhile.
One assert per unit test? That’s just nonsense.
It’s one of those “rules” followed by people who have nothing else to contribute.
I only apply the one test – one assertion rule to unit tests. A test that covers REST calls and relies on server-side state is not strictly a unit test; the fact that you use xUnit to implement the test doesn’t make it a unit test. What you describe here is integration-level testing; it would be impractical to have the same constraint on integration-level testing.
This post seems to assume that there are only two options: either copy and paste the setup code, or add more assertions following the setup.
There is at least one more option, which seems to me to be both obvious and correct: write some code to abstract out the setup stage. This is not exactly a novel idea; it has in fact been reinvented many times and AFAIK most modern testing frameworks provide support for this, often under the name of “fixtures” – and when such support does not exist, we do expect that working engineers will recognize and factor out duplication rather than using copy/paste code, because this is frankly not rocket science. So if the argument is that multiple assertions are the best way, or even a good way to eliminate duplicated code, I wouldn’t expect that argument to convince any serious programmer.
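For what it’s worth, the factoring this commenter describes can be sketched with a plain setup helper. This is in Python rather than the article’s C#, and every name here (FakeApi, cancel_a_reservation) is hypothetical, a stand-in for the article’s HTTP client and test harness:

```python
# A hypothetical in-memory stand-in for the article's HTTP API.
class FakeApi:
    def __init__(self):
        self.reservations = {}

    def post_reservation(self, dto):
        self.reservations["r1"] = dto
        return "r1"  # the address of the created resource

    def delete(self, address):
        self.reservations.pop(address, None)
        return 200

    def get(self, address):
        return 200 if address in self.reservations else 404

def cancel_a_reservation():
    """Shared setup helper: arrange and act once, return what tests need."""
    api = FakeApi()
    address = api.post_reservation({"name": "Teri Bell", "quantity": 1})
    status = api.delete(address)
    return api, address, status

def test_delete_returns_success():
    _, _, status = cancel_a_reservation()
    assert status == 200

def test_deleted_reservation_is_gone():
    api, address, _ = cancel_a_reservation()
    assert api.get(address) == 404
```

Each test then carries one outcome in its name, and the duplicated arrange/act code lives in one place – at the cost of running the setup once per test.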
The post also ignores what seems to me a clear benefit of writing simple tests that test one thing, and that is simply that tests should communicate their purpose, and the single best way we know of to communicate intent is good naming. When I write a test and submit it for review as part of a pull request, I should provide a name for that test that tells me exactly why that test exists and what it means if that test fails. If I write a test with many assertions, then the name of that test cannot serve its purpose.
Obviously, this is important because when a test fails because an engineer makes a change to functionality, it’s important to know whether the assumption made by that test is still valid under the new changes. If I can’t tell what the test is for, I have to stop what I’m doing and waste my time – and my team’s velocity, and my employer’s money – trying to work out why it was written and what to do about it.
> Usually, however, there’s a germ of truth in a persistent notion like the one test, one assertion ‘rule’. Use good judgement.
Yes, this is true. The “germ of truth” is just our old friend the single-responsibility principle: a good function does one thing, and functions that try to do many things are an expression of bad judgement.
You can do what you like, but I expect that as you pursue this approach to testing you’ll find that it’s a great way to test code that is not meant to be maintained, updated, or extended. Which should make you ask yourself, why are you testing this code in the first place?
I agree with you on this. The test name should tell exactly what the test is doing, and two assertions make that complicated.
To reduce assertion roulette I use soft assertions in AssertJ, since they let the test go on even if there are assertion failures, and you can see all the assertions that have failed for the test.
I find the example you use to make your point fairly confusing. You say you’re testing only a single test case, which is supposedly canceling a reservation; but the test also relies on the POST and GET endpoints to function properly, so the test is also testing two other test cases at the same time. If you truly want to only test deleting a reservation, and test nothing else, you should give your test direct access to the database so it can manually create a reservation without relying on the POST endpoint, then use that database connection to verify that row was deleted without having to rely on the GET endpoint.
But then, at the same time, doing something like that only really makes sense at the unit test level, where you mock out your database connection and you don’t make actual REST calls to test the whole business logic. Mock the business logic to test just the REST endpoint functionality, then test the business logic separately by calling those functions directly (and those tests should mock the database calls.)
So what kind of testing is this supposed to represent? I guess it’s probably closest to integration tests, but nobody ever had a problem testing multiple assertions in a single integration test because that’s kind of the point of an integration test; to test the whole flow, from creating the reservation to getting it to deleting it to attempting to get it again.
Don’t get me wrong, I’m not trying to argue against the point that multiple unit test assertions are okay – I wholeheartedly agree “one assertion per test” is a rather unnecessarily restrictive “rule”; I’m just trying to understand the example being made for it.
I think the biggest challenge with too many assertions is that you don’t get feedback on all the reasons why the test fails. Only the first failing assertion the test runner encounters would be reported as a reason for the test failing. If there are 5 assertions, and the failing one is at the top, we lose out on the feedback from the other 4 assertions. If something remains broken for some time, other things may degrade since we lack visibility. In a perfect world, we would of course fix things right away, but in the real world that unfortunately doesn’t always happen.
One way to handle this is to refactor the test code into a method that can then be shared between two separate tests. This eliminates the duplication and still keeps the test code maintainable.
Another option is execute the test inside a before hook and then use separate test blocks to verify the assertions. This way you know exactly what went right and what went wrong. When Assertion Errors are thrown, the remaining assertions still execute. The challenge with this solution though is it makes it look like you have way more tests than you really do. One test with 5 assertions would look like 5 tests.
A third option is to defer throwing the AssertionError until the end of the test case. However, I’ve not come across such a library. The way I picture it working is that all of the failed assertions are simply collected into an array. Once the test case ends, the test framework throws an AssertionException with all of the combined reasons for the failure. So if the first assertion fails, the result is stored until the end of the test case and then the exception is thrown, and it would include all of the reasons for failure, not just the first one.
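A deferred-assertion collector like the one this commenter pictures is straightforward to sketch; here is a minimal, hypothetical version in Python (libraries such as NUnit’s Assert.Multiple, mentioned in a later comment, offer this behavior out of the box):

```python
class SoftAssert:
    """Collects assertion failures instead of raising immediately."""
    def __init__(self):
        self.failures = []

    def check(self, condition, message):
        # Record the failure and keep going, so every broken
        # expectation is reported, not just the first one.
        if not condition:
            self.failures.append(message)

    def verify(self):
        # Raise once, at the end, with all collected reasons.
        if self.failures:
            raise AssertionError("; ".join(self.failures))

soft = SoftAssert()
soft.check(200 == 404, "status code mismatch")   # fails, but execution continues
soft.check("OK" == "NotFound", "body mismatch")  # also recorded
try:
    soft.verify()
except AssertionError as e:
    combined = str(e)  # "status code mismatch; body mismatch"
```

The single AssertionError at the end then lists every reason the test failed, not just the first.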
This article describes a long-standing problem that is practically ubiquitous these days and began long ago. The canonical original form of this is, “Money is the root of all evil.” No. The rule never said that. It’s supposed to be “The *LOVE* of money is the root of all evil.” You catch the distinction, yes? Similarly the venerable “GOTO is EVIL” rule-of-thumb. Well, no. It’s not. That was never the rule. The gratuitous *over-use* of GOTO is bad, but goto by itself is quite handy. Once in awhile it can be a real life-saver. This is simply a more recent manifestation of the same thing. Multiple assertions in the same test? Nothing wrong with that. That was never the issue. Putting 20,000 assertions into a single test block…now *there* is a problem.
You can, in principle, encapsulate an entire application into a single function. It will likely be hard to write, hard to read, hard to maintain, hard to debug, and just B.A.D. on a number of levels, but it’s possible. Syntactically, it’s not even incorrect. It’s just *hard to read*.
Now here is the crux of the matter. Source code? That stuff you write, and test, and fret about, and then you give it to a compiler and it makes computers do wondrous things? That stuff? It doesn’t do anything. It never did. The stuff that comes out of the compiler is what makes the computer do things. All of the stuff before that is for *PEOPLE TO UNDERSTAND*. If people can’t understand it, then what little function it actually has is gone.
So focus on the actual point. Does what you’re doing interfere with how easy your code is to *understand*? Then it’s not written as well as it could be. It doesn’t matter if you use rules, don’t use rules, employ LINT tools, or whatever. That’s all that these rules have ever been about: making code easier to understand. Not dragging coding standards to such an extreme that they themselves become an impediment to comprehension.
The same applies to unit tests. They’re a means to create little automatic checks for your code that will ensure that things work as intended at any time you choose to check. That’s all they’re for. If it’s easy to see which tests are about which code doing what, then they’re fine. If the tests become worthless because you can’t tell why a test failed or what that implies about the tested code, now you have a problem. How many assertions are in each code block doesn’t even begin to factor into it. It’s the whole “GOTO is EVIL” thing all over again.
You can also avoid assertion roulette by using NUnit’s Assert.Multiple() or AssertionScope within FluentAssertions.
Inspired by NUnit’s Assert.Multiple, I published a Node.js package that allows multiple assertions to be clubbed together so that all of the assertions are executed before throwing an AssertionError. https://www.npmjs.com/package/multi-assert. During the process, I discovered Jasmine does this by default, while Jest and Mocha throw the AssertionError as soon as an assertion fails. I think Jasmine is able to pull it off because their assertion library is more tightly coupled with their core test-runner while Chai is designed to be runner-agnostic. The multi-assert package works with mocha/chai, as well as Jest. I also verified it with WebdriverIO using the mocha framework.