Favor real dependencies for unit testing
If you’ve worked with unit testing, you’ve probably used dependency injection to decouple objects and control their behavior while testing them. You’ve probably injected mocks or stubs into the system under test in order to define repeatable, deterministic unit tests.
Such a test might look like this:
[Fact]
public async Task AcceptWhenInnerManagerAccepts()
{
    var r = new Reservation(
        DateTime.Now.AddDays(10).Date.AddHours(18),
        "x@example.com",
        "",
        1);
    var mgrTD = new Mock<IReservationsManager>();
    mgrTD.Setup(mgr => mgr.TrySave(r)).ReturnsAsync(true);
    var sut = new RestaurantManager(
        TimeSpan.FromHours(18),
        TimeSpan.FromHours(21),
        mgrTD.Object);

    var actual = await sut.Check(r);

    Assert.True(actual);
}
(This C# test uses xUnit.net 2.4.1 with Moq 4.14.1.)
Such tests are brittle. They break easily and therefore increase your maintenance burden.
Why internal dependencies are bad
As the above unit test implies, the RestaurantManager relies on an injected IReservationsManager dependency. This interface is an internal implementation detail. Think of the entire application as a blue box with two objects as internal components:
An application contains many internal building blocks. The above illustration emphasizes two such components, and how they interact with each other.
What happens if you’d like to refactor the application code? Refactoring often involves changing how internal building blocks interact with each other. For example, you might want to change the IReservationsManager interface.
When you make a change like that, you’ll break some of the code that relies on the interface. That’s to be expected. Refactoring, after all, involves changing code.
When your tests also rely on internal implementation details, refactoring also breaks the tests. Now, in addition to improving the internal code, you also have to fix all the tests that broke.
Using a dynamic mock library like Moq tends to amplify the problem. You now have to visit all the tests that configure mocks and adjust them to model the new internal interaction.
This kind of friction is likely to deter you from refactoring in the first place. If you know that a warranted refactoring will give you much extra work fixing tests, you may decide that it isn’t worth the trouble. Instead, you leave the production code in a suboptimal state.
Is there a better way?
Functional core
In order to find a better alternative, you must first understand the problem. Why use test doubles (mocks and stubs) in the first place?
Test doubles serve a major purpose: They enable us to write deterministic unit tests.
Unit tests should be deterministic. Running a test multiple times should produce the same outcome each time (ceteris paribus). A test that succeeds on a Wednesday shouldn’t fail on a Saturday.
By using a test double each test can control how a dependency behaves. In Working Effectively with Legacy Code, Michael Feathers likens a test to a vise. It’s a tool to fix a particular behavior in place.
Test doubles, however, aren’t the only way to make tests deterministic.
A better alternative is to make the production code itself deterministic. Imagine, for example, that you need to write code that calculates the volume of a frustum. As long as the frustum doesn’t change, the volume remains the same number. Such a calculation is entirely deterministic.
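To make the determinism concrete, here’s a minimal sketch of such a calculation. The Frustum type and its members are illustrative; they don’t appear in the book’s code base:

```csharp
using System;

// Illustrative only: an immutable frustum (a cone with its top cut off),
// described by its height and its two radii.
public sealed record Frustum(double Height, double TopRadius, double BottomRadius)
{
    // V = (pi * h / 3) * (R^2 + R*r + r^2). A pure function: the same
    // frustum always yields the same volume, so no test double is needed.
    public double Volume() =>
        Math.PI * Height / 3 *
        (BottomRadius * BottomRadius +
         BottomRadius * TopRadius +
         TopRadius * TopRadius);
}
```

Since Volume depends only on the record’s immutable state, a unit test can simply compare the return value to an expected number, regardless of when or where it runs.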
Write your production code using mostly deterministic operations. For example, instead of the above RestaurantManager, you can write an immutable class with a method like this:
public bool WillAccept(
    DateTime now,
    IEnumerable<Reservation> existingReservations,
    Reservation candidate)
{
    if (existingReservations is null)
        throw new ArgumentNullException(nameof(existingReservations));
    if (candidate is null)
        throw new ArgumentNullException(nameof(candidate));
    if (candidate.At < now)
        return false;
    if (IsOutsideOfOpeningHours(candidate))
        return false;

    var seating = new Seating(SeatingDuration, candidate.At);
    var relevantReservations =
        existingReservations.Where(seating.Overlaps);
    var availableTables = Allocate(relevantReservations);
    return availableTables.Any(t => t.Fits(candidate.Quantity));
}
This example, like all code in this article, is from my book Code That Fits in Your Head. Despite implementing quite complex business logic, it’s a pure function. All the helper methods involved (IsOutsideOfOpeningHours, Overlaps, Allocate, etc.) are also deterministic.
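As an illustration of what such a deterministic helper might look like, here’s a sketch of a Seating type with an Overlaps method. This is a guess at the shape of the book’s actual code, and the Reservation type is reduced to the bare minimum the example needs:

```csharp
using System;

// Simplified stand-in for the article's Reservation type.
public sealed record Reservation(DateTime At, int Quantity);

// A seating is the time window occupied by a reservation.
public sealed record Seating(TimeSpan SeatingDuration, DateTime At)
{
    public DateTime Start => At;
    public DateTime End => At + SeatingDuration;

    // Two time windows overlap when each starts before the other ends.
    // Deterministic: the result depends only on the input values.
    public bool Overlaps(Reservation other)
    {
        var otherSeating = new Seating(SeatingDuration, other.At);
        return Start < otherSeating.End && otherSeating.Start < End;
    }
}
```

Because there’s no hidden state and no I/O, calling Overlaps with the same arguments always yields the same answer.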
The upshot is that deterministic operations are easy to test. For instance, here’s a parametrized test of the happy path:
[Theory, ClassData(typeof(AcceptTestCases))]
public void Accept(
    MaitreD sut,
    DateTime now,
    IEnumerable<Reservation> reservations)
{
    var r = Some.Reservation.WithQuantity(11);

    var actual = sut.WillAccept(now, reservations, r);

    Assert.True(actual);
}
This code snippet doesn’t show the test case data source (AcceptTestCases), but it’s a small helper class that produces seven test cases that supply values for sut, now, and reservations.
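The general shape of such a [ClassData] source in xUnit.net is a class implementing IEnumerable<object[]>, where each object[] supplies the arguments for one test case. The following sketch only shows the pattern; the book’s actual AcceptTestCases yields MaitreD, DateTime, and reservation values, not the placeholder strings used here:

```csharp
using System.Collections;
using System.Collections.Generic;

// Pattern sketch of an xUnit.net [ClassData] source. The real
// AcceptTestCases produces seven (sut, now, reservations) cases;
// the values below are placeholders.
public sealed class AcceptTestCases : IEnumerable<object[]>
{
    public IEnumerator<object[]> GetEnumerator()
    {
        yield return new object[] { "first placeholder case" };
        yield return new object[] { "second placeholder case" };
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

xUnit.net enumerates the class and invokes the test method once per yielded array.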
This test method is typical of unit tests of pure functions:
- Prepare input value(s)
- Call the function
- Compare the expected outcome with the actual value
If you recognize that structure as the Arrange Act Assert pattern, you’re not wrong, but that’s not the main point. What’s worth noticing is that despite non-trivial business logic, no test doubles (i.e. mocks or stubs) are required. This is one of many advantages of pure functions. Since they are already deterministic, you don’t have to introduce artificial seams into the code to enable testing.
Writing most of a code base as deterministic functions is possible, but requires practice. This style of programming is called functional programming (FP), and while it may require effort for object-oriented programmers to shift perspective, it’s quite the game changer—both because of the benefits to testing, and for other reasons.
Even the most idiomatic FP code base, however, must deal with the messy, non-deterministic real world. Where do input values like now and existingReservations come from?
Imperative shell
A typical functional architecture tends to resemble the Ports and Adapters architecture. You implement all business and application logic as pure functions and push impure actions to the edge.
At the edge, and only at the edge, you allow impure actions to take place. In the example code that runs through Code That Fits in Your Head, this happens in controllers. For example, this TryCreate helper method is defined in a ReservationsController class:
private async Task<ActionResult> TryCreate(
    Restaurant restaurant,
    Reservation reservation)
{
    using var scope =
        new TransactionScope(TransactionScopeAsyncFlowOption.Enabled);

    var reservations = await Repository
        .ReadReservations(restaurant.Id, reservation.At)
        .ConfigureAwait(false);
    var now = Clock.GetCurrentDateTime();
    if (!restaurant.MaitreD.WillAccept(now, reservations, reservation))
        return NoTables500InternalServerError();

    await Repository.Create(restaurant.Id, reservation)
        .ConfigureAwait(false);

    scope.Complete();

    return Reservation201Created(restaurant.Id, reservation);
}
The TryCreate method makes use of two impure, injected dependencies: Repository and Clock.
The Repository dependency represents the database that stores reservations, while Clock represents some kind of clock. These dependencies aren’t arbitrary. They’re there to support unit testing of the application’s imperative shell, and they have to be injected dependencies exactly because they’re sources of non-determinism.
It’s easiest to understand why Clock is a source of non-determinism. Every time you ask what time it is, the answer changes. That’s non-deterministic, because the textbook definition of determinism is that the same input should always produce the same output.
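The Clock dependency can be as small as a single-method interface. The following sketch shows one plausible shape (the book’s actual interface may differ), together with the impure production implementation and a deterministic one for tests:

```csharp
using System;

public interface IClock
{
    DateTime GetCurrentDateTime();
}

// Impure production implementation: a different answer every time.
public sealed class SystemClock : IClock
{
    public DateTime GetCurrentDateTime() => DateTime.Now;
}

// Deterministic test implementation: always the same answer.
public sealed class ConstantClock : IClock
{
    private readonly DateTime dateTime;

    public ConstantClock(DateTime dateTime)
    {
        this.dateTime = dateTime;
    }

    public DateTime GetCurrentDateTime() => dateTime;
}
```

A test that cares about time injects a ConstantClock, which pins the ‘current’ time in place, much like Feathers’ vise.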
The same definition applies to databases. You can repeat the same database query, and over time receive different outputs because the state of the database changes. By the definition of determinism, that makes a database non-deterministic: The same input may produce varying outputs.
You can still unit test the imperative shell, but you don’t have to use brittle dynamic mock objects. Instead, use Fakes.
Fakes
In the pattern language of xUnit Test Patterns, a fake is a kind of test double that could almost serve as a “real” implementation of an interface. An in-memory “database” is a useful example:
public sealed class FakeDatabase :
    ConcurrentDictionary<int, Collection<Reservation>>,
    IReservationsRepository
While implementing IReservationsRepository, this test-specific FakeDatabase class inherits ConcurrentDictionary<int, Collection<Reservation>>, which means it can leverage the dictionary base class to add and remove reservations. Here’s the Create implementation:
public Task Create(int restaurantId, Reservation reservation)
{
    AddOrUpdate(
        restaurantId,
        new Collection<Reservation> { reservation },
        (_, rs) => { rs.Add(reservation); return rs; });
    return Task.CompletedTask;
}
And here’s the ReadReservations implementation:
public Task<IReadOnlyCollection<Reservation>> ReadReservations(
    int restaurantId,
    DateTime min,
    DateTime max)
{
    return Task.FromResult<IReadOnlyCollection<Reservation>>(
        GetOrAdd(restaurantId, new Collection<Reservation>())
            .Where(r => min <= r.At && r.At <= max).ToList());
}
The ReadReservations method returns the reservations already added to the repository with the Create method. Of course, it only works as long as the FakeDatabase object remains in memory, but that’s sufficient for a unit test:
[Theory]
[InlineData(1049, 19, 00, "juliad@example.net", "Julia Domna", 5)]
[InlineData(1130, 18, 15, "x@example.com", "Xenia Ng", 9)]
[InlineData( 956, 16, 55, "kite@example.edu", null, 2)]
[InlineData( 433, 17, 30, "shli@example.org", "Shanghai Li", 5)]
public async Task PostValidReservationWhenDatabaseIsEmpty(
    int days,
    int hours,
    int minutes,
    string email,
    string name,
    int quantity)
{
    var at = DateTime.Now.Date + new TimeSpan(days, hours, minutes, 0);
    var db = new FakeDatabase();
    var sut = new ReservationsController(
        new SystemClock(),
        new InMemoryRestaurantDatabase(Grandfather.Restaurant),
        db);
    var dto = new ReservationDto
    {
        Id = "B50DF5B1-F484-4D99-88F9-1915087AF568",
        At = at.ToString("O"),
        Email = email,
        Name = name,
        Quantity = quantity
    };

    await sut.Post(dto);

    var expected = new Reservation(
        Guid.Parse(dto.Id),
        at,
        new Email(email),
        new Name(name ?? ""),
        quantity);
    Assert.Contains(expected, db.Grandfather);
}
This test injects a FakeDatabase variable called db and ultimately asserts that db has the expected state. Since db stays in scope for the duration of the test, its behavior is deterministic and consistent.
Using a fake is more robust in the face of change. If you wish to refactor code that involves changes to an interface like IReservationsRepository, the only change you’ll need to make to the test code is to edit the fake implementation to make sure that it still preserves the invariants of the type. That’s one test file you’ll have to maintain, rather than the shotgun surgery necessary when using dynamic mock libraries.
Architectural dependencies
To recap: A functional core needs no dependency injection to support unit testing, because pure functions are deterministic by definition, and therefore always testable. Only the imperative shell needs dependency injection to support unit testing.
Which dependencies are required? Every source of non-deterministic behavior and side effects. This tends to correspond to the actual, architectural dependencies of the application in question, with the possible addition of a clock and a random number generator.
The sample system in Code That Fits in Your Head has three “real” dependencies: Its database, an SMTP gateway, and the system clock:
Apart from the system clock, these dependencies are components you’d also draw when illustrating the overall architecture of the system. The application is an opaque box, its internal organization an implementation detail, but its “real” dependencies represent other processes that may run somewhere else on the network.
These are the dependencies to consider modeling explicitly. You can hide them behind interfaces, inject them with Constructor Injection, and replace them with test doubles. Not mocks, stubs, or spies, but fakes.
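In code, that combination of Constructor Injection and interface-shaped architectural dependencies might look like this sketch. The names mirror the article’s examples, but the interface members are elided to keep the sketch short:

```csharp
using System;

// Members elided; see the article's Repository and Clock usage for
// representative members of each interface.
public interface IClock { }
public interface IRestaurantDatabase { }
public interface IReservationsRepository { }

// Constructor Injection sketch: the controller receives exactly the
// architectural dependencies (clock and databases) and nothing else.
public class ReservationsController
{
    public ReservationsController(
        IClock clock,
        IRestaurantDatabase restaurantDatabase,
        IReservationsRepository repository)
    {
        Clock = clock
            ?? throw new ArgumentNullException(nameof(clock));
        RestaurantDatabase = restaurantDatabase
            ?? throw new ArgumentNullException(nameof(restaurantDatabase));
        Repository = repository
            ?? throw new ArgumentNullException(nameof(repository));
    }

    public IClock Clock { get; }
    public IRestaurantDatabase RestaurantDatabase { get; }
    public IReservationsRepository Repository { get; }
}
```

A production composition root supplies the real implementations, while a test supplies a FakeDatabase and, if needed, a constant clock.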
Conclusion
Which dependencies should be present in your code base?
Those that represent non-deterministic behavior or side effects. Adding rows to a database. Sending an email. Getting the current time and date. Querying a database.
These tend to correspond to the architectural dependencies of the system in question. If the application requires a database in order to work correctly, you’ll model the database as a polymorphic dependency. If the system must be able to send email, then a messaging gateway interface is warranted.
In addition to such architectural dependencies, system time and random number generators are the other well-known sources of non-determinism, so model those as explicit dependencies as well.
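For randomness, the same trick as with the clock applies. The interface name below is illustrative rather than taken from the book; the point is that production code draws from System.Random while a test can pin the value:

```csharp
using System;

public interface IRandomNumberGenerator
{
    int Next(int minValue, int maxValue);
}

// Impure production implementation.
public sealed class SystemRandom : IRandomNumberGenerator
{
    private readonly Random rnd = new Random();

    public int Next(int minValue, int maxValue) =>
        rnd.Next(minValue, maxValue);
}

// Deterministic test implementation: always returns the same value.
public sealed class ConstantRandom : IRandomNumberGenerator
{
    private readonly int value;

    public ConstantRandom(int value)
    {
        this.value = value;
    }

    public int Next(int minValue, int maxValue) => value;
}
```

Like the clock, this is a source of non-determinism hidden behind a small interface, so it qualifies as an injected dependency even though it isn’t a separate process on the network.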
That’s it. Those are the dependencies you need. The rest are implementation details, likely to make your test code more brittle.
The implication is that a typical system will only have a handful of dependencies.
Tags: dependencies, testing
25 Comments
Just stop calling the test a “unit test” and you are fine. Why?
If `ReservationManager` is an internal implementation detail of `RestaurantManager`, why did you separate the classes?
I guess because each class has a different responsibility (in terms of SRP).
Now if you test ReservationManager with the real implementation of IReservationManager, you are testing both responsibilities -> not a unit test.
If you have a bug in ReservationManager, the test for RestaurantManager will fail too, although there is no bug -> not a unit test.
I’m not saying such tests are wrong or less useful. They are just not unit tests, IMO.
* a typo in previous comment: Now if you test ~~ReservationManager~~ RestaurantManager with the real implementation of IReservationManager, you are testing both responsibilities -> not a unit test.
A unit is not a class; for me there is nothing wrong with this approach.
You are right, Cassio, the borders of a Unit are arbitrary. They do not have to be limited to just one class.
But it is practical to do so:
Ideally a failed test should tell me as precisely as possible where the error occurred. If my test uses a whole subtree of classes I may need to go on a hunt. If it only tests one class, I already know in which file the breakage occurred.
When writing mostly functional code I use the kinds of tests proposed in the article too, but they have such a different scope and style that it is sensible to distinguish them. I consider those to be integration tests rather than unit tests, because they test a specific component _and_ its interactions with the functional core.
I think it’s important to take a hard look at your day-to-day development practice and look at your tests’ false-positive rate, how often you have to rewrite tests to accommodate a change in your object contract, and how many bugs slip out when you rewrite those tests. Class-specific unit tests make it really easy to get good code coverage numbers, but they don’t give you confidence in refactoring, because any non-trivial refactor will require you to rewrite the tests.
You can say that with class-specific tests, you know exactly what’s broken, except that those tests basically never break. That sounds like a good thing, except that the bugs such tests could catch are not actually the ones most likely to come up. Usually the bugs you see are not at the component level, but in the interactions between those components, and the end-to-end flow of information. Those kinds of bugs are best found using higher-level tests, incorporating as many collaborators as you can.
You can call these types of tests “integration tests” if you want, but the important takeaway is how the pyramid breaks down. Most of the best practices from the mid-2010s say that your testing should be primarily made up of class-based unit tests, with a few integration tests to make sure everything’s wired up correctly. And I am arguing that this is an anti-pattern. The class-based tests serve as a checksum on existing behavior, do not model actual requirements, and ensure that fixing actual escaped bugs is made as difficult as possible.
I am reminded of the trend in front-end testing, coming out of the work of Kent C. Dodds and the maintainers of the JavaScript Testing Library: https://testing-library.com/
“The more your tests resemble the way your software is used, the more confidence they can give you.”
This seems like a solution in search of a problem. If you refactor IReservationsManager, your tests won’t fail; they simply won’t compile. And that’s a Good Thing. As for well-designed components having only a handful of dependencies, that’s been recognized at least since the Law of Demeter was given that name, and probably longer.
I think I’m missing something here. A GitHub sample of the code would help, with a commit history showing the before and after. For testing any DB operations I already use InMemory, and I’m not sure how you’re supposed to favor real dependencies when the main thing people mock/stub is external dependencies. For me, as soon as I use real external dependencies in tests, they become system integration tests. It might help me to understand if you could demonstrate a function (without any dependencies) that cannot be tested without resorting to mocking? Or a function with dependencies that can be tested with real dependencies while still maintaining the atomic nature of a unit test?
FWIW, the code examples all come from my book Code That Fits in Your Head. The entire code base accompanies the book as a Git repository.
I’m not sure I understand your last two questions. Why would I demonstrate a function without dependencies that can’t be tested without using Test Doubles? I’m not sure if such a function exists – at least, I can’t think of any off the top of my head.
For the other question, if, as I’ve outlined above, you consider the system clock a real dependency (I do), there are impure actions involving time that you can test ‘atomically’. The same goes for some use of randomness.
You could probably also involve the file system and argue that it’d still be a unit test, since the file system tends to be ubiquitous… That one may be a bit of a stretch, though.
LOD is more about not knowing what your dependencies know. The analogy is that a dependency is a “friend”. LOD doesn’t say you can’t have many friends. It just says that you shouldn’t need to know what your friends know (you shouldn’t need to ‘reach through a friend’)
Great article! I rarely use mocking frameworks for the same reason you describe: brittle tests. In your example, `WillAccept` must take some dependencies, right? E.g. something that tells it the opening hours, and something that allocates the reservation.
With these dependencies (deterministic doubles), how is that different from `RestaurantManager` taking a deterministic double of `IReservationsManager`?
The class that defines WillAccept is called MaitreD. It does take configuration data via its constructor, but these are all immutable sealed values that you can’t replace with Test Doubles. This is by design. If you want to see the constructor signature, one place to see it is here: https://blog.ploeh.dk/2020/10/19/monomorphic-functors
Shouldn’t this article be entitled “Throwing away your unit tests and writing integration tests”?
There are 2 kinds of unit tests: sociable and solitary. Sociable uses real dependencies, solitary uses mocks. Some people call sociable unit tests an integration test, and that is a common source of confusion.
https://martinfowler.com/bliki/UnitTest.html
Mocks and injections are for tests that are fundamentally integration tests, not unit-level tests.
Units of code that rely on external systems need that and cannot be tested at the unit level in reality.
It’s a common source of confusion that “unit” refers to a unit of code, when it really refers to a unit of execution. The criteria that make it a “unit” are that it has no side effects, and has no dependencies on anything outside of its control (like time, network, user input, databases), and therefore can be run reliably in any order, any number of times. While it’s a common best practice that these tests should run quickly, even that isn’t necessarily part of the definition.
There are other considerations as to what makes a test useful, many of which are described here, but the definition of “unit test” is actually pretty broad.
Why? The test named Accept (the one with the AcceptTestCases) is a unit test. The real problem here is that weird “inner manager” class. That, to me, looks like the thing to test in itself.
Great article! I would even go further and replace `IReservationsManager` with the concrete `ReservationsManager` class, until there is a need for different implementations (strategies/policies). And that concrete class would carry the title of _collaborator_, rather than a mere internal dependency.
quote: “This interface is an internal implementation detail.”
I am not sure I follow this. An interface, as in IoC, is a contract between the class and a caller. The caller is not aware of what kind of implementation it is using. The implementer is not aware of what will be calling it. Interfaces often reside in a separate project. Therefore they are external to both the class that implements them and the caller. Contracts are designed separately from implementation, based on the business needs.
Leaving aside the discussion on whether or not interfaces are contracts (they aren’t), whatever you’d like to call them, they describe the interaction between a class and a caller, as you wrote.
How client code interacts with objects is an implementation detail. The users of the software don’t care how that works. Thus, while you can’t completely avoid breaking the internal workings of code when you refactor, the more you keep such breakage to a minimum, the faster you can move.
This is in contrast to how ‘interfaces to the rest of the world’ works. You may, for example, want to change the API to send emails to support attachments. While this could also be a breaking change, the outcome is a change in observable functionality. That’s not an implementation detail.
If you write bad code, you deserve to have your tests break. Let’s pick this apart.
IReservationManager is not under test, so we are correct to mock it, and you’ve overly constrained it to say “I will only accept a reservation that meets this exact criteria” rather than using `It.IsAny`, which would be a shorter setup and just as valid here.
RestaurantManager is under test, but what is the actual logic you’re testing? That the restaurant manager will accept a reservation? But isn’t that the job of the Reservation Manager? This whole setup is just awful. Without the concrete implementations of the classes, it’s hard to tell what you’re trying to do, but it looks like you’re trying to test the logic of an inner class by testing the outer class, which is pretty much the cardinal sin of unit testing; you don’t do that.
It might be a useful test for the Restaurant Manager class to say “An error is produced [or handled gracefully] when the Reservation Manager returns false.” Why these are even separate classes, we don’t know, but that’s a valid test, basically the Restaurant Manager class has to do something different, maybe return a list of alternate suggestions, but in that case you still stub the inner class and just use `.Setup()` to control whether it’s going to return true or false.
If you want to know whether a given reservation CAN be accepted, test the Reservation Manager concrete implementation without involving the Restaurant Manager at all.
Thank you. Reading this article confused me so much. The entire thing read like putting the cart before the horse. Bad code tests badly. If the tests are difficult, it’s a sign you need to redesign your code.
A few people have expressed a similar reaction. This actually surprises me, but also gives me hope for the future.
For the record, I haven’t written unit tests like the initial example for more than a decade. My experience is, however, that most of my customers do.
For the initial example, I was aiming to show a condensed example of such a test. If you’ve never seen such tests, it’s no wonder the example doesn’t resonate with you. Consider yourself fortunate.
Apologies for the lack of clarity.
It seems you could write both test types without changing your code. Write your usual unit tests with all the Mocks you want, and for the “sociable” tests (as someone put it) you could set up an IoC container registered with all of the dependencies using the actual implementations. Simply mock the external dependencies with classes that return specific responses and now you get the best of both worlds. Don’t like how brittle your traditional unit tests are? Re-write them as sociable tests and delete the ones that are brittle. Find the happy balance that works for your project, without rewriting your project or going down a path that fundamentally changes how you write code.