C#: IEnumerable, yield return, and lazy evaluation
Let’s talk about one of my favorite .NET features: IEnumerable.
This interface enables iterating over a collection. In other words, if something is an IEnumerable, you can mostly think of it like an array or a list. You can use a foreach statement to loop through it, you can use LINQ to map or reduce it in a hundred different ways, or you can copy it into an array with .ToArray() and access elements by index. (.ToArray() is a method call, not a cast: it enumerates the whole sequence to build the array.) But there are a few things that make IEnumerable special, and a few things that make it tricky.
IEnumerable is also the usual return type of an iterator. An iterator is a method that uses the yield return keywords. yield return is different from a normal return statement because, while it does return a value from the function, it doesn’t “close the book” on that function. The next time a value is requested, the function continues executing the statements after the yield return until it hits another yield return (or a yield break, or the end of the function block). In other words, you could write an iterator that looks like this:
IEnumerable<int> GetOneTwoThree() {
    yield return 1;
    yield return 2;
    yield return 3;
    // We could put "yield break;" here, but there's no need: the end of the function signals the same thing.
}
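To make the pause-and-resume behavior concrete, here is a minimal sketch that records when each part of an iterator body runs. The GetTraced name and the log list are made up for illustration, and the snippet assumes C# 9 or later (top-level statements), using Console.WriteLine in place of any playground helper.

```csharp
using System;
using System.Collections.Generic;

var log = new List<string>();

// Hypothetical iterator that records when each part of its body runs.
IEnumerable<int> GetTraced()
{
    log.Add("before 1");
    yield return 1;
    log.Add("between 1 and 2");
    yield return 2;
    log.Add("after 2");
}

// Each trip around the loop resumes the iterator where it last paused.
foreach (var number in GetTraced())
{
    log.Add($"received {number}");
}

foreach (var line in log)
{
    Console.WriteLine(line);
}
// Output:
// before 1
// received 1
// between 1 and 2
// received 2
// after 2
```

The interleaving of the iterator's own entries with the loop's entries shows that the method body is suspended at each yield return and picked up again only when the loop asks for the next value.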
When you call GetOneTwoThree(), it will return an IEnumerable<int>. As mentioned above, there are a few things you can do with this:
var numbers = GetOneTwoThree();
foreach (var number in numbers) {
    Console.WriteLine(number);
}
// Output:
// 1
// 2
// 3
var doubledNumbers = numbers.Select(num => num * 2);
foreach (var number in doubledNumbers) {
    Console.WriteLine(number);
}
// Output:
// 2
// 4
// 6
var numberArray = numbers.ToArray();
Console.WriteLine(numberArray[0]); // Output: 1
You may notice that we’re iterating over numbers multiple times. For a simple iterator like the one I’ve written, that’s technically okay to do. Every time we iterate over numbers, it will start over at the beginning of the iterator method and yield all the same values over again. (You can’t expect this from every iterator; more on that later.)
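One way to check that each enumeration restarts the iterator is to count how often its body begins. This is a small sketch, not part of the original example: the timesStarted counter is an addition for illustration, and the snippet assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var timesStarted = 0;

IEnumerable<int> GetOneTwoThree()
{
    timesStarted++; // runs once per enumeration, not once per call
    yield return 1;
    yield return 2;
    yield return 3;
}

var numbers = GetOneTwoThree();   // a single call to the iterator method
var sum = numbers.Sum();          // first enumeration
var array = numbers.ToArray();    // second enumeration

Console.WriteLine(sum);           // Output: 6
Console.WriteLine(array.Length);  // Output: 3
Console.WriteLine(timesStarted);  // Output: 2
```

The method is called once, yet its body starts twice, because Sum() and ToArray() each walk the sequence from the beginning.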
Lazy evaluation (the opposite of eager evaluation) is when we wait to execute a piece of code until we absolutely, positively have to. When you call GetOneTwoThree(), you’ll get a return value despite the fact that none of the code in the function has actually been executed yet! To prove it, run the following code in LINQPad.
(If you’re not familiar with LINQPad, you should check it out: it’s a powerful, portable C# playground. But the only thing you need to know about it here is that it provides the magic .Dump() method, which outputs any value to the Results pane.)
bool didTheCodeRun = false;
void Main() {
    var results = RunTheCode();
    didTheCodeRun.Dump();
}
IEnumerable<bool> RunTheCode() {
    didTheCodeRun = true;
    yield return true;
}
The output of running Main() in the above snippet is false. That’s right, after we call RunTheCode(), which explicitly sets didTheCodeRun to true, the value of didTheCodeRun is still false. None of the code in our iterator runs until we start iterating through the IEnumerable. It’s lazily evaluated!
This may seem counterintuitive, but in a lot of cases it’s a good thing. What if you never end up iterating through the IEnumerable at all? Well, that’s a bunch of code the computer didn’t have to execute. Or what if you’re using a LINQ method like .First() to try to find a specific item in the collection? You may not need to run all the code in the iterator to get the value you’re looking for, and you won’t. Once .First() finds a value that matches the predicate, it stops iterating. If you’re working with an IEnumerable that potentially has thousands of values (or more), you can save a lot of CPU cycles by only iterating as far as you need to.
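As a rough illustration of that saving, the sketch below counts how many values an iterator actually produces before .First() is satisfied. GetOneToThousand and yieldedCount are hypothetical names chosen for this example, and the snippet assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var yieldedCount = 0;

// Hypothetical iterator over a "large" range.
IEnumerable<int> GetOneToThousand()
{
    for (var num = 1; num <= 1000; num++)
    {
        yieldedCount++;
        yield return num;
    }
}

// First() pulls values one at a time and stops at the first match.
var firstOverFive = GetOneToThousand().First(num => num > 5);

Console.WriteLine(firstOverFive); // Output: 6
Console.WriteLine(yieldedCount);  // Output: 6
```

Only six of the thousand possible values are ever produced; the remaining loop iterations never run.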
Of course, it’s all up to you. You can iterate as much or as little as you want. Let’s take a look at some of the ways to do that.
// Here's a variable to track execution of code in an iterator
int lastYielded = -1;

// Here's an iterator for us to play with
IEnumerable<int> GetOneToTen() {
    for (var num = 1; num <= 10; num++) {
        lastYielded = num;
        yield return num;
    }
}

void Main() {
    var numbers = GetOneToTen();
    lastYielded.Dump(); // Output: -1

    // This gives us an 'instance' of the iteration
    var enumerator = numbers.GetEnumerator();
    // This executes the iterator until the first yield return is reached
    enumerator.MoveNext();
    // This gives us the current (most recently yielded) value of the iterator
    enumerator.Current.Dump(); // Output: 1
    lastYielded.Dump(); // Output: 1

    // This starts a fresh iteration, runs it from 1 to 4, then stops
    foreach (var num in numbers) {
        if (num >= 4) {
            break;
        }
    }
    lastYielded.Dump(); // Output: 4

    // This will not execute any code in the iterator.
    // LINQ methods that return an IEnumerable (like Select) are lazily evaluated as well
    var numbersTimesTwo = numbers.Select(num => num * 2);
    lastYielded.Dump(); // Output: 4

    // This will force the entire iterator to run, yielding all values
    var arr = numbers.ToArray();
    lastYielded.Dump(); // Output: 10
}
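The remark about LINQ in the snippet above holds for operators that hand back another sequence, such as Select. Operators that must produce a single finished value, such as Max, have to enumerate on the spot. A small sketch of the difference, with a hypothetical evaluations counter and assuming C# 9 or later for top-level statements:

```csharp
using System;
using System.Linq;

var evaluations = 0;
var source = new[] { 1, 2, 3 };

// Select returns a sequence, so the lambda is only stored here, not run.
var doubled = source.Select(n => { evaluations++; return n * 2; });
var afterSelect = evaluations;

// Max returns a single int, so it has to enumerate the sequence immediately.
var max = doubled.Max();
var afterMax = evaluations;

Console.WriteLine(afterSelect); // Output: 0
Console.WriteLine(max);         // Output: 6
Console.WriteLine(afterMax);    // Output: 3
```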
It’s important to point out that many iterators are not as simple as the ones we’ve been using here. An iterator could query a database, for example, which raises the unfortunate possibility that it might alter data, or that iterating through it twice might yield completely different results! Some tools (like ReSharper) will warn you against multiple enumeration for this reason. An iterator is, from one perspective, nothing more than a synchronous method that may not execute its code right away (or at all). And to muddy the waters just a little, not all iterators are synchronous; there’s also an IAsyncEnumerable interface (you can loop through it with await foreach).
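For reference, an asynchronous iterator looks much like a synchronous one. The sketch below is only an illustration: GetOneTwoThreeAsync is a made-up name, Task.Delay stands in for real asynchronous work such as a network call, and it assumes C# 9 or later on a runtime that includes IAsyncEnumerable<T> (.NET Core 3.0 and up).

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical async iterator: it can await between yields.
async IAsyncEnumerable<int> GetOneTwoThreeAsync()
{
    for (var num = 1; num <= 3; num++)
    {
        await Task.Delay(10); // stand-in for real asynchronous work
        yield return num;
    }
}

var received = new List<int>();

// await foreach asks for one value at a time, awaiting each one.
await foreach (var num in GetOneTwoThreeAsync())
{
    received.Add(num);
}

Console.WriteLine(string.Join(", ", received)); // Output: 1, 2, 3
```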
Most of the time none of this is a problem. Almost all the time you can treat an IEnumerable like a list. But I’ve learned the hard way not to rely on this. I once used .Select() to transform a collection while using an external variable to track a small piece of state on each iteration, then got very confused when the external variable wasn’t updated later in the method. In the end, I fixed the problem by forcing the iteration to complete with .ToArray(). (There are multiple ways to approach something like this, depending on the expected size of the collection and what you’re trying to track.)
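The original code isn’t shown here, so the following is only a guess at the general shape of that bug, with invented names (names, seen) and top-level statements assumed (C# 9 or later). The external variable stays untouched until something actually enumerates the result of .Select().

```csharp
using System;
using System.Linq;

var names = new[] { "ann", "bob", "cy" };
var seen = 0; // external state updated inside the lambda

// Nothing runs yet: Select only remembers the lambda.
var upperNames = names.Select(name =>
{
    seen++;
    return name.ToUpper();
});
var seenBeforeEnumeration = seen;

// ToArray() forces the lambda to run for every element.
var upperArray = upperNames.ToArray();
var seenAfterToArray = seen;

Console.WriteLine(seenBeforeEnumeration);         // Output: 0
Console.WriteLine(seenAfterToArray);              // Output: 3
Console.WriteLine(string.Join(", ", upperArray)); // Output: ANN, BOB, CY
```

Working from upperArray afterwards also avoids a second surprise: enumerating upperNames again would run the lambda again and push seen up to 6.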
Hopefully this overview helps you avoid running into issues when you create or consume IEnumerable in your own code (which is extremely likely, as they’re everywhere in C#). Here are a couple of rules to remember:
- Try to avoid side effects when writing an iterator method. Anyone who uses the method should be able to treat it as though it synchronously returns an array. If an iterator changes or executes anything outside of itself, the caller may end up confused. This applies to functions passed to LINQ methods as well, since many of them return IEnumerable and are lazily evaluated.
- You should avoid iterating over the same IEnumerable multiple times. If you know you’re going to access every value (for example, if you’re iterating it using a foreach with no break or return), enumerate it sooner rather than later by converting the IEnumerable to a list or array type. Then use that list or array for all future operations.
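The second rule can be sketched as follows. LoadValues is a hypothetical iterator standing in for an expensive or changing source (a database query, say), queriesRun is an invented counter, and the snippet assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var queriesRun = 0;

// Hypothetical iterator whose results differ on every enumeration.
IEnumerable<int> LoadValues()
{
    queriesRun++;
    yield return queriesRun * 10;
    yield return queriesRun * 10 + 1;
}

// Enumerating the same IEnumerable twice reruns the body each time.
var lazy = LoadValues();
var firstPass = lazy.ToList();   // 10, 11
var secondPass = lazy.ToList();  // 20, 21

// Materialize once, then reuse the list for every later operation.
var cached = LoadValues().ToList(); // 30, 31
var total = cached.Sum();
var count = cached.Count;

Console.WriteLine(string.Join(", ", firstPass));  // Output: 10, 11
Console.WriteLine(string.Join(", ", secondPass)); // Output: 20, 21
Console.WriteLine(total);                         // Output: 61
Console.WriteLine(queriesRun);                    // Output: 3
```

Sum() and Count here read from the list in memory, so the iterator body is not run again after the single ToList() call.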
12 Comments
😐 at least use a language that’s natively lazy, like Haskell, or maybe Python.
Nice intro article. Very clear and succinct.
Very informative, thanks!
Thank you, great article
Good article! Clear as crystal!
Thanks for the amazing blog post. This cleared up a lot of confusion I had regarding yield return.
Very informative, thanks!
Thank you for the great post! Clear and short.
A question for the readers or the author: what if I want to avoid lazy load with yield for the IEnumerable, because of the side effects you mention (external state variables affected), BUT I am worried about memory consumption? I mean, I’d rather not load everything to memory until actually used; that’s why I use lazy load and yield.
Is there an idea on how that could be achieved?
Thanks a lot!
Maybe using the Enumerable Take and Skip methods can help.
The code below is not about laziness. You don’t use the iterator of this IEnumerable. Try using, for instance, foreach a couple of times, and you will see many calls to the method.
Don’t confuse people.
bool didTheCodeRun = false;
void Main() {
    var results = RunTheCode();
    didTheCodeRun.Dump();
}
IEnumerable<bool> RunTheCode() {
    didTheCodeRun = true;
    yield return true;
}
Overall good blog, but I figured I’d call out a couple of places where the author got sloppy:
1) in the first paragraph, “explicitly cast it to an array with .ToArray()” – a cast usually refers to a compile-time operation which affects the virtual methods invoked on the object. In contrast, ToArray() is a method (not a cast) which enumerates and potentially copies (à la List.ToArray() ) the collection in question, with all the side-effects mentioned in the remainder of the article.
2) in the final code block, “// LINQ methods are lazily evaluated as well” – half true. Some LINQ methods may be lazily evaluated (Select, Where, OrderBy?), but others aren’t (Min/Max, Any, First, ToList, ToDictionary). I’ve not verified, but I suspect any LINQ method which returns IEnumerable will be lazily evaluated, whereas all others will necessarily evaluate the collection to return a fully initialized object, i.e. you can’t lazily initialize a dictionary from a collection.
These are good points, thanks for the correction.