C#: IEnumerable, yield return, and lazy evaluation

Let’s talk about one of my favorite .NET features: IEnumerable.

This interface enables iterating over a collection. In other words, if something is an IEnumerable, you can mostly think of it like an array or a list. You can use a foreach statement to loop through it, you can use LINQ to map or reduce it in a hundred different ways, or you can copy it into an array with .ToArray() and access elements by index. But there are a few things that make IEnumerable special, and a few things that make it tricky.

IEnumerable (most often in its generic form, IEnumerable<T>) is the usual return type of an iterator. An iterator is a method that uses the yield return statement. yield return is different from a normal return statement because, while it does return a value from the function, it doesn’t “close the book” on that function. The next time a value is expected, the function will continue executing statements after the yield return until it hits another yield return (or a yield break, or the end of the function block). In other words, you could write an iterator that looks like this:

IEnumerable<int> GetOneTwoThree() {
  yield return 1;
  yield return 2;
  yield return 3;
  // We could put "yield break;" here but there's no need, the end of the function signals the same thing.
}
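For a case where yield break does matter, here is a minimal sketch of an iterator that stops early. The TakeUntilNegative name and its input are made up for this example, and it’s written with top-level statements (so it assumes C# 9 or later) to keep it short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: yields values until the first negative number shows up.
IEnumerable<int> TakeUntilNegative(IEnumerable<int> source) {
  foreach (var value in source) {
    if (value < 0) {
      yield break; // Ends the iteration; no further values are produced.
    }
    yield return value;
  }
}

var result = new List<int>(TakeUntilNegative(new[] { 1, 2, -1, 3 }));
Console.WriteLine(string.Join(", ", result)); // Output: 1, 2
```

The 3 after the negative number is never yielded, because yield break ends the iterator the same way reaching the end of the method would.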

When you call GetOneTwoThree() it will return an IEnumerable<int>. As mentioned above, there are a few things you can do with this:

var numbers = GetOneTwoThree();

foreach (var number in numbers) {
  Console.WriteLine(number);
  // Output:
  // 1
  // 2
  // 3
}

var doubledNumbers = numbers.Select(num => num * 2);

foreach (var number in doubledNumbers) {
  Console.WriteLine(number);
  // Output:
  // 2
  // 4
  // 6
}

var numberArray = numbers.ToArray();
Console.WriteLine(numberArray[0]); // Output: 1

You may notice that we’re iterating over numbers multiple times. For a simple iterator like the one I’ve written, that’s technically okay to do. Every time we iterate over numbers, it will start over at the beginning of the iterator method and yield all the same values over again. (You can’t expect this from every iterator; more on that later.)
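If you want to see the “start over” behavior for yourself, here’s a small sketch that counts how many times the iterator body begins executing. The timesStarted counter is something I’ve added purely for illustration, and the snippet assumes C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;

int timesStarted = 0; // Illustrative counter: incremented each time the iterator body starts.

IEnumerable<int> GetOneTwoThree() {
  timesStarted++;
  yield return 1;
  yield return 2;
  yield return 3;
}

var numbers = GetOneTwoThree();

int sum = 0;
foreach (var number in numbers) { sum += number; } // First pass: 1 + 2 + 3
foreach (var number in numbers) { sum += number; } // Second pass runs the method again

Console.WriteLine(sum);          // Output: 12
Console.WriteLine(timesStarted); // Output: 2
```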

Lazy evaluation (the opposite of eager evaluation) is when we wait to execute a piece of code until we absolutely, positively have to. When you call GetOneTwoThree(), you’ll get a return value despite the fact that none of the code in the function has actually been executed yet! To prove it, run the following code in LINQPad.

(If you’re not familiar with LINQPad, you should check it out—it’s a powerful, portable C# playground. But the only thing you need to know about it here is that it provides the magic .Dump() method, which outputs any value to the Results pane.)

bool didTheCodeRun = false;

void Main() {
  var results = RunTheCode();
  didTheCodeRun.Dump();
}

IEnumerable<bool> RunTheCode() {
  didTheCodeRun = true;
  yield return true;
}

The output of running Main() in the above snippet is false. That’s right, after we run RunTheCode(), which explicitly sets didTheCodeRun to true, the value of didTheCodeRun is still false. None of the code in our iterator runs until we start iterating through the IEnumerable. It’s lazily evaluated!

This may seem counterintuitive, but in a lot of cases it’s a good thing. What if you never end up iterating through the IEnumerable at all? Well, that’s a bunch of code the computer didn’t have to execute. Or what if you’re using a LINQ method like .First() to try to find a specific item in the collection? You may not need to run all the code in the iterator to get the value you’re looking for, and you won’t. Once .First() finds a value that matches the predicate, it will stop iterating. If you’re working with an IEnumerable that potentially has thousands of values (or more), you can save a lot of CPU cycles by only iterating as far as you need to.
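Here’s a rough sketch of that early exit. CountUpTo and the valuesProduced counter are names I’ve invented for the example; the only real API involved is LINQ’s .First() with a predicate. It assumes C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int valuesProduced = 0; // Illustrative counter: how many values the iterator actually yielded.

IEnumerable<int> CountUpTo(int limit) {
  for (var num = 1; num <= limit; num++) {
    valuesProduced++;
    yield return num;
  }
}

// The iterator could produce a million values, but First() stops at the first match.
var firstBig = CountUpTo(1_000_000).First(num => num > 3);

Console.WriteLine(firstBig);       // Output: 4
Console.WriteLine(valuesProduced); // Output: 4
```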

Of course, it’s all up to you. You can iterate as much or as little as you want. Let’s take a look at some of the ways to do that.

// Here's a variable to track execution of code in an iterator
int lastYielded = -1;

// Here's an iterator for us to play with
IEnumerable<int> GetOneToTen() {
  for (var num = 1; num <= 10; num++) {
    lastYielded = num;
    yield return num;
  }
}

void Main() {
  var numbers = GetOneToTen();
  lastYielded.Dump(); // Output: -1

  // This gives us an 'instance' of the iteration.
  // Enumerators are disposable, so we declare it with 'using' (requires C# 8 or later).
  using var enumerator = numbers.GetEnumerator();

  // This executes the iterator until the first yield return is reached
  enumerator.MoveNext();

  // This gives us the current (most recently yielded) value of the iterator
  enumerator.Current.Dump(); // Output: 1
  lastYielded.Dump(); // Output: 1

  // This will iterate from 1 to 4, then stop
  foreach (var num in numbers) {
    if (num >= 4) {
      break;
    }
  }

  lastYielded.Dump(); // Output: 4

  // This will not execute any code in the iterator.
  //  LINQ methods are lazily evaluated as well
  var numbersTimesTwo = numbers.Select(num => num * 2);
  lastYielded.Dump(); // Output: 4

  // This will force the entire iterator to run, yielding all values
  var arr = numbers.ToArray();
  lastYielded.Dump(); // Output: 10
}

It’s important to point out that many iterators are not as simple as the ones we’ve been using here. An iterator could query a database, for example, which raises the unfortunate possibility that it might alter data, or that iterating through it twice might yield completely different results! Some tools (like ReSharper) will warn you against multiple enumeration for this reason. An iterator is, from one perspective, nothing more than a synchronous method that may not execute its code right away (or at all). And to muddy the waters just a little, not all iterators are synchronous; there’s also an IAsyncEnumerable<T> interface (you can loop through it with await foreach).
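To make the multiple-enumeration hazard concrete, here’s a small sketch where an in-memory Queue stands in for an external data source. The DrainPending iterator and the queue contents are hypothetical; a real database-backed iterator would look different, but the surprise is the same. It assumes C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for an external source such as a message queue or a database cursor.
var pending = new Queue<string>(new[] { "a", "b", "c" });

IEnumerable<string> DrainPending() {
  while (pending.Count > 0) {
    yield return pending.Dequeue(); // Side effect: removes the item from the shared queue.
  }
}

var items = DrainPending();
var firstPass = items.ToList();  // Runs the iterator and empties the queue.
var secondPass = items.ToList(); // Runs the iterator again, but nothing is left.

Console.WriteLine(firstPass.Count);  // Output: 3
Console.WriteLine(secondPass.Count); // Output: 0
```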

Most of the time none of this is a problem. Almost all the time you can treat an IEnumerable like a list. But I’ve learned the hard way not to rely on this—I once used .Select() to transform a collection while using an external variable to track a small piece of state on each iteration, then got very confused when the external variable wasn’t updated later in the method. In the end, I fixed the problem by forcing the iteration to complete with .ToArray(). (There are multiple ways to approach something like this, depending on the expected size of the collection and what you’re trying to track.)
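A stripped-down reconstruction of that kind of mistake might look like the following. The names and data are invented for illustration (this isn’t the original code), and it assumes C# 9 top-level statements.

```csharp
using System;
using System.Linq;

var names = new[] { "Ada", "Grace", "Linus" };
int totalLength = 0; // External state that the lambda updates as a side effect.

var upperNames = names.Select(name => {
  totalLength += name.Length;
  return name.ToUpperInvariant();
});

// Select() only describes the transformation; the lambda has not run yet.
Console.WriteLine(totalLength); // Output: 0

// ToArray() forces the whole sequence to be evaluated, running the lambda for each name.
var upperArray = upperNames.ToArray();
Console.WriteLine(totalLength); // Output: 13
```

Depending on what you’re tracking, a plain foreach loop (or computing the extra value separately with something like .Sum()) may be clearer than relying on a side effect inside the lambda.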

Hopefully this overview helps you avoid running into issues when you create or consume an IEnumerable in your own code (which is extremely likely, since they’re everywhere in C#). Here are a couple of rules to remember:

Try to avoid side effects when writing an iterator method. Anyone who uses the method should be able to treat it as though it synchronously returns an array. If an iterator changes or executes anything outside of itself, the caller may end up confused. This applies to functions passed to LINQ methods as well, since many of them return IEnumerable and are lazily evaluated.
You should avoid iterating over the same IEnumerable multiple times. If you know you’re going to access every value (for example, if you’re iterating it using a foreach with no break or return), enumerate it sooner rather than later by converting the IEnumerable to a list or array with .ToList() or .ToArray(). Then use that list or array for all future operations.
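As a quick sketch of the second rule, here’s what “enumerate once, then reuse” can look like. LoadValues and the timesEnumerated counter are hypothetical and exist only to show that the iterator runs a single time; the snippet assumes C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int timesEnumerated = 0; // Illustrative counter: how many times the iterator body started.

IEnumerable<int> LoadValues() {
  timesEnumerated++;
  yield return 3;
  yield return 1;
  yield return 2;
}

// Materialize the sequence once...
var values = LoadValues().ToList();

// ...then run every later operation against the list, not the iterator.
var max = values.Max();
var min = values.Min();
var count = values.Count;

Console.WriteLine($"{min} {max} {count}"); // Output: 1 3 3
Console.WriteLine(timesEnumerated);        // Output: 1
```

Keep in mind that this only makes sense when the sequence is finite and comfortably fits in memory.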
