Iterator block implementation details: auto-generated state machines
Introduction
Iterators have been in .NET since its first release, but C# 2 made them easy to implement
using iterator blocks. In this article I'll go into details of how the Microsoft
C# compiler converts iterator blocks into state machines. If you aren't sure about the
IEnumerable
and IEnumerator
interfaces (and their generic counterparts)
or the basics of iterator blocks, it would be worth reading
an article giving more details about
them (or chapter
6 of C# in Depth, which is a free sample chapter), before continuing with this article.
Pretty much all of this article is implementation-specific. The Mono compiler may approach things in a slightly different way, for instance. It's likely to be very similar though; when in doubt as to what you can rely on, consult the language specification. (Iterator blocks are covered in section 10.14 of the C# 3.0 spec.
High level overview: what's the pattern?
This article originally came about because a reader asked about the difference between using iterator
blocks for methods returning IEnumerator
and those returning IEnumerable
.
In addition, there's the choice between the nongeneric interfaces and the generic ones. We'll
start off by using the same iterator block for each of the four possibilities, and look at the
differences in the generated code. Throughout this article I'll present the C# equivalent of the code
that the compiler produces. Obviously the compiler doesn't actually produce C#, but I've
used Reflector to decompile the code as C#.
Our first sample will just yield the numbers 0 to 9 in sequence. Initially we'll declare the
method to return the nongeneric IEnumerator
. Here's the complete code:
using System;
using System.Collections;
class Test
{
static IEnumerator GetCounter()
{
for (int count = 0; count < 10; count++)
{
yield return count;
}
}
}
Simple, right? Well, let's see what that turns into after compilation. Hold your breath...
internal class Test
{
// Note how this doesn't execute any of our original code
private static IEnumerator GetCounter()
{
return new <GetCounter>d__0(0);
}
// Nested type automatically created by the compiler to implement the iterator
[CompilerGenerated]
private sealed class <GetCounter>d__0 : IEnumerator<object>, IEnumerator, IDisposable
{
// Fields: there'll always be a "state" and "current", but the "count"
// comes from the local variable in our iterator block.
private int <>1__state;
private object <>2__current;
public int <count>5__1;
[DebuggerHidden]
public <GetCounter>d__0(int <>1__state)
{
this.<>1__state = <>1__state;
}
// Almost all of the real work happens here
private bool MoveNext()
{
switch (this.<>1__state)
{
case 0:
this.<>1__state = -1;
this.<count>5__1 = 0;
while (this.<count>5__1 < 10)
{
this.<>2__current = this.<count>5__1;
this.<>1__state = 1;
return true;
Label_004B:
this.<>1__state = -1;
this.<count>5__1++;
}
break;
case 1:
goto Label_004B;
}
return false;
}
[DebuggerHidden]
void IEnumerator.Reset()
{
throw new NotSupportedException();
}
void IDisposable.Dispose()
{
}
object IEnumerator<object>.Current
{
[DebuggerHidden]
get
{
return this.<>2__current;
}
}
object IEnumerator.Current
{
[DebuggerHidden]
get
{
return this.<>2__current;
}
}
}
}
We'll start off with a reasonably high level look at what's going on, explaining each member. The real
work is done in MoveNext()
so we'll skim over that at this point, and go deeper into
just that method when we're happy with the rest of the code. Looking at the code from top to bottom,
here's a running commentary:
- The source shown above isn't quite valid C# - it uses names which are illegal within normal C#. The use of illegal names in generated code is quite common; it prevents any possibility of clashes with normal names, as well as stopping the manually written C# from explicitly using the generated code.
-
Calling
GetCounter()
just calls the constructor of<GetCounter>d__0
which is the type implementing the iterator. The constructor in turn just sets the state of the iterator. None of the code in our original class has been executed yet. -
Various elements of the code are decorated with the
CompilerGenerated
andDebuggerHidden
attributes.CompilerGenerated
is almost irrelevant - the MSDN documentation for it mentions that it allows SQL Server extra access, but 99.9% of developers really don't need to know that. It's nice to see it there for the sake of clarity, however.DebuggerHidden
just stops debuggers (well, those that choose to take note of the attribute) from stepping into or breaking in the code. From now on I won't bother showing these attributes. -
<GetCounter>d__0
(hereafter known as just "the iterator type") implements three interfaces:IEnumerator<object>
,IEnumerator
andIDisposable
. In fact, the first interface implies the other two anyway, but I've left them all in the list to make things clearer. It's interesting that the generic interface is implemented even though our originalGetCounter()
method used the nongeneric form. This is probably just for consistency - we'll see that very little changes when we tell the compiler to use the generic version. -
It's very important that
IDisposable
is implemented (aside from being necessary in order to implementIEnumerator<object>
). Theforeach
statement in C# callsDispose
on any iterator which implementsIDisposable
; the call is in afinally
block just as with theusing
statement. In the very rare cases where you use an iterator manually instead of with aforeach
loop, you should use ausing
statement to make sure thatDispose
is called. We'll see why later on. - The iterator type is nested, which means it has access to all the private members of the enclosing class. That's important if your iterator block calls other private methods or accesses private variables (or other private members, of course). We'll see an example of this later on.
-
The iterator type has three fields - two private and one public. The two private variables keep track of the
value to return from the
Current
property and what state the iterator is in. We'll be looking at these in a lot more detail later. To be honest, I don't know why the variable associated with the count is public. When we look at what happens if the original iterator block method takes a parameter, we'll see public fields in use - although a different design decision could have kept everything private. -
The constructor just sets the state of the iterator. In this case, the only code calling the constructor
is the
GetCounter()
method which always passes in 0 as the initial state, but we'll see in a minute that in other situations there's a genuine reason to pass the state in as a parameter. -
MoveNext()
is basically just a big switch statement. It always is, and the value it's switching on is a state in the state machine, so it knows which bit of code to execute next. We'll spend a lot of time looking at this in a bit. -
The
Reset
method fromIEnumerator
always throwsSystem.NotSupportedException
. This isn't just an implementation decision - it's mandated by the C# specification. -
In this first simple example, the
Dispose
method does nothing. We'll look at this as a separate topic. -
Both the
IEnumerator
andIEnumerator<object>
implementations ofCurrent
simply return the value of the<>2__current
variable. The C# language specification explicitly leaves the behaviour of theCurrent
property undefined in "odd" cases (such as accessing it before the first call toMoveNext()
or whenMoveNext()
has returnedfalse
).
Generic vs nongeneric interfaces
We've seen how declaring a method to return IEnumerator
results in an iterator which implements
IEnumerator<object>
as well. Now let's change the code just slightly, to make the method
explicitly return IEnumerator<int>
:
using System;
using System.Collections.Generic;
class Test
{
static IEnumerator<int> GetCounter()
{
for (int count = 0; count < 10; count++)
{
yield return count;
}
}
}
I won't post the whole of the resulting code, but here are the changes. Firstly, and most obviously,
the GetCounter()
method has changed return type, but nothing else:
private static IEnumerator<int> GetCounter()
{
return new <GetCounter>d__0(0);
}
Likewise the iterator now implements IEnumerator<int>
instead of IEnumerator<object>
.
The type involved here is called the yield type of the iterator. Every yield return
statement has
to return something which can be implicitly converted to the yield type. As we've seen, when a nongeneric interface is
used the yield type is object
. Here's the new class signature:
private sealed class <GetCounter>d__0 : IEnumerator<int>, IEnumerator, IDisposable
Similarly the Current
property implementing the generic interface and the backing variable are
changed to int
:
private int <>2__current;
int IEnumerator<int>.Current
{
get
{
return this.<>2__current;
}
}
Other than those minor changes, the class looks the same as it did before.
Returning IEnumerable
There are more significant changes if we change the original code to return IEnumerable
or its
generic equivalent instead of IEnumerator
. We'll change the code to return IEnumerable<int>
and stick with the generic interfaces from now on, as we've seen they make very little difference. So, here's
the source:
using System;
using System.Collections.Generic;
class Test
{
static IEnumerable<int> GetCounter()
{
for (int count = 0; count < 10; count++)
{
yield return count;
}
}
}
... and the resulting code (with attributes stripped out for brevity):
internal class Test
{
private static IEnumerable<int> GetCounter()
{
return new <GetCounter>d__0(-2);
}
private sealed class <GetCounter>d__0 : IEnumerable<int>, IEnumerable, IEnumerator<int>, IEnumerator, IDisposable
{
// Fields
private int <>1__state;
private int <>2__current;
private int <>l__initialThreadId;
public int <count>5__1;
public <GetCounter>d__0(int <>1__state)
{
this.<>1__state = <>1__state;
this.<>l__initialThreadId = Thread.CurrentThread.ManagedThreadId;
}
private bool MoveNext()
{
switch (this.<>1__state)
{
case 0:
this.<>1__state = -1;
this.<count>5__1 = 0;
while (this.<count>5__1 < 10)
{
this.<>2__current = this.<count>5__1;
this.<>1__state = 1;
return true;
Label_0046:
this.<>1__state = -1;
this.<count>5__1++;
}
break;
case 1:
goto Label_0046;
}
return false;
}
IEnumerator<int> IEnumerable<int>.GetEnumerator()
{
if ((Thread.CurrentThread.ManagedThreadId == this.<>l__initialThreadId) && (this.<>1__state == -2))
{
this.<>1__state = 0;
return this;
}
return new Test.<GetCounter>d__0(0);
}
IEnumerator IEnumerable.GetEnumerator()
{
return ((IEnumerable<Int32>) this).GetEnumerator();
}
void IEnumerator.Reset()
{
throw new NotSupportedException();
}
void IDisposable.Dispose()
{
}
int IEnumerator<int>.Current
{
get
{
return this.<>2__current;
}
}
object IEnumerator.Current
{
get
{
return this.<>2__current;
}
}
}
}
I've shown the whole code again so that we can easily see the differences:
-
Obviously, the iterator type now implements
IEnumerable<int>
but it also still implementsIEnumerator<int>
. It's very odd for something to be iterable and an iterator at the same time. It's an optimisation for a common case, as we'll see in a minute. -
The implementation of
IEnumerator<int>
remains exactly the same -Reset
stil throws an exception,Current
still just returns the current value, andMoveNext()
has the same logic in. - The method creating the iterator instance pass the constructor an initial state of -2 instead of 0.
-
We have an extra private variable,
<>l__initialThreadId
, which is set in the constructor to reflect the thread which created the instance. -
GetEnumerator()
either sets the state to 0 and returnsthis
or it creates a new instance of the iterator which starts in state 0.
So, what's going on? Well, the most common use case (by far) is that an instance of IEnumerable<T>
is created, then something (like a foreach
statement) calls GetEnumerator()
from the same
thread, iterates through the data, and disposes of the IEnumerator<T>
at the end. The original
IEnumerable<T>
is never used after the initial call to IEnumerator<T>
. Given
the prevalence of that pattern, it makes sense for the C# compiler to choose a pattern which optimises towards that
case. When that's the behaviour, we only create a single object even though we're using it to implement two
different instances. The state of -2 is used to represent "GetEnumerator()
hasn't been called yet"
whereas 0 is used to represent "I'm ready to start iterating, although MoveNext()
hasn't been called yet".
However, if you either try to call GetEnumerator()
either from a different thread, or when it's not
in a state of -2, the code has to create a new instance in order to keep track of the different states. In the latter case
you've basically got two independent counters, so they need independent data storage. GetEnumerator()
deals with initializing the new iterator, and then returns it ready for action. The thread safety aspect is there
to prevent two separate threads from independently calling GetEnumerator()
at the same time,
and both ending up with the same iterator (i.e. this
).
That's the basic pattern when it comes to implementing IEnumerable<T>
: the compiler implements all
the interfaces in the same class, and the code lazily creates extra iterators when it has to. We'll see that there's
more work to do when parameters are involved, but the basic principal is the same.
Choosing between interfaces to return
Normally, IEnumerable<T>
is the most flexible interface to return. If your iterator block
doesn't change anything, and your class isn't implementing IEnumerable<T>
itself (in which
case you'd have to return an IEnumerator<T>
from your GetEnumerator()
method,
of course), it's a good choice. It allows clients to use foreach
, iterate several times, use LINQ to
Objects and general goodness. It's definitely worth using the generic interfaces instead of the nongeneric ones.
From here on I'll only refer to the nongeneric interfaces in the text, but each time I'll mean both forms. (In
other words, there's an important distinction between IEnumerable
and IEnumerator
, but
from this point on I won't distinguish between IEnumerable
and IEnumerable<T>
).
State management
There are up to x pieces of state that the iterator type needs to keep track of:
- Its "virtual instruction pointer" (i.e. where it's got to)
- Local variables
- Parameter initial values and
this
- The creating thread (as shown above, and only in the
IEnumerable
case; I won't cover this further) - The last yielded value (i.e.
Current
; this is trivial enough to not require separate attention)
We'll look at each of the first three in turn.
Keeping track of where we've got to
The first piece of state in our state machine is the one which keeps track of how much code has executed
from our original source. If you think of a normal state machine diagram (with circles and lines) this is
which circle we're currently in. In many cases it's just referred to as the state - and indeed in our
sample decompiled output so far we've seen it as <>1__state
. (This is unfortunate
as all of the rest of the data is state too, but never mind...) The specification refers to
the states of before, running, suspended and after, but as we'll see
suspended needs more detail - and we need an extra state for IEnumerable
implementations.
Before I go any further, it's worth remembering that an iterator block doesn't just run from start to finish.
When the method is originally called, the iterator is just created. It's only when MoveNext()
is called
(after a call to GetEnumerator()
if we're using IEnumerable
). At that point, execution
starts at the top of the method as normal, and progresses as far as the first yield return
or yield break
statement, or the end of the method. At that point, a Boolean value is returned to indicate whether or not
the block has finished iterating. If/when MoveNext()
is called again, the method continues executing
from just after the yield return
statement. (If the previous call finished for any other reason,
we've finished iterating and nothing will happen.) Without looking at the generated code, let's write a small
program to step through a simple iterator. Here's the code:
using System;
using System.Collections.Generic;
class Test
{
static readonly string Padding = new string(' ', 30);
static IEnumerator<int> GetNumbers()
{
Console.WriteLine(Padding + "First line of GetNumbers()");
Console.WriteLine(Padding + "Just before yield return 0");
yield return 10;
Console.WriteLine(Padding + "Just after yield return 0");
Console.WriteLine(Padding + "Just before yield return 1");
yield return 20;
Console.WriteLine(Padding + "Just after yield return 1");
}
static void Main()
{
Console.WriteLine("Calling GetNumbers()");
IEnumerator<int> iterator = GetNumbers();
Console.WriteLine("Calling MoveNext()...");
bool more = iterator.MoveNext();
Console.WriteLine("Result={0}; Current={1}", more, iterator.Current);
Console.WriteLine("Calling MoveNext() again...");
more = iterator.MoveNext();
Console.WriteLine("Result={0}; Current={1}", more, iterator.Current);
Console.WriteLine("Calling MoveNext() again...");
more = iterator.MoveNext();
Console.WriteLine("Result={0} (stopping)", more);
}
}
I've included some padding for the output created in the iterator block to make the results clearer. The lines on the left are in the calling code; the lines on the right are in the iterator block:
Calling GetNumbers() Calling MoveNext()... First line of GetNumbers() Just before yield return 0 Result=True; Current=10 Calling MoveNext() again... Just after yield return 0 Just before yield return 1 Result=True; Current=20 Calling MoveNext() again... Just after yield return 1 Result=False (stopping)
Now let's introduce the values that <>1__state
can take on, and their meanings:
- -2: (
IEnumerable
only) Before the first call toGetEnumerator()
from the creating thread - -1: "Running" - the iterator is currently executing code; also used for "After" - the iterator has finished,
either by reaching the end of the method or by hitting
yield break
- 0: "Before" -
MoveNext()
hasn't been called yet -
Anything positive: indicates where to resume from; it's yielded at least one value, and there's possibly more to come.
Positive states are also used when code is still running but within a
try
block with a correspondingfinally
block. We'll see why later.
It's interesting to note that the generated code doesn't distinguish between "running" and "after". There's really no reason
why it should: if you call MoveNext()
when the iterator's in that state (which may be due to it running in a
different thread) then MoveNext()
will just immediately return false
. This state is also the one
we end up in after an uncaught exception.
Now that we know what the states are for, let's look at what MoveNext()
looks like for the above iterator.
It's basically a switch
statement that starts execution at a particular place in the code based on the state.
That's always the case for MoveNext()
, with the one exception of an iterator body which consists
solely of a yield break
.
private bool MoveNext()
{
switch (this.<>1__state)
{
case 0:
this.<>1__state = -1;
Console.WriteLine(Test.Padding + "First line of GetNumbers()");
Console.WriteLine(Test.Padding + "Just before yield return 0");
this.<>2__current = 10;
this.<>1__state = 1;
return true;
case 1:
this.<>1__state = -1;
Console.WriteLine(Test.Padding + "Just after yield return 0");
Console.WriteLine(Test.Padding + "Just before yield return 1");
this.<>2__current = 20;
this.<>1__state = 2;
return true;
case 2:
this.<>1__state = -1;
Console.WriteLine(Test.Padding + "Just after yield return 1");
break;
}
return false;
}
This is a very simple case, of course - we just start off in state 0, then 1, then 2, then -1 (although we're briefly in -1 while the code executes each time, of course). Let's take a look at our first example again. Here's the original code:
using System;
using System.Collections;
class Test
{
static IEnumerator GetCounter()
{
for (int count = 0; count < 10; count++)
{
yield return count;
}
}
}
and here's the MoveNext()
method, just as we saw before:
private bool MoveNext()
{
switch (this.<>1__state)
{
case 0:
this.<>1__state = -1;
this.<count>5__1 = 0;
while (this.<count>5__1 < 10)
{
this.<>2__current = this.<count>5__1;
this.<>1__state = 1;
return true;
Label_004B:
this.<>1__state = -1;
this.<count>5__1++;
}
break;
case 1:
goto Label_004B;
}
return false;
}
Yes, it's using a goto
statement to leap from one case to half-way through another. Ouch. Don't
forget that this is only generated code though. When you think about it, for
loops and
while
loops are really just nice wrappers round comparisons and jumps. We don't really care
how nasty this code is in terms of readability, so long as it works and performs well. The simplest way
for the C# compiler team to achieve those two goals was to model it all with switch
and
goto
.
I don't intend to explain how all the different transformations take place. I'll look at the handling of
finally
blocks later on, and it's interesting to note that you can't yield (either to return or break) from a
try
block which has a catch
block, or from any catch
or finally
block. Importantly, you can yield from a try
block which only has a finally
block.
That means you can still use using
statements, which can be very handy indeed. It can be quite
interesting to experiment with different ways of making the code branch and loop, and then see what happens
to the generated MoveNext()
method. However, I couldn't do that exhaustively whereas you can
experiment very easily. The simple examples above show the principle, along with the states involved. Let's move
on to the next piece of state.
Local variables
Normal local variables are very simple in iterator blocks. They become instance variables in the
iterator type, and are assigned meaningful values in the same way (and at the same point) that they'd be initialized
in normal code. Of course, being instance variables there's no longer any meaningful idea of them being definitely
assigned, but the normal compilation rules will prevent you from seeing their default values (unless you deliberately
mess with the state described above, using reflection). All of this can be seen in the earlier example with the
<count>5__1
variable.
The nature of local variables means that creating an iterator instance doesn't require any extra information about the variables themselves - any initial value will be set within the course of the code. That initial value may rely on non-local variables of course, which brings me to the final type of state.
Parameters and this
Methods implemented with iterator blocks can take parameters, and if they're instance methods they can use this
as well. Any reference to an instance variable of the type containing the iterator block is effectively just using this
and then navigating from that reference to the variable. Here's an example containing both a method parameter (max
and a reference to an instance variable min
- I've qualified the instance variable with this
just to make it clear.
using System;
using System.Collections.Generic;
class Test
{
int min;
public Test(int min)
{
this.min = min;
}
public IEnumerator<int> GetCounter(int max)
{
for (int count = this.min; count < max; count++)
{
yield return count;
}
}
}
Note that it's returning an IEnumerator
rather than an IEnumerable
. This makes more of a difference
when using parameters and this
than it does otherwise, as we'll see soon. Here are the interesting bits of the
generated code:
internal class Test
{
private int min;
public Test(int min)
{
this.min = min;
}
public IEnumerator<int> GetCounter(int max)
{
<GetCounter>d__0 d__ = new <GetCounter>d__0(0);
d__.<>4__this = this;
d__.max = max;
return d__;
}
private sealed class <GetCounter>d__0 : IEnumerator<int>, IEnumerator, IDisposable
{
private int <>1__state;
private int <>2__current;
public Test <>4__this;
public int <count>5__1;
public int max;
public <GetCounter>d__0(int <>1__state)
{
this.<>1__state = <>1__state;
}
private bool MoveNext()
{
switch (this.<>1__state)
{
case 0:
this.<>1__state = -1;
this.<count>5__1 = this.<>4__this.min;
while (this.<count>5__1 < this.max)
{
this.<>2__current = this.<count>5__1;
this.<>1__state = 1;
return true;
Label_0050:
this.<>1__state = -1;
this.<count>5__1++;
}
break;
case 1:
goto Label_0050;
}
return false;
}
// Other methods as normal
}
}
We've gained two extra fields in the iterator. One is just called max
, and the other is <>4__this
.
Where the original code accesses min
, the generated code accesses <>4__this.min
- which it can
do despite Test.min
being private due to the fact that it's in a nested type.
The interesting and (to my mind) somewhat counterintuitive part is the way these extra fields are initialised. Personally,
I would have added them as constructor parameters, making both of them private and <>4__this
readonly too.
Eric Lippert, from the C# team, has explained that the code which is responsible for this is the same code which hoists
captured variables from closures - and those really do need to be public so that the original method can still get
at them. So basically it's a code reuse issue rather than there being a sneaky reason why it couldn't have been done my
preferred way. There's no real harm here, but I find this sort of thing fascinating :)
As it happens, our iterator block doesn't change the value of max
- but it could do. Now suppose that instead of
IEnumerator
we were to return IEnumerable
. Given that we want each of the iterators generated by
calls to GetEnumerator()
to use the original value of max
, how does the compiler keep things in check?
Here's the interesting subset of generated code (the source is the same as it was before, just with a change to the return type).
internal class Test
{
private int min;
public IEnumerable<int> GetCounter(int max)
{
<GetCounter>d__0 d__ = new <GetCounter>d__0(-2);
d__.<>4__this = this;
d__.<>3__max = max;
return d__;
}
private sealed class <GetCounter>d__0 : IEnumerable<int>, IEnumerable, IEnumerator<int>, IEnumerator, IDisposable
{
private int <>1__state;
private int <>2__current;
public int <>3__max;
public Test <>4__this;
private int <>l__initialThreadId;
public int <count>5__1;
public int max;
public <GetCounter>d__0(int <>1__state)
{
this.<>1__state = <>1__state;
this.<>l__initialThreadId = Thread.CurrentThread.ManagedThreadId;
}
private bool MoveNext()
{
switch (this.<>1__state)
{
case 0:
this.<>1__state = -1;
this.<count>5__1 = this.<>4__this.min;
while (this.<count>5__1 < this.max)
{
this.<>2__current = this.<count>5__1;
this.<>1__state = 1;
return true;
Label_0050:
this.<>1__state = -1;
this.<count>5__1++;
}
break;
case 1:
goto Label_0050;
}
return false;
}
IEnumerator<int> IEnumerable<int>.GetEnumerator()
{
Test.<GetCounter>d__0 d__;
if ((Thread.CurrentThread.ManagedThreadId == this.<>l__initialThreadId) && (this.<>1__state == -2))
{
this.<>1__state = 0;
d__ = this;
}
else
{
d__ = new Test.<GetCounter>d__0(0);
d__.<>4__this = this.<>4__this;
}
d__.max = this.<>3__max;
return d__;
}
}
}
Now we've got another extra field (<>3__max
) to represent the original value of the
max
parameter. Every time we start using an instance as an IEnumerator
(i.e. the state
becomes 0, whether it's on construction or not) we initialise the max
field with that value.
Note that we don't need to have an extra field for <>4__this
because this
is
readonly for reference types, so it can't possibly be changed by the original code. (Iterator blocks which occur
in structs and which reference this
do have an extra field for the initial value of this
,
as that could be changed by the iterator block code.)
Arguably the compiler could check that nothing changes the parameter's value and avoid the copying, but I suspect that
would be more effort than it's worth, with odd corner cases. The current system allows you to change parameter values
in weird and wonderful ways without causing problems for any other iterators created from the same IEnumerable
instance.
That wraps up all the state we need to keep track of, but there's one more issue we need to deal with: finally
blocks.
And finally
...
Iterators pose an awkward problem. Instead of the whole method executing before the stack frame is popped, execution effectively
pauses each time a value is yielded. There's no way of guaranteeing that the caller will ever use the iterator again, in any way,
shape or form. If you require some more code to be executed at some point after the value is yielded, you're in trouble: you can't
guarantee it will happen. To cut to the chase, code in a finally
block which would normally be executed in almost all
circumstances before leaving the method can't be relied on quite as much.
It's worth remembering that most finally
blocks in code aren't written explicitly in C# - they're generated by the
compiler as part of lock
and using
statements. lock
is particularly dangerous in iterator
blocks - any time you've got a yield return
statement inside a lock
block, you've got a threading issue waiting
to happen. Your code will keep hold of the lock even when it has yielded the next value - and who knows how long it will be before
the client calls MoveNext()
or Dispose()
? Likewise any try/finally
blocks which are used
for critical matters such as security shouldn't appear in iterator blocks: the client can deliberately prevent the finally
block from executing if they don't need any more values.
The state machine is built so that finally
blocks are executed when an iterator is used properly, however. That's
because IEnumerator<T>
implements IDisposable
, and the C# foreach
loop calls Dispose
on iterators (even the nongeneric IEnumerator
ones, if they implement IDisposable
). The IDisposable
implementation in the generated iterator works out which finally
blocks are relevant to the current position (based on the state,
as always) and execute the appropriate code.
Rather than giving another unrealistic example, this time I'll show you one of my favourite uses for an iterator block. Such a simple
class, but really handy. It lets you iterate over a text file (or any other TextReader
) and closes the reader when either the iterator
finishes or it's disposed. There's a slightly more fully-functional version in MiscUtil
but only in terms of the ways in which you can construct the instance.
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
public sealed class LineReader : IEnumerable<string>
{
readonly Func<TextReader> dataSource;
public LineReader(string filename)
: this(() => File.OpenText(filename))
{
}
public LineReader(Func<TextReader> dataSource)
{
this.dataSource = dataSource;
}
public IEnumerator<string> GetEnumerator()
{
using (TextReader reader = dataSource())
{
string line;
while ((line = reader.ReadLine()) != null)
{
yield return line;
}
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
With LineReader
in place, dumping a text file line by line is as easy as:
foreach (string line in new LineReader(filename))
{
Console.WriteLine(line);
}
Furthermore, you can use the LINQ Standard Query Operators to filter, project and so on.
However, it's important to close the file as soon as we've finished iterating - whether that's
because we've got to the end of the file, thrown an exception, or just decided we've read enough.
Here are the interesting bits of the generated iterator class for LineReader
:
private sealed class <GetEnumerator>d__3 : IEnumerator<string>, IEnumerator, IDisposable
{
private int <>1__state;
private string <>2__current;
public LineReader <>4__this;
public string <line>5__5;
public TextReader <reader>5__4;
private void <>m__Finally6()
{
this.<>1__state = -1;
if (this.<reader>5__4 != null)
{
this.<reader>5__4.Dispose();
}
}
private bool MoveNext()
{
try
{
switch (this.<>1__state)
{
case 0:
this.<>1__state = -1;
this.<reader>5__4 = this.<>4__this.dataSource();
this.<>1__state = 1;
while ((this.<line>5__5 = this.<reader>5__4.ReadLine()) != null)
{
this.<>2__current = this.<line>5__5;
this.<>1__state = 2;
return true;
Label_0061:
this.<>1__state = 1;
}
this.<>m__Finally6();
break;
case 2:
goto Label_0061;
}
return false;
}
// Note "fault" not "finally"
fault
{
this.System.IDisposable.Dispose();
}
}
void IDisposable.Dispose()
{
switch (this.<>1__state)
{
case 1:
case 2:
break;
// Very strange! Reflector bug? See below.
default:
break;
try
{
}
finally
{
this.<>m__Finally6();
}
break;
}
}
// Other stuff: Reset, Current, constructor etc.
}
There are quite a few interesting points to note here:
-
We've got an extra method,
<>m__Finally6
, which is called from bothDispose()
andMoveNext()
. It contains the code from thefinally
block (from the original source code'susing
statement) and sets the state to "running/after". The compiler keeps track of whichfinally
blocks need to be called if the iterator is disposed in any particular state, and calls them from theDispose()
method. -
There's extra protection for the case where the iterator block itself throws an exception. In this case, the
iterator becomes is disposed of and effectively becomes invalid. This is handled by the
fault
block in the generated code. Just like with the "unspeakable" generated names, this is something which can't be done in real C#, but it's afault
handler in IL. This is likefinally
but isn't triggered when the correspondingtry
block is entered normally. You can think of it as being a bit like a universalcatch
block which rethrows the exception at the end. -
There's a positive state which is used even while we're running. After a little experimentation, it looks like
an extra state is generated for each
try
block containing ayield return
statement. I'm not entirely sure of the purpose of these extra states, but presumably it makes it easier to do the right thing for any particular circumstance. Ayield break
statement results inDispose()
being executed before the method returns. -
The
try/finally
block inDispose
is decidedly odd. I suspect this is a bug in reflector, and that thebreak
should actually be in theblock
. It's unclear to me why it doesn't just make the call directly, but there's probably a good reason.
Again, there's more to be discovered if you experiment, but you can see that the compiler does its best to make sure that
the finally
blocks from the original iterator block are executed as faithfully as possible.
Conclusion
Phew! This article has been much longer than I expected. There are lots of different possibilities and quirks to be accounted for in iterator blocks, and I'm glad that it's up to the C# team to get them all right rather than me. As you can tell, Reflector is a fabulous aid when looking at what's going on under the hood, but you need to be aware that the C# compiler will generate code which has no direct comparable C#, such as the fault handler used in the last example.
The difference between writing iterators by hand and writing them using iterator blocks is absolutely enormous. A lot of the functionality of LINQ to Objects is quite easy to achieve using iterator blocks, although admittedly checking arguments at the appropriate time is tricky. Writing the same functionality by hand would be extremely painful and error-prone. I don't think it's much of an exaggeration to say that products like LINQBridge probably wouldn't exist without iterator blocks; I'm certain that much of the iterator-related code in MiscUtil wouldn't.
So a big "thank you" to the C# team, and I look forward to seeing what other tedious tasks they can eliminate for me.