Iterators, iterator blocks and data pipelines
LINQ to Objects is based on iterators. These have always been part of .NET, and
indeed are the basis of the
foreach statement - but with LINQ to Objects
they're more important than ever before. In this article I'll explore what iterators
are about, how to implement them in C# 2 (or 3), and then the fundamental aspects
of LINQ to Objects. This isn't a full tutorial on LINQ to Objects, but it provides
background for the core building blocks (and of course a little bit of LINQ
What's an iterator?
The iterator pattern is a common one, and in .NET it's encapsulated in the
IEnumerator interfaces, and their generic counterparts. (Most of the time
I'll use the nongeneric form for descriptions and the generic form in sample code. Obviously
the generic form is strongly typed where the nongeneric form isn't, but the only other difference
is that the generic form extends
The basic idea is that as a data consumer, you can ask an
IEnumerator with the
GetEnumerator() call, and
then iterate over the contents of the
IEnumerator (using the
Current property) until either you no longer need any more
data, or the iterator runs out information to return. For example, a simple
foreach loop over
IEnumerable<string> looks something
IEnumerator<string> iterator = dataSource.GetEnumerator();
string data = iterator.Current;
The above code doesn't call
Dispose on the iterator, for simplicity:
the code generated for a
foreach loop uses a
finally construct to make sure that the iterator is
disposed when we've finished with it. This is important when the iterator is implemented
by magic of
yield return statements which we'll see later, but it's
not crucial to the basic pattern itself.
Why are there two interfaces involved?
You may well be asking yourself why
IEnumerable doesn't have
Current members itself - why do we need an extra interface? Well,
imagine you had two different loops iterating over the same list. You don't want
the loops to interfere with each other, so you need two independent representations
of the idea of "how much of the list I've looked at so far". This concept is precisely
IEnumerator is for. Typically it keeps a separate piece of information
(such as the current index in the list) as well as referring to the collection itself. Although
iterators are typically used for in-memory collections, they certainly don't have to be, as we'll
see later. The key is the "ask for data, process it, ask for more data" cycle.
Iterator blocks (C# 2)
In C# 1, implementing iterators was a pain. You had to make your collection type implement
IEnumerable, then create a separate type with a reference to an instance of its "parent",
usually adding some more state for how far you'd already gone. It was very, very easy to end
up with subtle off-by-one errors, and generally developers didn't do it unless they really had to.
In C# 2, the compiler does all the hard work for you when you use iterator blocks to implement
IEnumerator (or the generic forms). It builds a state machine
for you, and the iterator obtained effectively executes the code within the iterator block, yielding values
as it goes. It's easier to demonstrate than to explain - here's a complete program:
static IEnumerable<string> GetDemoEnumerable()
yield return "start";
for (int i=0; i < 5; i++)
yield return i.ToString();
yield return "end";
static void Main()
foreach (string x in GetDemoEnumerable())
The output of the program is simple:
I won't go into all the details of how iterator blocks work (see chapter 6 of C# in Depth for more information, or
the language specification for a lot of detail) but it's important to understand the flow:
GetDemoEnumerable() creates a new instance of the extra type generated by the compiler. None of the source code we've written executes yet.
- The iterator executes code until it reaches a
yield statement. In our case, this happens immediately. The iterator
remembers that the current item should be "start" and returns
true to indicate that there is data available.
Main uses the
Current property to retrieve the data, then prints it out.
- The iterator continues execution from the point it had previously reached - in other words, it goes to the line after the first
As before, it executes code (initialising the
i variable) until it reaches the next
- ... The pattern repeats, until there's a call to
MoveNext() which reaches the end of the method - at which point the call returns
to indicate that there's no more data available.
There are three very important (related) points here: first, that none of our original
source code is executed until the first call to
that subsequent calls to
MoveNext() effectively jump back into the source code at the
point they left off before. Yielding a value is sort of like "pausing" the method. Third,
the compiler makes sure that the state of the method (in terms of local variables) is preserved -
even though the
i variable is local inside the iterator block, its value is kept inside the
iterator so that next time
MoveNext() is called, it can still be used.
These features of iterator blocks make them completely different from other methods -
it can be quite tricky to get your head round how they work, until it suddenly clicks. It
can be quite useful to debug into an iterator block to really see how the flow works for yourself.
Now we know about iterators and how they can be easily implemented in C# 2, let's think about how we
can chain them together.
Data pipelines and composability
LINQ to Objects is built on the concept of a data pipeline. We start with a data source which implements
IEnumerable, and chain together lots of operations, each one acting on the result of the
previous one and returning another
IEnumerable - although it could be one with a different type of
result. This act of chaining small operations together to form one large operation is called
The emphasis in composition is of making each of the small operations genuinely simple. Flexibility is provided in
LINQ to Objects by the use of delegates which can act on the input elements - for example, to act as a predicate
in a filter, indicating whether or not a particular value should pass the filter; or as a projection to map one
value to another.
As a silly demonstration of this, suppose we have a sequence of numbers, in some random order. We want a result which
is a sequence of strings - we want to filter out any negative numbers, sort the rest, and then call
to get the numbers in hexadecimal format. In C# 3 and .NET 3.5, we can do this with the following query expression:
from number in numbers
where number >= 0
The parser translates this into a non-query expression before the compiler deals with it any further:
numbers.Where (number => number >= 0)
.OrderBy (number => number)
.Select (number => number.ToString("x"))
=> syntax is for lambda expressions - a simple way of creating delegates (and also expression trees).
Select methods are extension methods provided by LINQ to
Objects to do filtering, ordering and projections respectively. As you can tell, there are lots of different features interacting
here, and I don't intend to go into the details of most of them - we're just concentrating on the behaviour of iterators in this article.
Deferred execution vs immediate execution
Just as when we created our own iterator using iterator blocks, the expression above doesn't execute any significant
code when it's executed - it sets up the pipeline, but won't actually ask for any data. It only starts doing any
actual filtering/ordering/projecting when the return value of
Select is first asked for some data.
This is called deferred execution. Many LINQ to Objects query operators use deferred execution, especially when they're
conceptually building more of a pipeline. Other query operators such as
ToList() use immediate execution - which means they perform their work immediately, and give you a
result with all the appropriate data.
It's important to be aware of what's going on here so you know when operations
will actually take place. For instance, if you specify a projection which might throw the exception, you need to know that
the exception will only occur when the data is being requested: just because the call to
Select doesn't throw an
exception doesn't mean everything is okay!
Streaming vs buffering
There's another concept which is similar to deferred/immediate execution, but subtly different - and that's whether an operator
streams the data or buffers it. An operator which streams data basically only asks for as much data (from the previous
part of the pipeline) as it it needs to give the next result.
Select() is the simplest example of this - it retrieves
the next item, applies the specified delegate to project it to the result, and then yields that result.
is slightly more complicated - it will keep asking for more data until either there is no more data, or until it finds
an element which passes its filter. When it finds a passing element, however, it will yield that value without asking for any
Buffering operators act differently. When they're asked for data, they buffer up all the data from the earlier piece
of the pipeline, and then work out all the results.
Reverse() is the simplest example of this - in order to
return all of the data in the reverse order, you've got to get to the last element before you can yield anything.
Sorting and grouping also buffer data in LINQ to Objects.
LINQ to Objects contains a few generator operators - ones which produce a sequence of data without starting
DefaultIfEmpty is listed as a generator, but in some ways it isn't - it acts on an existing sequence,
albeit one which may be empty. We won't look at that here.
There are three other standard generator operators:
We'll look at possible implementations of each of them, and then work out how we could implement all of them slightly
differently using composition. Of course, in reality all of these are implemented for us already, but it gives nice
practice for understanding iterator blocks and composition.
Empty does exactly what you'd expect it to - it returns an empty sequence, i.e. one where the first call
false. Strangely enough, implementing it isn't quite as straightforward
as we might expect. We don't need to return any values, so we can just leave the method body empty, right? Well, no... we actually
need code like this:
public static IEnumerable<TResult> Empty<TResult> ()
yield break; statement is an "early out" in iterator blocks - it acts like the end of the method, making
false. Why do we need it at all? Well, with an empty method body there's nothing
to tell the compiler that we want an iterator block in the first place! The compiler only treats a method as being implemented
with an iterator block if it encounters
yield return or
yield break within the method body.
This is the only case I can think of where a
yield break directly before the end of a method actually has
Repeat takes two parameters - an element to return, and the number of times to return it. It will
yield the same element however many times you've specified, and then stop. It's trivial to implement (including
checking that the count isn't negative):
public static IEnumerable<TResult> Repeat<TResult>(TResult element, int count)
if (count < 0)
throw new ArgumentOutOfRangeException("count");
for (int i=0; i < count; i++)
yield return element;
A note on error checking: there's a problem here. Ideally, we would like our
Repeat method to throw
an exception immediately we call it, if
count is less than 0. Currently, we'll only throw the exception
when some code tries to iterate through the result. That's generally a bad thing, and it would be nice
if the language
gave us more support here. For the moment though, we'll leave our slightly broken implementation as it is.
Range is also straightforward. Unlike the previous two operators, it's not generic: it always returns an
IEnumerable<int>. You specify two parameters: the starting point, and how many values to return. The returned
sequence starts at the given value, and counts up - so
Range(10, 3) will yield 10, 11, 12. Here's an implementation,
omitting error checking:
public static IEnumerable<int> Range(int start, int count)
for (int i=start; i < start+count; i++)
yield return i;
One generator to rule them all
The thing about the above generators is that they're all a bit specialized. In particular, they take no delegates, unlike
almost everything else in LINQ to Objects! Let's introduce a new generator. It's so general, I reckon it deserves
Generate. It will produce an infinite sequence by taking a starting point, yielding that, and then
applying a delegate to the current value, yielding the current value, and repeating ad infinitum. Here's the code (with
no error checking - we'd normally check that
step wasn't null):
public static IEnumerable<TSource> Generate<TSource>(TSource start,
TSource current = start;
yield return current;
current = step(current);
The code is as simple as we'd hope - but of course, we'd never want to actually iterate over the result of
directly - it would keep going forever. Indeed,
Generate is almost useless without composition. However, by the
cunning use of just a single other operator, we can easily produce the three generators we've already looked at. Rather than
show any infinite sequences here, I'll wait for a moment before actually using
Take is what I think of as a "pipeline" operator - it takes a sequence and returns a sequence, just like
Where do. As well as the "source" sequence, you need to specify how many elements of
the source you want to be returned.
Take simply yields elements from the source until either the source
runs out of data, or the requested number of elements have been returned. So, the implementation (minus error checking) could
look like this:
public static IEnumerable<TSource> Take<TSource>(this IEnumerable<TSource> source,
foreach (TSource element in source)
yield return element;
Note the use of
yield break here to indicate that we've finished when the counter reaches 0.
(We could have used a simple
break in this particular case, as there's no more code after the loop,
but I thought it would be worth demonstrating a slightly more normal use of
yield break than the one
Empty implementation.) The use of
this in the declaration of the
parameter indicates that
Take is an extension method.
Generate in our toolkit, we can now reimplement
Range using composition:
public static IEnumerable<TResult> Repeat<TResult>(TResult element, int count)
return Generate(element, x => x).Take(count);
public static IEnumerable<TResult> Empty<TResult>()
return Repeat(default(TResult), 0);
public static IEnumerable<int> Range(int start, int count)
return Generate(start, x => x+1).Take(count);
Repeat is just implemented by first creating an infinite sequence which begins with the specified element
and then applies an identity function to it at each step. We then cut that infinite sequence to the
required length by composing it with
Empty sequence is simply the default value
of the required type, repeated zero times. Finally
Range creates another infinite sequence - starting
with the specified value, it just adds one repeatedly - and again we
Take the required number of elements.
It's not particularly useful to actually implement the three generator functions in this manner - but it
does give an interesting insight into composition. The fact that
Generate always creates an infinite
sequence forces you to think about the fact that the values are only generated as they're requested - if
were to try to create all the values first, and then return them in one go, you'd clearly run out of memory quite quickly.
I have a somewhat more complicated example, taken from a blog post
which contains the full source. It's a query expression which is used to generate a visualisation of the Mandelbrot set:
var query = from row in Enumerable.Range(0, ImageHeight)
from col in Enumerable.Range(0, ImageWidth)
let c = new Complex((col * SampleWidth) / ImageWidth + OffsetX,
(row * SampleHeight) / ImageHeight + OffsetY)
select Generate(c, x => x * x + c).TakeWhile(x => x.SquareLength < 4)
.Count() into count
select (byte)(count == MaxIterations ? 0 : (count % 255) + 1);
The use of
Generate here is within the nested query used to find the colour to use for a particular
point. We generate an infinite path of values which jump around the field of complex numbers, stopping
when either the point has a modulus of 2 or more, or we've performed as many iterations as we're interested in.
The full code is downloadable from the blog post, but I hope this gives some indication of how powerful the
composition technique is.
LINQ to Objects is full of sequences, and a bunch of simple operators which combine to form myriad possibilities.
In C# in Depth, I recommend that when you're faced with a complicated
query expression, you look at the sequences from which it's composed. Understanding how iterators work,
streaming/buffering and deferred/immediate execution gives important foundation material to being able to
work with LINQ effectively. Composing queries is fun, and the iterator blocks which first appeared in C# 2 make
iterators reasonably easy to work with. In this article I've given just a few examples, but I'm sure you can
come up with your own. Go ahead, create your own operators - and above all, experiment.
For many of us, LINQ isn't just a new technology, but a new way of approaching problems when writing in C#.
It always takes a while to really get to grips with fresh ideas, so I'd heartily recommend playing with LINQ
before you need to use it professionally. While the other LINQ providers are perhaps "sexier" than LINQ to
Objects, there's a purity, simplicity and immediacy involved when you can just define a collection in memory
and manipulate it to your heart's content.
Likewise iterator blocks are a mystery to many developers. Unless you've already been thinking in sequences,
you may well have never written an iterator block. Hopefully this article has shown you how simple they can
be, along with a couple of warnings where you may have expected slightly different behaviour. Iterator blocks
were already useful in C# 2, but when combined with LINQ they can be truly spectacular, all within tiny
pieces of code. Again - experiment. It'll all be useful in the long run, and if you're anything like me, I
promise you'll have a lot of fun along the way.