C# Language Specifications
Iterators, iterator blocks and data pipelines
Introduction
    LINQ to Objects is based on iterators. These have always been part of .NET, and
    indeed are the basis of the foreach statement - but with LINQ to Objects
    they're more important than ever before. In this article I'll explore what iterators
    are about, how to implement them in C# 2 (or 3), and then the fundamental aspects
    of LINQ to Objects. This isn't a full tutorial on LINQ to Objects, but it provides
    background for the core building blocks (and of course a little bit of LINQ
    itself).
What's an iterator?
    The iterator pattern is a common one, and in .NET it's encapsulated in the IEnumerable
    and IEnumerator interfaces, and their generic counterparts. (Most of the time
    I'll use the nongeneric form for descriptions and the generic form in sample code. Obviously
    the generic form is strongly typed where the nongeneric form isn't, but the only other difference
    is that the generic form extends IDisposable.)
    The basic idea is that as a data consumer, you can ask an IEnumerable
    for an IEnumerator with the GetEnumerator() call, and
    then iterate over the contents of the IEnumerator (using the MoveNext()
    method and Current property) until either you no longer need any more
    data, or the iterator runs out information to return. For example, a simple 
        foreach
     loop over IEnumerable<string> looks something
    like this:
IEnumerator<string> iterator = dataSource.GetEnumerator();
while (iterator.MoveNext())
{
    string data = iterator.Current;
    // Use data within the body of the loop
}
    The above code doesn't call Dispose on the iterator, for simplicity:
    the code generated for a foreach loop uses a
    try/finally construct to make sure that the iterator is
    disposed when we've finished with it. This is important when the iterator is implemented
    by magic of yield return statements which we'll see later, but it's
    not crucial to the basic pattern itself.
Why are there two interfaces involved?
    You may well be asking yourself why IEnumerable doesn't have MoveNext()
    and Current members itself - why do we need an extra interface? Well,
    imagine you had two different loops iterating over the same list. You don't want
    the loops to interfere with each other, so you need two independent representations
    of the idea of "how much of the list I've looked at so far". This concept is precisely
    what IEnumerator is for. Typically it keeps a separate piece of information
    (such as the current index in the list) as well as referring to the collection itself. Although
    iterators are typically used for in-memory collections, they certainly don't have to be, as we'll
    see later. The key is the "ask for data, process it, ask for more data" cycle.
Iterator blocks (C# 2)
    In C# 1, implementing iterators was a pain. You had to make your collection type implement
    IEnumerable, then create a separate type with a reference to an instance of its "parent",
    usually adding some more state for how far you'd already gone. It was very, very easy to end
    up with subtle off-by-one errors, and generally developers didn't do it unless they really had to.
    In C# 2, the compiler does all the hard work for you when you use iterator blocks to implement
    either IEnumerable or IEnumerator (or the generic forms). It builds a state machine
    for you, and the iterator obtained effectively executes the code within the iterator block, yielding values
    as it goes. It's easier to demonstrate than to explain - here's a complete program:
using System;
using System.Collections.Generic;
class IteratorBlockDemo
{
    static IEnumerable<string> GetDemoEnumerable()
    {
        yield return "start";
        
        for (int i=0; i < 5; i++)
        {
            yield return i.ToString();
        }
        
        yield return "end";
    }
    
    static void Main()
    {
        foreach (string x in GetDemoEnumerable())
        {
            Console.WriteLine(x);
        }
    }
}
The output of the program is simple:
start 0 1 2 3 4 end
I won't go into all the details of how iterator blocks work (see chapter 6 of C# in Depth for more information, or the language specification for a lot of detail) but it's important to understand the flow:
MaincallsGetDemoEnumerable()GetDemoEnumerable()creates a new instance of the extra type generated by the compiler. None of the source code we've written executes yet.MaincallsMoveNext()- 
        The iterator executes code until it reaches a 
yieldstatement. In our case, this happens immediately. The iterator remembers that the current item should be "start" and returnstrueto indicate that there is data available. Mainuses theCurrentproperty to retrieve the data, then prints it out.MaincallsMoveNext()again- 
        The iterator continues execution from the point it had previously reached - in other words, it goes to the line after the first 
yield return. As before, it executes code (initialising theivariable) until it reaches the nextyieldstatement. - 
        ... The pattern repeats, until there's a call to 
MoveNext()which reaches the end of the method - at which point the call returnsfalseto indicate that there's no more data available. 
    There are three very important (related) points here: first, that none of our original
    source code is executed until the first call to MoveNext(). Second,
    that subsequent calls to MoveNext() effectively jump back into the source code at the
    point they left off before. Yielding a value is sort of like "pausing" the method. Third,
    the compiler makes sure that the state of the method (in terms of local variables) is preserved -
    even though the i variable is local inside the iterator block, its value is kept inside the
    iterator so that next time MoveNext() is called, it can still be used.
These features of iterator blocks make them completely different from other methods - it can be quite tricky to get your head round how they work, until it suddenly clicks. It can be quite useful to debug into an iterator block to really see how the flow works for yourself. Now we know about iterators and how they can be easily implemented in C# 2, let's think about how we can chain them together.
Data pipelines and composability
    LINQ to Objects is built on the concept of a data pipeline. We start with a data source which implements
    IEnumerable, and chain together lots of operations, each one acting on the result of the
    previous one and returning another IEnumerable - although it could be one with a different type of
    result. This act of chaining small operations together to form one large operation is called composition.
    The emphasis in composition is of making each of the small operations genuinely simple. Flexibility is provided in
    LINQ to Objects by the use of delegates which can act on the input elements - for example, to act as a predicate
    in a filter, indicating whether or not a particular value should pass the filter; or as a projection to map one
    value to another.
    As a silly demonstration of this, suppose we have a sequence of numbers, in some random order. We want a result which
    is a sequence of strings - we want to filter out any negative numbers, sort the rest, and then call ToString
    to get the numbers in hexadecimal format. In C# 3 and .NET 3.5, we can do this with the following query expression:
from number in numbers
where number >= 0
orderby number
select number.ToString("x")
The parser translates this into a non-query expression before the compiler deals with it any further:
numbers.Where(number => number >= 0)
       .OrderBy(number => number)
       .Select(number => number.ToString("x"))
    The => syntax is for lambda expressions - a simple way of creating delegates (and also expression trees).
    The Where, OrderBy and Select methods are extension methods provided by LINQ to
    Objects to do filtering, ordering and projections respectively. As you can tell, there are lots of different features interacting
    here, and I don't intend to go into the details of most of them - we're just concentrating on the behaviour of iterators in this article.
Deferred execution vs immediate execution
    Just as when we created our own iterator using iterator blocks, the expression above doesn't execute any significant
    code when it's executed - it sets up the pipeline, but won't actually ask for any data. It only starts doing any
    actual filtering/ordering/projecting when the return value of Select is first asked for some data.
    This is called deferred execution. Many LINQ to Objects query operators use deferred execution, especially when they're
    conceptually building more of a pipeline. Other query operators such as Count(), Max() and
    ToList() use immediate execution - which means they perform their work immediately, and give you a
    result with all the appropriate data.
    It's important to be aware of what's going on here so you know when operations
    will actually take place. For instance, if you specify a projection which might throw the exception, you need to know that
    the exception will only occur when the data is being requested: just because the call to Select doesn't throw an
    exception doesn't mean everything is okay!
Streaming vs buffering
    There's another concept which is similar to deferred/immediate execution, but subtly different - and that's whether an operator
    streams the data or buffers it. An operator which streams data basically only asks for as much data (from the previous
    part of the pipeline) as it it needs to give the next result. Select() is the simplest example of this - it retrieves
    the next item, applies the specified delegate to project it to the result, and then yields that result. Where()
    is slightly more complicated - it will keep asking for more data until either there is no more data, or until it finds
    an element which passes its filter. When it finds a passing element, however, it will yield that value without asking for any
    more data.
    Buffering operators act differently. When they're asked for data, they buffer up all the data from the earlier piece
    of the pipeline, and then work out all the results. Reverse() is the simplest example of this - in order to
    return all of the data in the reverse order, you've got to get to the last element before you can yield anything.
    Sorting and grouping also buffer data in LINQ to Objects.
Generators
    LINQ to Objects contains a few generator operators - ones which produce a sequence of data without starting
    with one. DefaultIfEmpty is listed as a generator, but in some ways it isn't - it acts on an existing sequence,
    albeit one which may be empty. We won't look at that here.
    There are three other standard generator operators: Empty, Repeat and Range.
    We'll look at possible implementations of each of them, and then work out how we could implement all of them slightly
    differently using composition. Of course, in reality all of these are implemented for us already, but it gives nice
    practice for understanding iterator blocks and composition.
Empty
    Empty does exactly what you'd expect it to - it returns an empty sequence, i.e. one where the first call
    to MoveNext() returns false. Strangely enough, implementing it isn't quite as straightforward
    as we might expect. We don't need to return any values, so we can just leave the method body empty, right? Well, no... we actually
    need code like this:
public static IEnumerable<TResult> Empty<TResult> ()
{
    yield break;
}
    The yield break; statement is an "early out" in iterator blocks - it acts like the end of the method, making
    MoveNext() return false. Why do we need it at all? Well, with an empty method body there's nothing
    to tell the compiler that we want an iterator block in the first place! The compiler only treats a method as being implemented
    with an iterator block if it encounters yield return or yield break within the method body.
    This is the only case I can think of where a yield break directly before the end of a method actually has
    an effect.
Repeat
    Repeat takes two parameters - an element to return, and the number of times to return it. It will
    yield the same element however many times you've specified, and then stop. It's trivial to implement (including
    checking that the count isn't negative):
public static IEnumerable<TResult> Repeat<TResult>(TResult element, int count)
{
    if (count < 0)
    {
        throw new ArgumentOutOfRangeException("count");
    }
    for (int i=0; i < count; i++)
    {
        yield return element;
    }
}
    A note on error checking: there's a problem here. Ideally, we would like our Repeat method to throw
    an exception immediately we call it, if count is less than 0. Currently, we'll only throw the exception
    when some code tries to iterate through the result. That's generally a bad thing, and it would be nice
    if the language
    gave us more support here . For the moment though, we'll leave our slightly broken implementation as it is.
Range
    Range is also straightforward. Unlike the previous two operators, it's not generic: it always returns an
    IEnumerable<int>. You specify two parameters: the starting point, and how many values to return. The returned
    sequence starts at the given value, and counts up - so Range(10, 3) will yield 10, 11, 12. Here's an implementation,
    omitting error checking:
public static IEnumerable<int> Range(int start, int count)
{
    for (int i=start; i < start+count; i++)
    {
        yield return i;
    }
}
One generator to rule them all
    The thing about the above generators is that they're all a bit specialized. In particular, they take no delegates, unlike
    almost everything else in LINQ to Objects! Let's introduce a new generator. It's so general, I reckon it deserves
    the name Generate. It will produce an infinite sequence by taking a starting point, yielding that, and then
    applying a delegate to the current value, yielding the current value, and repeating ad infinitum. Here's the code (with
    no error checking - we'd normally check that step wasn't null):
public static IEnumerable<TSource> Generate<TSource>(TSource start,
                                                     Func<TSource,TSource> step)
{
    TSource current = start;
    while (true)
    {
        yield return current;
        current = step(current);
    }
}
    The code is as simple as we'd hope - but of course, we'd never want to actually iterate over the result of Generate
    directly - it would keep going forever. Indeed, Generate is almost useless without composition. However, by the
    cunning use of just a single other operator, we can easily produce the three generators we've already looked at. Rather than
    show any infinite sequences here, I'll wait for a moment before actually using Generate.
Take
    Take is what I think of as a "pipeline" operator - it takes a sequence and returns a sequence, just like
    Select and Where do. As well as the "source" sequence, you need to specify how many elements of
    the source you want to be returned. Take simply yields elements from the source until either the source
    runs out of data, or the requested number of elements have been returned. So, the implementation (minus error checking) could
    look like this:
public static IEnumerable<TSource> Take<TSource>(this IEnumerable<TSource> source,
                                                 int count)
{
    foreach (TSource element in source)
    {
        if (count==0)
        {
            yield break;
        }
        count--;
        yield return element;
    }
}
    Note the use of yield break here to indicate that we've finished when the counter reaches 0.
    (We could have used a simple break in this particular case, as there's no more code after the loop,
    but I thought it would be worth demonstrating a slightly more normal use of yield break than the one
    in our Empty implementation.) The use of this in the declaration of the source
    parameter indicates that Take is an extension method.
    With Take and Generate in our toolkit, we can now reimplement Empty, Repeat
    and Range using composition:
public static IEnumerable<TResult> Repeat<TResult>(TResult element, int count)
{
    return Generate(element, x => x).Take(count);
}
    
public static IEnumerable<TResult> Empty<TResult>()
{
    return Repeat(default(TResult), 0);
}
    
public static IEnumerable<int> Range(int start, int count)
{
    return Generate(start, x => x+1).Take(count);
}
    Repeat is just implemented by first creating an infinite sequence which begins with the specified element
    and then applies an identity function to it at each step. We then cut that infinite sequence to the
    required length by composing it with Take. An Empty sequence is simply the default value
    of the required type, repeated zero times. Finally Range creates another infinite sequence - starting
    with the specified value, it just adds one repeatedly - and again we Take the required number of elements.
    It's not particularly useful to actually implement the three generator functions in this manner - but it
    does give an interesting insight into composition. The fact that Generate always creates an infinite
    sequence forces you to think about the fact that the values are only generated as they're requested - if Generate
    were to try to create all the values first, and then return them in one go, you'd clearly run out of memory quite quickly.
    I have a somewhat more complicated example, taken from a blog post
    which contains the full source. It's a query expression which is used to generate a visualisation of the Mandelbrot set:
var query = from row in Enumerable.Range(0, ImageHeight)
            from col in Enumerable.Range(0, ImageWidth)
            // Work out the initial complex value from the row and column
            let c = new Complex((col * SampleWidth) / ImageWidth + OffsetX,
                                (row * SampleHeight) / ImageHeight + OffsetY)
            // Work out the number of iterations
            select Generate(c, x => x * x + c).TakeWhile(x => x.SquareLength < 4)
                                              .Take(MaxIterations)
                                              .Count() into count
            // Map that to an appropriate byte value
            select (byte)(count == MaxIterations ? 0 : (count % 255) + 1);
    The use of Generate here is within the nested query used to find the colour to use for a particular
    point. We generate an infinite path of values which jump around the field of complex numbers, stopping
    when either the point has a modulus of 2 or more, or we've performed as many iterations as we're interested in.
    The full code is downloadable from the blog post, but I hope this gives some indication of how powerful the
    composition technique is.
Conclusion
LINQ to Objects is full of sequences, and a bunch of simple operators which combine to form myriad possibilities. In C# in Depth, I recommend that when you're faced with a complicated query expression, you look at the sequences from which it's composed. Understanding how iterators work, streaming/buffering and deferred/immediate execution gives important foundation material to being able to work with LINQ effectively. Composing queries is fun, and the iterator blocks which first appeared in C# 2 make iterators reasonably easy to work with. In this article I've given just a few examples, but I'm sure you can come up with your own. Go ahead, create your own operators - and above all, experiment.
For many of us, LINQ isn't just a new technology, but a new way of approaching problems when writing in C#. It always takes a while to really get to grips with fresh ideas, so I'd heartily recommend playing with LINQ before you need to use it professionally. While the other LINQ providers are perhaps "sexier" than LINQ to Objects, there's a purity, simplicity and immediacy involved when you can just define a collection in memory and manipulate it to your heart's content.
Likewise iterator blocks are a mystery to many developers. Unless you've already been thinking in sequences, you may well have never written an iterator block. Hopefully this article has shown you how simple they can be, along with a couple of warnings where you may have expected slightly different behaviour. Iterator blocks were already useful in C# 2, but when combined with LINQ they can be truly spectacular, all within tiny pieces of code. Again - experiment. It'll all be useful in the long run, and if you're anything like me, I promise you'll have a lot of fun along the way.
