Ranges and corner cases

The first edition of C# in Depth contained an abstract generic Range class, using a virtual method to step between items in the range. Unfortunately, there are some corner cases that end up being quite awkward. This article isn't so much about designing the perfect range class, as the considerations to take into account. My MiscUtil library contains a class which handles quite a lot of this, although no doubt there's room for improvement. I blogged about this a while ago, but I think it's a topic which is worth revisiting.

Broad Scope

First let's define what we want to be able to do, in fairly vague terms:

Create a representation of range of values for any suitable type
Iterate over those values using some custom stepping function

So, not a lot of things in scope... although the devil lies in the detail. Let's explicitly take some things out of scope. I'm not going to:

Consider the equality of two ranges
Consider the intersection, union etc of two ranges
Consider the range as a set of discreet values (e.g. "the even integers between 0 and 100")
Deal with stepping functions which hop around within the range (i.e. it should always step up, or always step down) - we'll see why later

Given that very broad delineation, we can probably start thinking about some details.

Details

First, let's think about the types that are applicable. We want it to be generic, and we want to be able to compare any two values. We could do that by constraining T to implement IComparable<T> - but that would prevent us from using types which may have been designed before generics became available, and which haven't been updated with the new interface. Also, simply using the default comparison might be a bit limiting later on... in particular, I already have in my mind that one way of reversing a range is simply to swap its minimum and maximum values, and reversing the sense of the comparison at the same time.

To be honest, I can't think of many times where it would actually make sense to create a range of items which didn't have a normal comparison - but I'm pretty convinced that we do want to be able to use custom comparisons. Now we have another choice: do we express that comparison in terms of IComparer<T>, or Comparison<T>? They're equivalent - but IComparer<T> is probably more common, so we'll use that. It also means we can use Comparer<T>.Default for the natural comparison without any further messing around. If callers want to use a Comparison<T>, it's easy to build an adapter class (and indeed MiscUtil contains one).

Going back to the constraint decision, this is an interesting design choice: do we potentially constrain callers too much, frustrating them if they have a really good reason to create a range of a non-naturally-comparable type, or do we risk the type being used inappropriately by not constraining it? On balance, while I like to push API users towards appropriate decisions, I think I prefer to work without the constraint here. The penalty for those odd cases would just be too high.

So, we know we'll have a comparison - and it's pretty likely that we'll have upper and lower bounds. But that raises even more questions:

Should the bounds be inclusive or exclusive (i.e. should the range be closed or open)?
Do we always need both bounds, or could we have a range which has no lower bound or no upper bound? (Or even neither, representing the range covered by all values of the type.)
What does it mean to iterate over a range with an open lower bound?
What does it mean to iterate over a range with no lower bound?

As I want to keep this reasonably simple, I'm going to stick to finite ranges for the moment - i.e. we will always have both bounds. We'll default to a closed lower bound (inclusive) and an open upper bound (exclusive), but have optional constructor parameters to allow other possibilities. Likewise I'll default to the default comparer for the type (if you see what I mean) but allow custom comparisons to be specified. In this case I'm going to use a default value of null for the comparer - which admittedly could hide a bug if someone accidentally passes in a null reference, but is otherwise quite neat. (As a little bit of serendipity, it means that if you want to explicitly use the default comparer, you can pass in default(IComparer<T>) - which is null, of course. We'll come onto the "iteration from an open bound" issue later on.

One final design point before we get cracking: we're going to make the type immutable. Of course, it the element type is mutable there's not a lot we can do about it, but most sensible range element types are naturally immutable anyway.

First implementation steps

It's time to write some actual code. We'll start off with very little: just construction and a containment test.

public sealed class Range<T>
{
    public T LowerBound { get; private set; }
    public T UpperBound { get; private set; }
    public bool InclusiveLowerBound { get; private set; }
    public bool InclusiveUpperBound { get; private set; }
    public IComparer<T> Comparer { get; private set; }

    public Range(T lowerBound, T upperBound,
                  bool inclusiveLowerBound = true,
                  bool inclusiveUpperBound = false,
                  IComparer<T> comparer = null)
    {
        LowerBound = lowerBound;
        UpperBound = upperBound;
        InclusiveLowerBound = inclusiveLowerBound;
        InclusiveUpperBound = inclusiveUpperBound;
        Comparer = comparer ?? Comparer<T>.Default;

        if (Comparer.Compare(LowerBound, UpperBound) > 0)
        {
            throw new ArgumentException("Invalid bounds");
        }
    }

    public bool Contains(T item)
    {
        int lowerCompare = Comparer.Compare(LowerBound, item);
        if (lowerCompare > (InclusiveLowerBound ? 0 : -1))
        {
            return false;
        }
        int upperCompare = Comparer.Compare(item, UpperBound);
        if (upperCompare > (InclusiveUpperBound ? 0 : -1))
        {
            return false;
        }
        return true;
    }
}

The fact that we've got some duplicate work for both bounds makes me wonder whether we should actually encapsulate the idea of a bound within its own type... but for the moment we won't. (If we were including the ability to create "infinite" lower/upper bounds then it would probably be worthwhile.) The Contains code is far from pretty, but the unit tests are green so I'm reasonably happy that they work.

In the interests of brevity I've used automatic properties with private setters to achieve the immutability we wanted. That means the type is mutable from within its own code - it's not ideal, but declaring a readonly field and then a property which returns it is slightly distracting when we're trying to concentrate on the overall design.

It would be trivial to add methods such as WithExclusiveLowerBound, WithInclusiveLowerBound, WithLowerBound and so on, which would create new instances of Range to return. Likewise we could create a static, nongeneric class with generic factory methods to allow the type arguments to be inferred from the regular arguments, like this:

var range = Range.Of(5, 10);

All of these are worthy features, but they're not terribly interesting, so I won't go into them any further here. What's rather more interesting is how we iterate over a range.

Iterating

Let's get back to the design questions involved. I think it's reasonably clear that we need to have some sort of stepping function to get from an existing value to the next one. We're then left with two questions: what do we use as the first value, and when do we stop?

Where do we start?

The first question is easy to answer for the case where the lower bound is inclusive: we simply start with the lower bound. When the lower bound is exclusive it's trickier to define. Suppose we were using a range (0, 50) (exclusive at both ends) and stepping by 5 each time. Should the first value be 1 (the lowest value in the range) or 5 (the result of starting at 0 and skipping the first value)? Finding a minimal value which is included in the range is hard in many cases and impossible in others, particularly when there's a custom comparer. For example, in my unit tests I have a custom IComparer<int> which just compares the last two digits of any integers it's given. So you can have a range (41, 37) which would "contain" 23, 12 and 44 but not 38. What would be the minimal value in that range?

We could require that the caller provides an initial value, but that feels ugly to me. I think it's simpler to just skip the first element - so these two iterables would be equivalent:

var range = new Range<int>(0, 50, false, false);
var simpleIterable = range.StepBy(x => x + 5);
var equivalentIterable = range.WithInclusiveLowerBound()
                              .StepBy(x => x + 5)
                              .Skip(1);

That's inadvertently given away another design point: we're going to have a method which returns an IEnumerable<T>. We could return an IEnumerator<T> just as easily (the implementation is going to use iterator blocks, which can be used for either) but life is much simpler if we return an IEnumerable<T>: we can use foreach with it, we can use it in LINQ queries and so on.

On to the second question...

When do we stop?

Again, there are multiple options here. The most important point is to rule out a broken approach first: we shouldn't just keep iterating until we're provided with a value which isn't in the range. That would make it impossible to iterate over a Range<byte> from 0 to 255 inclusive, for example - which is a perfectly reasonable range to iterate over. These options present themselves:

Take two stepping functions, one to ask whether there is another value, and one to find out what the value is.
Use one stepping function, but make it return both of these bits of information in one go. That leads to some suboptions:
- Use a Func<Tuple<bool, T>> to return both values together
- Use a delegate with an out parameter (urgh)
- Try to use null as a "no more values" result; this wouldn't be too hard if we constrained T to be a value type, but it's not a nice solution
Stop either when the next value is outside the range or when it's equal or less than the previous value, thus preventing wrap-around

I think the last option is the cleanest one - and it also requires the least work from callers who only want to give a simple "add n to the previous value" sort of stepping function. It's slightly less flexible, in that we can't have a stepping function which dots around within the range, but it covers the most common case more simply. It's easy to implement and easy to use.

Implementation

Now we know what we're going to do, the implementation is very simple. Unlike the MiscUtil implementation, we're simply going to add a method using an iterator block to do the hard work - there's no real need for a separate public type in this case. For the purposes of this article, I'm going to require that the caller provides the stepping function directly, too. In MiscUtil I've used Marc Gravell's cunning generic operator support to allow a simple "step by X" call to be used for any types which support the appropriate + operator, but that's a little beyond what I'm aiming for here.

Here's an initial implementation:

public IEnumerable<T> StepBy(Func<T, T> step)
{
    T current = LowerBound;
    // Skip the first value for exclusive lower bounds
    if (!InclusiveLowerBound)
    {
        current = step(current);
    }
    while (Contains(current))
    {
        yield return current;
        T next = step(current);
        // Handle a stepping function which wraps
        // round from a value near the end to one
        // near the start; or a stepping function
        // which does nothing.
        if (Comparer.Compare(next, current) <= 0)
        {
            yield break;
        }
        current = next;
    }
}

One potential source of concern is that we're only performing the wrap-handling check inside the loop, even though we use the step function before entering the loop if the lower bound is exclusive. Fortunately, this drops out naturally: if the step function gives a first value which is less than the lower bound, that value won't be within the range anyway, so it won't match the loop condition.

An alternative way of writing the starting point is to use the conditional operator:

T current = InclusiveLowerBound ? LowerBound : step(LowerBound);

Whether you find this more or less readable is a matter of personal preference.

One final point remains: we always appear to be stepping up the range. What if we want to count down?

Reversing a range

Reversing a range is relatively straightforward, but we do need to make sure we reverse everything about it. We need to swap the upper and lower bounds, along with their inclusivity - and also swap the comparer. Fortunately, writing a reversing comparer is really easy - so long as you don't fall into a nasty trap. Here's the code for a comparer which simply reverses an existing one:

public sealed class ReverseComparer<T> : IComparer<T>
{
    private readonly IComparer<T> originalComparer;

    public ReverseComparer(IComparer<T> originalComparer)
    {
        this.originalComparer = originalComparer;
    }

    public int Compare(T x, T y)
    {
        return originalComparer.Compare(y, x);
    }
}

So what's the trap? Well, in the Compare method, it's quite tempting to just negate the results of the original comparison... but that fails if the original result is int.MinValue: due to the way 2's-complement works, negating int.MinValue gives you a result of int.MinValue :( There are alternative approaches - you could take the sign of the original result, and negate that, for example - but just switching the operands works very neatly.

Now that we've got a reverse comparer, we can easily implement a Reverse method on Range:

public Range<T> Reverse()
{
    return new Range<T>(
        lowerBound: UpperBound,
        upperBound: LowerBound,
        inclusiveLowerBound: InclusiveUpperBound,
        inclusiveUpperBound: InclusiveLowerBound,
        comparer:new ReverseComparer<T>(Comparer));
}

There are further optimizations available, such as detecting a reverse comparer to avoid reversing the same comparer twice (you can just use the original instead, and avoid two levels of indirection) but I've omitted them for the sake of simplicity. Likewise for clarity I've used C# 4's named arguments feature - but of course it's not really required.

One point to note here: in some cases the result of reversing the sequence generated by the original range isn't the same as the result of reversing the range and then using an inverse stepping function. For example, consider the range [0, 5] (i.e. the integers zero to five inclusive), with a stepping function of adding 2. That yields { 0, 2, 4 } - but when you reverse the range and change to a stepping function of subtracting 2, the sequence is { 5, 3, 1 }. The simplest way of reversing the original sequence is to use LINQ's built-in method, of course.

Conclusion

We haven't written much code here, but we've ended up with a useful little class. It was important to work out the scope of the class to start with - different decisions early on would have changed the implementation completely - but it was equally important to think of corner cases such as wrapping, stepping functions which don't change the original value, and comparisons which return int.MinValue. All the code here is available for download, including some brief unit tests to sanity-check it.