Ranges and corner cases
The first edition of C# in Depth contained an abstract generic Range
class, using a virtual method to step between items in the range. Unfortunately,
there are some corner cases that end up being quite awkward. This article isn't
so much about designing the perfect range class, as the considerations to take into
account. My MiscUtil library
contains a class which handles quite a lot of this, although no doubt there's room
for improvement. I blogged
about this a while ago, but I think it's a topic which is worth revisiting.
Broad Scope
First let's define what we want to be able to do, in fairly vague terms:
- Create a representation of range of values for any suitable type
- Iterate over those values using some custom stepping function
So, not a lot of things in scope... although the devil lies in the detail. Let's explicitly take some things out of scope. I'm not going to:
- Consider the equality of two ranges
- Consider the intersection, union etc of two ranges
- Consider the range as a set of discreet values (e.g. "the even integers between 0 and 100")
- Deal with stepping functions which hop around within the range (i.e. it should always step up, or always step down) - we'll see why later
Given that very broad delineation, we can probably start thinking about some details.
Details
First, let's think about the types that are applicable. We want it to be generic,
and we want to be able to compare any two values. We could do that by constraining
T
to implement IComparable<T>
- but that would prevent
us from using types which may have been designed before generics became available,
and which haven't been updated with the new interface. Also, simply using the
default comparison might be a bit limiting later on... in particular, I already
have in my mind that one way of reversing a range is simply to swap its minimum
and maximum values, and reversing the sense of the comparison at the same time.
To be honest, I can't think of many times where it would actually make sense
to create a range of items which didn't have a normal comparison - but I'm
pretty convinced that we do want to be able to use custom comparisons.
Now we have another choice: do we express that comparison in terms of IComparer<T>
,
or Comparison<T>
? They're equivalent - but IComparer<T>
is probably more common, so we'll use that. It also means we can use Comparer<T>.Default
for the natural comparison without any further messing around. If callers want to
use a Comparison<T>
, it's easy to build an adapter class (and
indeed MiscUtil contains one).
Going back to the constraint decision, this is an interesting design choice: do we potentially constrain callers too much, frustrating them if they have a really good reason to create a range of a non-naturally-comparable type, or do we risk the type being used inappropriately by not constraining it? On balance, while I like to push API users towards appropriate decisions, I think I prefer to work without the constraint here. The penalty for those odd cases would just be too high.
So, we know we'll have a comparison - and it's pretty likely that we'll have upper and lower bounds. But that raises even more questions:
- Should the bounds be inclusive or exclusive (i.e. should the range be closed or open)?
- Do we always need both bounds, or could we have a range which has no lower bound or no upper bound? (Or even neither, representing the range covered by all values of the type.)
- What does it mean to iterate over a range with an open lower bound?
- What does it mean to iterate over a range with no lower bound?
As I want to keep this reasonably simple, I'm going to stick to finite ranges for the moment
- i.e. we will always have both bounds. We'll default to a closed lower bound (inclusive)
and an open upper bound (exclusive), but have optional constructor parameters to allow other possibilities. Likewise
I'll default to the default comparer for the type (if you see what I mean) but allow custom
comparisons to be specified. In this case I'm going to use a default value of null
for the comparer - which admittedly could hide a bug if someone accidentally passes in
a null reference, but is otherwise quite neat. (As a little bit of serendipity, it means that
if you want to explicitly use the default comparer, you can pass in
default(IComparer<T>)
- which is null, of course.
We'll come onto the "iteration from an open bound" issue later on.
One final design point before we get cracking: we're going to make the type immutable. Of course, it the element type is mutable there's not a lot we can do about it, but most sensible range element types are naturally immutable anyway.
First implementation steps
It's time to write some actual code. We'll start off with very little: just construction and a containment test.
public sealed class Range<T>
{
public T LowerBound { get; private set; }
public T UpperBound { get; private set; }
public bool InclusiveLowerBound { get; private set; }
public bool InclusiveUpperBound { get; private set; }
public IComparer<T> Comparer { get; private set; }
public Range(T lowerBound, T upperBound,
bool inclusiveLowerBound = true,
bool inclusiveUpperBound = false,
IComparer<T> comparer = null)
{
LowerBound = lowerBound;
UpperBound = upperBound;
InclusiveLowerBound = inclusiveLowerBound;
InclusiveUpperBound = inclusiveUpperBound;
Comparer = comparer ?? Comparer<T>.Default;
if (Comparer.Compare(LowerBound, UpperBound) > 0)
{
throw new ArgumentException("Invalid bounds");
}
}
public bool Contains(T item)
{
int lowerCompare = Comparer.Compare(LowerBound, item);
if (lowerCompare > (InclusiveLowerBound ? 0 : -1))
{
return false;
}
int upperCompare = Comparer.Compare(item, UpperBound);
if (upperCompare > (InclusiveUpperBound ? 0 : -1))
{
return false;
}
return true;
}
}
The fact that we've got some duplicate work for both bounds makes me wonder whether we should
actually encapsulate the idea of a bound within its own type... but for the moment we won't.
(If we were including the ability to create "infinite" lower/upper bounds then it would probably
be worthwhile.) The Contains
code is far from pretty, but the unit tests are green
so I'm reasonably happy that they work.
In the interests of brevity I've used automatic properties with private setters to achieve the immutability we wanted. That means the type is mutable from within its own code - it's not ideal, but declaring a readonly field and then a property which returns it is slightly distracting when we're trying to concentrate on the overall design.
It would be trivial to add methods such as WithExclusiveLowerBound
, WithInclusiveLowerBound
,
WithLowerBound
and so on, which would create new instances of Range
to return. Likewise we could create a static, nongeneric class with generic factory methods to
allow the type arguments to be inferred from the regular arguments, like this:
var range = Range.Of(5, 10);
All of these are worthy features, but they're not terribly interesting, so I won't go into them any further here. What's rather more interesting is how we iterate over a range.
Iterating
Let's get back to the design questions involved. I think it's reasonably clear that we need to have some sort of stepping function to get from an existing value to the next one. We're then left with two questions: what do we use as the first value, and when do we stop?
Where do we start?
The first question is easy to answer for the case where the lower bound is inclusive: we simply
start with the lower bound. When the lower bound is exclusive it's trickier to define. Suppose
we were using a range (0, 50) (exclusive at both ends) and stepping by 5 each time.
Should the first value be 1 (the lowest value in the range) or 5 (the result of starting at 0
and skipping the first value)? Finding a minimal value which is included in the range
is hard in many cases and impossible in others, particularly when there's a custom comparer. For
example, in my unit tests I have a custom IComparer<int>
which just compares
the last two digits of any integers it's given. So you can have a range (41, 37) which would
"contain" 23, 12 and 44 but not 38. What would be the minimal value in that range?
We could require that the caller provides an initial value, but that feels ugly to me. I think it's simpler to just skip the first element - so these two iterables would be equivalent:
var range = new Range<int>(0, 50, false, false);
var simpleIterable = range.StepBy(x => x + 5);
var equivalentIterable = range.WithInclusiveLowerBound()
.StepBy(x => x + 5)
.Skip(1);
That's inadvertently given away another design point: we're going to have a method which
returns an IEnumerable<T>
. We could return an
IEnumerator<T>
just as easily (the implementation is going to use
iterator blocks, which can be used for either) but life is much simpler if we return
an IEnumerable<T>
: we can use foreach
with it, we can
use it in LINQ queries and so on.
On to the second question...
When do we stop?
Again, there are multiple options here. The most important point is to rule out a
broken approach first: we shouldn't just keep iterating until we're provided
with a value which isn't in the range. That would make it impossible to iterate
over a Range<byte>
from 0 to 255 inclusive, for example - which is a
perfectly reasonable range to iterate over. These options present themselves:
- Take two stepping functions, one to ask whether there is another value, and one to find out what the value is.
- Use one stepping function, but make it return both of these bits of information in one
go. That leads to some suboptions:
- Use a
Func<Tuple<bool, T>>
to return both values together - Use a delegate with an
out
parameter (urgh) - Try to use
null
as a "no more values" result; this wouldn't be too hard if we constrainedT
to be a value type, but it's not a nice solution
- Use a
- Stop either when the next value is outside the range or when it's equal or less than the previous value, thus preventing wrap-around
I think the last option is the cleanest one - and it also requires the least work from callers who only want to give a simple "add n to the previous value" sort of stepping function. It's slightly less flexible, in that we can't have a stepping function which dots around within the range, but it covers the most common case more simply. It's easy to implement and easy to use.
Implementation
Now we know what we're going to do, the implementation is very simple. Unlike the MiscUtil implementation, we're simply going to add a method using an iterator block to do the hard work - there's no real need for a separate public type in this case. For the purposes of this article, I'm going to require that the caller provides the stepping function directly, too. In MiscUtil I've used Marc Gravell's cunning generic operator support to allow a simple "step by X" call to be used for any types which support the appropriate + operator, but that's a little beyond what I'm aiming for here.
Here's an initial implementation:
public IEnumerable<T> StepBy(Func<T, T> step)
{
T current = LowerBound;
// Skip the first value for exclusive lower bounds
if (!InclusiveLowerBound)
{
current = step(current);
}
while (Contains(current))
{
yield return current;
T next = step(current);
// Handle a stepping function which wraps
// round from a value near the end to one
// near the start; or a stepping function
// which does nothing.
if (Comparer.Compare(next, current) <= 0)
{
yield break;
}
current = next;
}
}
One potential source of concern is that we're only performing the wrap-handling check inside the loop, even though we use the step function before entering the loop if the lower bound is exclusive. Fortunately, this drops out naturally: if the step function gives a first value which is less than the lower bound, that value won't be within the range anyway, so it won't match the loop condition.
An alternative way of writing the starting point is to use the conditional operator:
T current = InclusiveLowerBound ? LowerBound : step(LowerBound);
Whether you find this more or less readable is a matter of personal preference.
One final point remains: we always appear to be stepping up the range. What if we want to count down?
Reversing a range
Reversing a range is relatively straightforward, but we do need to make sure we reverse everything about it. We need to swap the upper and lower bounds, along with their inclusivity - and also swap the comparer. Fortunately, writing a reversing comparer is really easy - so long as you don't fall into a nasty trap. Here's the code for a comparer which simply reverses an existing one:
public sealed class ReverseComparer<T> : IComparer<T>
{
private readonly IComparer<T> originalComparer;
public ReverseComparer(IComparer<T> originalComparer)
{
this.originalComparer = originalComparer;
}
public int Compare(T x, T y)
{
return originalComparer.Compare(y, x);
}
}
So what's the trap? Well, in the Compare
method, it's quite tempting to
just negate the results of the original comparison... but that fails if the original
result is int.MinValue
: due to the way 2's-complement works, negating
int.MinValue
gives you a result of int.MinValue
:( There are
alternative approaches - you could take the sign of the original result, and negate
that, for example - but just switching the operands works very neatly.
Now that we've got a reverse comparer, we can easily implement a Reverse
method on Range
:
public Range<T> Reverse()
{
return new Range<T>(
lowerBound: UpperBound,
upperBound: LowerBound,
inclusiveLowerBound: InclusiveUpperBound,
inclusiveUpperBound: InclusiveLowerBound,
comparer:new ReverseComparer<T>(Comparer));
}
There are further optimizations available, such as detecting a reverse comparer to avoid reversing the same comparer twice (you can just use the original instead, and avoid two levels of indirection) but I've omitted them for the sake of simplicity. Likewise for clarity I've used C# 4's named arguments feature - but of course it's not really required.
One point to note here: in some cases the result of reversing the sequence generated by the original range isn't the same as the result of reversing the range and then using an inverse stepping function. For example, consider the range [0, 5] (i.e. the integers zero to five inclusive), with a stepping function of adding 2. That yields { 0, 2, 4 } - but when you reverse the range and change to a stepping function of subtracting 2, the sequence is { 5, 3, 1 }. The simplest way of reversing the original sequence is to use LINQ's built-in method, of course.
Conclusion
We haven't written much code here, but we've ended up with a useful little class. It was important
to work out the scope of the class to start with - different decisions early on would have changed
the implementation completely - but it was equally important to think of corner cases such as
wrapping, stepping functions which don't change the original value, and comparisons which return
int.MinValue
. All the code here is available for download,
including some brief unit tests to sanity-check it.