XOR for hashcodes
Chapter 3: Parameterized typing with generics: 3.3.3
Last updated: 1/29/2008
The hashcode algorithm used in the
Pair class is based on repeatedly multiplying and adding. This is my "default implementation" when I need to create a hashcode. Another common implementation is XORing the hashcodes of the constituent fields - an algorithm I denounce in the book, but without details.
It's not uncommon for two or more fields to be of the same type - in the case of our sample type,
Pair<string,string> for example. Let's take the latter type, and consider the following pairs:
- "red", "red"
- "blue", "blue"
- "green", "green"
- "red", "green"
- "green", "red"
Again, this is far from uncommon - often the same values crop up together. We'd like all of the above pairs to have different hashcodes, to avoid hash collisions - but there are only three values in the above list if we use XORing. The first three would all be 0, and the last two would be the same as each other. A pair of the form (x, x) will always have a hash of 0, and a pair of the form (x, y) will always have the same hash as (y, x).
In some cases you may know details of the data you're hashing, and be able to use XOR knowing that the above cases won't come up - but for a general purpose algorithm (particularly when dealing with generic types) it's a bad idea.
An early implementation of anonymous types used XOR hashing. Fortunately this was fixed long before release!