24 May 2007

One thing that has always frustrated me about typical binary search routines found in libraries is they always assume you can easily produce a copy of the item you are searching for. What I mean is they typically follow this pattern,

int BinarySearch<T>(IList<T> list, T item);

which takes a list of some type, left open here, and searches for it in the list. This is great if the very next thing you do next is insert item into the list. You already have item in hand so this signature is perfect. But if you don't have a ready item, and item is difficult to produce, you are stuck.

This happened to me just recently in my code. I had a sorted list of a data structure representing a span of source, similar to a TextPointer from WPF, called a TextRange. My version can quickly return the offset of the beginning of the span but, since TextRange tracks source changes, creating a new TextRange is expensive. When text in the editor is change, I receive the location of the change in an event. How do I figure out which one of my TextRanges was modified? Since the list is sorted, the obvious choice is to use a binary search which yields O(Log N) performance. Since my TextRanges are store in a List<TextRange>, obviously I should call List<T>.BinarySearch(). Unfortunately, TextRanges are expensive to produce; too expensive for a per-keystroke operation (I am glossing over some details for the sake of brevity; you need to trust me this is true). I have one of two choices, write my own BinarySearch() or create a fake TextRange that is a special TextRange that is less expensive to create and only used for searching. Neither choice is pleasant. Hand-rolled binary searches are not hard to right but, for some strange reason, I always get them wrong the first time.

What I needed is prebuilt and tested version of BinarySearch() that looks more like,

int BinarySearch<T, K>(IList<T> list, K key, Converter<T, K> converter);

What this binary search routine does is find some key value in the list given a function that can convert any list item into the key. In my example, I would call it like,

int location = BinarySearch(textRangeList, location, TextRangeToLocation);

where TextRangeToLocation looks something like,

int TextRangeToLocation(TextRange range) { return range.Location; }

I don't need to produce a TextRange. I have the location I am looking for handy, I just need to know what, if any, TextRange contains the location. This method would do just that. You can even use a lambda function in C# 3.0 to get this all on one line,

int location = BinarySearch(textRangeList, location, (n) => n.Location);

This is starting to look like a good idea. Let's code it up. Here is the method I ended up with,

public static int BinarySearch<T, K>(IList<T> list, K value,
    Converter<T, K> convert, Comparison<K> compare) {
    int i = 0;
    int j = list.Count - 1;
    while (i <= j) {
        int m = i + (j - i) / 2;
        int c = compare(convert(list[m]), value);
        if (c == 0)
            return m;
        if (c < 0)
            i = m + 1;
        else
            j = m - 1;
    }
    return ~i;
}

I added an extra parameter, compare. This is required because I didn't put any limits on what K can be and I need a way to compare one K with another. Many types, such as int, already know how to compare values of their own type. Typically types that are comparable, such as int, string, etc., implement IComparable<T>. We can accommodate such types by creating an overloaded version of BinarySearch() that looks like,

public static int BinarySearch<T, K>(IList<T> list, K value,
    Converter<T, K> convert) where K : IComparable<K> {
    return list.BinarySearch(value, convert, (n, m) => n.CompareTo(m));
}

which uses a lambda function to create a Comparison<T> for any type that supports IComparable<T>. Beware that this doesn't work well with reference types because it doesn't check for null. You can make it safer by putting struct in the where clause but that seemed overly restrictive to me.

Now that I have a BinarySearch() that meets my needs better than the built-in BinarySearch(), my next thought was I should evangelize the benefits my version of BinarySearch() to the team responsible for maintaining the CLR collections and, maybe, someday, I will have it as a native part of the CLR; maybe, someday. Then I remember extensions methods. In C# 3.0, with a slight modification to the above methods, they can appear as if they were already part of the CLR.

public static class ListExtensionMethods {
    public static int BinarySearch<T, K>(this IList<T> list, K value,
        Converter<T, K> convert, Comparison<K> compare) {
        int i = 0;
        int j = list.Count - 1;
        while (i <= j) {
            int m = i + (j - i) / 2;
            int c = compare(convert(list[m]), value);
            if (c == 0)
                return m;
            if (c < 0)
                i = m + 1;
            else
                j = m - 1;
        }
        return ~i;
    }

    public static int BinarySearch<T, K>(this IList<T> list, K value,
        Converter<T, K> convert) where K : IComparable<K> {
        return list.BinarySearch(value, convert, (n, m) => n.CompareTo(m));
    }
}

My version of BinarySearch() now appears as part of the methods of any type that implements IList<T>. This allows us to write,

int location = textRangeList.BinarySearch(location, (n) => n.Location);

just like it was a native part of the CLR. No more hand-rolled BinarySearch()s!

Now that we have a ListExtensionMethods class, what other missing methods could we add? Here are a few that I thought would be useful,

public static void Sort<T, K>(this List<T> list,
    Converter<T, K> coverter, Comparison<K> compare) {
    list.Sort((n, m) => compare(convert(n), convert(m)));
}

public static void Sort<T, K>(this List<T> list,
    Converter<T, K>  converter) where K : IComparable<K> {
    list.Sort((n, m) => convert(n).CompareTo(convert(m)));
}

public static void Swap<T>(this IList<T> list, int index1,
    int index2) {
    var value = list[index1];
    list[index1] = list[index2];
    list[index2] = value;
}

public static void Shuffle<T>(this IList<T> list) {
    int count = list.Count;
    Random r = new Random();
    for(int i = count - 1; i > 1; i--)
        list.Swap(i, r.Next(i - 1));
}

The Sort() methods above are allow you to extract a key for sorting using the same methods as above. These are not strictly necessary since constructing the lambda function I use is not that complicated but I believe they make the code more readable if you are consistent in how you extract keys from a list, so these are convenient wrappers that allow you to use the same methods to sort and search. Swap() and Shuffle() are throw-ins. They are really only useful for demo code or, in my case, test code. I used them to test the Sort() method which I used to test the BinarySearch() method.



blog comments powered by Disqus