13 August 2005

As we noted in Part I, a selection sort is an easy sort to write but unfortunately it performs on the order of n2 comparisons. To produce an improved version we need to find a piece of information that the selection sort is throwing away and take advantage of it.

Selection sort will compare a value of the array with all of the other values of the array to determine which is the smallest. It doesn’t take advantage to the fact that comparing say, a10 to a20 and comparing a20 to, say, a40 could tell us something about the relationship between a10 and a40. What we could do is take some element of a and compare it to every other element in a and partition the array into two pieces, one containing all the elements less than the selected element and all elements greater than or equal to the selected element. By transitivity, we would know that each element in the first partition is less than all elements in the second partition even if we don’t compare them directly. We could then partition the partitions again and again, in Xeno-esque fashion, until the array only contains partitions of one element in length. Since each partition will be less than its successor, the array will be sorted.

An algorithm that takes this very approach is the QuickSort, invented by C.A.R. Hoare. A version of which looks like the following,

public class Sorts {

  public static void QuickSort(int[] a) {
    QSort(a, 0, a.Length - 1);
  }

  private static void QSort(int[] a, int l, int r) {
    // Partition the array into two parts, one with all values
    // greater than a certain value in the array, one with all
    // less.
    int v = a[(l + r) / 2];
    int i = l;
    int j = r;
    while (i < j) {
      // Find the first values that don't belong and swap
      // them. Repeat until all values are in the right
      // partitions.
      while (a[i] < v)
        i++;
      while (a[j] > v)
        j--;
      if (i if (j > i)
        Swap(a, i, j);
      i++;
      j--;
    }
  }

  // Recursively partition the array until we can't any
  // longer. When this happens, the array is sorted.
  if (j > l)
    QSort(a, l, j);
  if (r > i)
    QSort(a, i, r);
}

If we get lucky and always guess the median value of each of the partitions we will only perform O(N log N) comparisons. The worst thing we can do is select a value that, given an n length partition will give us two partitions one of length n-1 and the other of length 1. This would put us back where we started; with the selection sort. This would happen if we selected the first (or last) element in the partition and the array was already sorted. To avoid this, the above code selects an element from the middle of the partition, so, if the array is sorted, it will be the median. If the array is in random order, it is not better or worse than the first element.

This, however, doesn’t guarantee we will get the elusive O(N log N) performance. Many cases we will be less than O(N log N), those cases where we don’t magically pick the median. In some pathological cases, we could arrive very close to O(N2). An example of which is an array ordered 1..m,1..m where m is n/2. Next we will examine an algorithm that guarantees O(N log N) performance.



blog comments powered by Disqus