29 March 2008

"If you are using a loop, you're doing it wrong."

That is the advice one of my college professors gave us when he was teaching us APL. APL was designed to perform vector and matrix operations. Programming in APL is an exercise in stringing operators together to perform strange and wonderful things. Using a loop is just wrong, and it slows things down.

It is similar with LINQ: if you are using a loop, you are doing it wrong. I find myself doing a lot of prototyping lately, and I am forcing myself to use LINQ. Not because I don't like it; far from it, I really like LINQ. But using loops is so ingrained in my psyche that I have to stop myself and consciously think in LINQ. Every time I am tempted to write a loop over a collection or an array, I ask myself: could I use LINQ instead? Programmers with more of a database background seem to take to LINQ like a duck to water. They think in sets and vectors. I don't, but I am getting there.

For example, sometimes I need to return an IEnumerable<T> when I have a List<T> of a different T. This often happens when I want to keep internal implementation details private: my internal List<T> might hold the actual class, but I need to return an enumerable of some interface. Before LINQ, I would simply write a loop using C# 2.0's yield syntax,

List<Foo> foos = new List<Foo>();

...

public IEnumerable<IFoo> Foos {
    get {
        foreach (Foo f in foos)
            yield return f;
    }
}

This loop involves a collection; can I use LINQ? Sure! Using LINQ's Cast<T>() method, it can be replaced with,

public IEnumerable<IFoo> Foos {
    get { return foos.Cast<IFoo>(); }
}
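One caveat is worth checking if you run this on .NET 4 or later, which is newer than this post. Generic interface covariance makes List<Foo> an IEnumerable<IFoo> when Foo is a class, and Enumerable.Cast<TResult>() returns its source unchanged when the source already implements IEnumerable<TResult>. The property can then hand out the private list itself, and a caller could cast it back to List<Foo> and modify it. A Select() projection always wraps the list. A minimal sketch, assuming hypothetical Foo, IFoo, and FooContainer types:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical types standing in for the post's Foo and IFoo.
public interface IFoo { string Name { get; } }

public class Foo : IFoo {
    public string Name { get; set; }
}

public class FooContainer {
    private readonly List<Foo> foos = new List<Foo>();

    public void Add(Foo foo) { foos.Add(foo); }

    // On .NET 4+, Cast<IFoo>() returns the List<Foo> instance itself,
    // because List<Foo> already implements IEnumerable<IFoo> via covariance.
    public IEnumerable<IFoo> FoosViaCast {
        get { return foos.Cast<IFoo>(); }
    }

    // Select() always returns a new iterator object, so callers cannot
    // recover (and mutate) the private list by casting the result.
    public IEnumerable<IFoo> Foos {
        get { return foos.Select(f => (IFoo)f); }
    }
}
```

If Foo were a struct, covariance would not apply and Cast<IFoo>() would box each element into a new sequence instead.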

If you are trying to find whether a list contains an object with a given name, you could write a loop like,

public bool Contains(string value) {
    foreach (Foo foo in foos)
        if (foo.Name == value)
            return true;
    return false;
}

Using LINQ this might look like,

public bool Contains(string value) {
    return (from foo in foos where foo.Name == value select foo).Any();
}
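The same test can also be written in method syntax. Enumerable.Any() has an overload that takes a predicate, so the where and select clauses collapse into a single call. A minimal sketch, assuming a hypothetical Foo class with a Name property and a hypothetical FooList wrapper:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical class standing in for the post's Foo.
public class Foo {
    public string Name { get; set; }
}

public class FooList {
    private readonly List<Foo> foos = new List<Foo>();

    public void Add(Foo foo) { foos.Add(foo); }

    // Any(predicate) stops at the first match, just like the early
    // "return true" in the hand-written loop.
    public bool Contains(string value) {
        return foos.Any(foo => foo.Name == value);
    }
}
```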

A nice thing about LINQ is that you can build complicated queries in pieces. Thanks to deferred execution, this is fairly efficient as well, and it is really helpful in the debugger, because you can inspect each intermediate result. In one chunk of code I was writing, I needed to coalesce adjacent ranges that are marked with the same text value. Assume you have a structure called Range that looks like,

struct Range {
    public int Start;
    public int Length;
}

and another struct that labels the ranges with names,

struct NamedRange {
    public string Name;
    public Range Range;
}

Now let's have a routine that calculates the range information over some stream,

public IEnumerable<NamedRange> GetNamedRanges(Stream stream) {
    ...
}

Let's assume that the range names are things like "whitespace", "reserved", "identifier", "number", "string", etc., as you might expect to receive from a lexical scanner like the one found in this post.

What I want to do with these ranges is convert the names into styles, such as you might find referencing a CSS style sheet. So, in effect, I am mapping NamedRange values to StyledRange values, where StyledRange would look something like,

struct StyledRange {
    public Style Style;
    public Range Range;
}

Let's create a dictionary that maps range names to styles, such as,

Dictionary<string, Style> styleMap = new Dictionary<string, Style>();
styleMap["number"] = new Style() { Name = "green" };
styleMap["reserved"] = new Style() { Name = "bold" };

I only want to highlight numbers and reserved words; for everything else I will use a default style,

Style defaultStyle = new Style() { Name = "normal" };

We can translate our named ranges directly into styled ranges by using a LINQ query expression such as,

var ranges = GetNamedRanges(stream);
var styledRanges = from range in ranges
                   select new StyledRange() {
                       // Names missing from styleMap fall back to the default style.
                       Style = styleMap.MapOrDefault(range.Name) ?? defaultStyle,
                       Range = range.Range
                   };

where MapOrDefault() is an extension method for IDictionary<TKey, TValue> that looks like,

public static TValue MapOrDefault<TKey, TValue>(
      this IDictionary<TKey, TValue> dictionary, TKey key) {
    TValue result;
    if (dictionary.TryGetValue(key, out result))
        return result;
    return default(TValue);
}

which is patterned after the existing LINQ methods for IEnumerable<T>, FirstOrDefault() and LastOrDefault().
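Because MapOrDefault() falls back to default(TValue), which is null for a reference type such as Style, a lookup miss yields null rather than the "normal" style on its own. One possible variation, sketched here as a hypothetical overload rather than anything from the original code, takes the fallback value as a parameter, so the query could read styleMap.MapOrDefault(range.Name, defaultStyle):

```csharp
using System.Collections.Generic;

public static class MapExtensions {
    // Returns the mapped value, or default(TValue) when the key is absent.
    public static TValue MapOrDefault<TKey, TValue>(
          this IDictionary<TKey, TValue> dictionary, TKey key) {
        return dictionary.MapOrDefault(key, default(TValue));
    }

    // Hypothetical overload: returns defaultValue when the key is absent.
    public static TValue MapOrDefault<TKey, TValue>(
          this IDictionary<TKey, TValue> dictionary, TKey key, TValue defaultValue) {
        TValue result;
        return dictionary.TryGetValue(key, out result) ? result : defaultValue;
    }
}
```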

Since many ranges with different names will have the same style, it would be nice to coalesce adjacent ranges so that no two adjacent ranges have the same style. In other words, we only want a new styled range when the style changes. The query expression above produces a one-to-one mapping of named ranges to styled ranges; what we need is something that will merge adjacent ranges. To do this, I will introduce another extension method, Reduce<T>(), for IEnumerable<T>,

public static IEnumerable<T> Reduce<T>(this IEnumerable<T> e,
    Func<T, T, bool> match,
    Func<T, T, T> reduce) {
    // Dispose the enumerator even if the caller stops enumerating early.
    using (var en = e.GetEnumerator()) {
        if (en.MoveNext()) {
            // last holds the element accumulated so far.
            T last = en.Current;
            while (en.MoveNext()) {
                if (!match(last, en.Current)) {
                    yield return last;
                    last = en.Current;
                }
                else
                    last = reduce(last, en.Current);
            }
            yield return last;
        }
    }
}

Whenever two adjacent elements match (the match delegate returns true), this method reduces them into one element by calling the reduce delegate. Note that match is called with the element accumulated so far, not with the original previous element. For example, for a non-empty sequence of numbers, the standard Sum() extension method could be emulated with Reduce<T>() as,

var sum = numbers.Reduce((a, b) => true, (a, b) => a + b).First();
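To see the adjacency rule on something simpler than ranges, here is a small sketch that repeats Reduce<T>() and applies it to integers. The ReduceDemo class and its Uniq() method are hypothetical. Merging equal neighbours collapses runs, much like the Unix uniq command, while a match delegate that always returns true folds the whole sequence into one element, as in the Sum() emulation. That emulation throws on an empty sequence because of First(), whereas the real Sum() returns 0.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReduceExtensions {
    // Merges adjacent elements for which match(accumulated, current) is true,
    // combining them with reduce; other elements pass through unchanged.
    public static IEnumerable<T> Reduce<T>(this IEnumerable<T> e,
        Func<T, T, bool> match,
        Func<T, T, T> reduce) {
        using (var en = e.GetEnumerator()) {
            if (!en.MoveNext())
                yield break;
            T last = en.Current;
            while (en.MoveNext()) {
                if (match(last, en.Current))
                    last = reduce(last, en.Current);
                else {
                    yield return last;
                    last = en.Current;
                }
            }
            yield return last;
        }
    }
}

public static class ReduceDemo {
    // Collapses runs of equal adjacent values, keeping one value per run.
    public static int[] Uniq(IEnumerable<int> numbers) {
        return numbers.Reduce((a, b) => a == b, (a, b) => a).ToArray();
    }

    // Every pair "matches", so the whole sequence folds into one element.
    // First() throws InvalidOperationException on an empty input.
    public static int SumViaReduce(IEnumerable<int> numbers) {
        return numbers.Reduce((a, b) => true, (a, b) => a + b).First();
    }
}
```

Uniq() keeps the first element of each run rather than summing the run: with a summing reduce, match would compare the running total against the next value, so a run like 1, 1, 2 would wrongly continue into the 2.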

Now that we have Reduce<T>(), let's reduce the list of styled ranges, coalescing adjacent ranges with the same style. This can be done by,

styledRanges = styledRanges.Reduce(
    (r1, r2) => r1.Style == r2.Style,
    (r1, r2) => new StyledRange() {
        Style = r1.Style,
        Range = MergeRanges(r1.Range, r2.Range)});

The MergeRanges() method referenced above is,

// Spans from the start of r1 to the end of r2 (assumes r2 does not start before r1).
Range MergeRanges(Range r1, Range r2) {
    return new Range() { Start = r1.Start, Length = r2.Start - r1.Start + r2.Length };
}
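A quick check of the arithmetic: the merged range starts where r1 starts and ends where r2 ends, so its length is (r2.Start + r2.Length) - r1.Start. This assumes r2 does not start before r1, which holds for ranges arriving in stream order. If the two ranges are not contiguous, the gap between them is absorbed into the merged range. The RangeMath wrapper below is hypothetical, added only so the snippet compiles on its own:

```csharp
public struct Range {
    public int Start;
    public int Length;
}

public static class RangeMath {
    // Spans from the start of r1 to the end of r2 (assumes r2 follows r1).
    public static Range MergeRanges(Range r1, Range r2) {
        return new Range() { Start = r1.Start, Length = r2.Start - r1.Start + r2.Length };
    }
}
```

For example, merging [4, 7) with [7, 9) gives Start = 4, Length = 5, and merging [0, 2) with [5, 6) gives Start = 0, Length = 6, covering the gap.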

In my example, this reduced the 18 ranges for a typical line of C# source to 7. Pretty good. But I noticed that some of those ranges were styling whitespace as "normal". This seems like a waste: why switch back from green to black text just to write whitespace? Why not combine whitespace with the adjacent ranges instead? A simple approach is to add the following code before mapping the styles,

ranges = ranges.Reduce(
    (r1, r2) => r1.Name == "whitespace" || r2.Name == "whitespace",
    (r1, r2) => new NamedRange() {
        Name = r1.Name == "whitespace" ? r2.Name : r1.Name,
        Range = MergeRanges(r1.Range, r2.Range)
    });

This merges whitespace ranges with adjacent non-whitespace ranges, and it reduces the range count from 7 to 4. Not bad, given the original 18, and that was for just one line of source. These savings add up quickly over an entire file.
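Here is a small sketch of the whitespace merge on its own, run on hypothetical scanner output built with a hypothetical Named() helper. It shows where whitespace ends up: a whitespace range is absorbed by the range before it, and leading whitespace, which has nothing before it, is absorbed by the range that follows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public struct Range { public int Start; public int Length; }
public struct NamedRange { public string Name; public Range Range; }

public static class WhitespaceMerge {
    // Reduce<T>() as in the post, with the enumerator disposed.
    public static IEnumerable<T> Reduce<T>(this IEnumerable<T> e,
        Func<T, T, bool> match, Func<T, T, T> reduce) {
        using (var en = e.GetEnumerator()) {
            if (!en.MoveNext()) yield break;
            T last = en.Current;
            while (en.MoveNext()) {
                if (match(last, en.Current))
                    last = reduce(last, en.Current);
                else {
                    yield return last;
                    last = en.Current;
                }
            }
            yield return last;
        }
    }

    static Range MergeRanges(Range r1, Range r2) {
        return new Range() { Start = r1.Start, Length = r2.Start - r1.Start + r2.Length };
    }

    // Whitespace joins the preceding range, or the following one when it leads.
    public static NamedRange[] Merge(IEnumerable<NamedRange> ranges) {
        return ranges.Reduce(
            (r1, r2) => r1.Name == "whitespace" || r2.Name == "whitespace",
            (r1, r2) => new NamedRange() {
                Name = r1.Name == "whitespace" ? r2.Name : r1.Name,
                Range = MergeRanges(r1.Range, r2.Range) }).ToArray();
    }

    // Hypothetical helper for building sample scanner output.
    public static NamedRange Named(string name, int start, int length) {
        return new NamedRange() { Name = name, Range = new Range() { Start = start, Length = length } };
    }
}
```

With the input whitespace [0, 2), reserved [2, 5), whitespace [5, 6), number [6, 8), the merge yields two ranges: reserved [0, 6) and number [6, 8).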

The complete example, written as a function, looks like,

IEnumerable<StyledRange> StyleRanges(IEnumerable<NamedRange> ranges) {

    // Merge whitespace ranges with adjacent non-whitespace ranges
    ranges = ranges.Reduce(
        (r1, r2) => r1.Name == "whitespace" || r2.Name == "whitespace",
        (r1, r2) => new NamedRange() {
            Name = r1.Name == "whitespace" ? r2.Name : r1.Name,
            Range = MergeRanges(r1.Range, r2.Range)});

    // Map named ranges to styles, falling back to the default style.
    var styledRanges = from range in ranges
                       select new StyledRange() {
                           Style = styleMap.MapOrDefault(range.Name) ?? defaultStyle,
                           Range = range.Range };

    // Merge adjacent ranges with the same style.
    styledRanges = styledRanges.Reduce(
        (r1, r2) => r1.Style == r2.Style,
        (r1, r2) => new StyledRange() {
            Style = r1.Style,
            Range = MergeRanges(r1.Range, r2.Range)});

    return styledRanges;
}
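To try the whole pipeline end to end, here is a self-contained sketch. Everything beyond the post's own code is an assumption: the Style class, the Highlighter wrapper, the Named() helper, and the sample ranges for the line `int x = 42;` (including the made-up "operator" and "punctuation" names). It applies `?? defaultStyle` so that names missing from styleMap get the "normal" style rather than null, and it compares styles by reference, which works because every lookup returns the same shared Style instances.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public struct Range { public int Start; public int Length; }
public struct NamedRange { public string Name; public Range Range; }
public class Style { public string Name; }
public struct StyledRange { public Style Style; public Range Range; }

public static class Highlighter {
    static readonly Dictionary<string, Style> styleMap = new Dictionary<string, Style>() {
        { "number", new Style() { Name = "green" } },
        { "reserved", new Style() { Name = "bold" } }
    };
    static readonly Style defaultStyle = new Style() { Name = "normal" };

    public static IEnumerable<T> Reduce<T>(this IEnumerable<T> e,
        Func<T, T, bool> match, Func<T, T, T> reduce) {
        using (var en = e.GetEnumerator()) {
            if (!en.MoveNext()) yield break;
            T last = en.Current;
            while (en.MoveNext()) {
                if (match(last, en.Current))
                    last = reduce(last, en.Current);
                else {
                    yield return last;
                    last = en.Current;
                }
            }
            yield return last;
        }
    }

    public static TValue MapOrDefault<TKey, TValue>(
          this IDictionary<TKey, TValue> dictionary, TKey key) {
        TValue result;
        return dictionary.TryGetValue(key, out result) ? result : default(TValue);
    }

    static Range MergeRanges(Range r1, Range r2) {
        return new Range() { Start = r1.Start, Length = r2.Start - r1.Start + r2.Length };
    }

    public static IEnumerable<StyledRange> StyleRanges(IEnumerable<NamedRange> ranges) {
        // Merge whitespace ranges with adjacent non-whitespace ranges.
        ranges = ranges.Reduce(
            (r1, r2) => r1.Name == "whitespace" || r2.Name == "whitespace",
            (r1, r2) => new NamedRange() {
                Name = r1.Name == "whitespace" ? r2.Name : r1.Name,
                Range = MergeRanges(r1.Range, r2.Range) });

        // Map named ranges to styles, falling back to defaultStyle.
        var styledRanges = from range in ranges
                           select new StyledRange() {
                               Style = styleMap.MapOrDefault(range.Name) ?? defaultStyle,
                               Range = range.Range };

        // Merge adjacent ranges with the same style (reference equality).
        return styledRanges.Reduce(
            (r1, r2) => r1.Style == r2.Style,
            (r1, r2) => new StyledRange() {
                Style = r1.Style,
                Range = MergeRanges(r1.Range, r2.Range) });
    }

    // Hypothetical helper for building sample scanner output.
    public static NamedRange Named(string name, int start, int length) {
        return new NamedRange() { Name = name, Range = new Range() { Start = start, Length = length } };
    }
}
```

On the sample line, the eight scanner ranges become five after the whitespace merge and four after the style merge: bold for `int `, normal for `x = `, green for `42`, and normal for `;`.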

There are a few nice things about this function. First, it builds up an IEnumerable<T>, but none of the work is done until the result is actually enumerated; this is due to deferred execution. Second, even though deferred execution makes single-step debugging challenging, you can get a picture of what the function will do by calling ToArray() on the intermediate results in the watch window. This lets you check whether the mapping or reducing is what you expected. Third, this routine has no loops; it operates on each enumerable as a set. Now, some of you will rightly say there is a loop buried in the Reduce<T>() method. True; Reduce<T>() is just like many other LINQ extension methods, which contain loops implied by the enumerators they return. But LINQ allows me to write code that communicates what I am trying to do without obscuring it with the details of how it is done. I think of LINQ not so much as a set of features in C# but as a way of programming. A way of programming in which, if you are using loops, you are doing it wrong.
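Deferred execution is easy to observe with a counter. The sketch below uses a hypothetical DeferredDemo class that counts calls to a Select() projection: building the query runs nothing, and each ToArray() runs the projection once per element. The same applies to a ToArray() evaluated in the watch window, so any side effects in the pipeline run again every time you inspect it.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo {
    // Counts how many times the projection runs, to show when work happens.
    public static int Calls;

    public static IEnumerable<int> Doubled(IEnumerable<int> source) {
        // Building the query runs nothing yet; Select() only captures the lambda.
        return source.Select(x => { Calls++; return x * 2; });
    }
}
```

After `var query = DeferredDemo.Doubled(new[] { 1, 2, 3 });` the counter is still 0; the first `query.ToArray()` raises it to 3, and a second `query.ToArray()` raises it to 6.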


