Logo




Subscribe:
RSS 2.0 | Atom 1.0
Categories:

Sign In


[Giagnocavo]Michael::Write()

 Thursday, August 16, 2007
Practical Functional C# - Part III – Loops are Evil

Be sure to read these first two articles:

    Practical Functional C# - Part I

    Practical Functional C# - Part II

OK let's start with a quick challenge. Write an accurate description of the following program, in English:

static void Main(string[] args)
{
    string output = "";
    if (args.Length > 0) output = args[0]; 
    if (args.Length > 1) {
        for (int i = 1; i < args.Length; i++) {
            output += ", " + args[i]; 
        }
    }
    Console.WriteLine(output);
}

Well? The specification might be something like "a program that writes its arguments separated by a comma and space". But how quickly could you determine that from the code? How much additional time does it take to determine there aren't any bugs? Every time a maintenance developer comes across this code, they have to analyze this code, determine the boundary conditions correctly, verify the indexing, and so on. Every time someone reads this code, she must pay a high tax. The saddest part is that this is an extremely common pattern.

Edit: As Chad Hower pointed out in the comments that you can remove the two if statements by performing a check inside the loop. That shortens it considerably and reduces some of the "tax" that has to be paid when you read it.

"Take this set of values and aggregate them into a single value". How many pieces of code do exactly this, but obscure it behind a for loop? Functional languages realize this and provide functions called "Fold" or "Reduce". Such functions take an accumulator function, apply it to every element in the list, then return the accumulator value when finished. C# 3.0, courtesy of LINQ, provides an equivalent function, called "Aggregate":

static void Main(string[] args)
{
    string output = args.DefaultIfEmpty("").Aggregate(
        (accum, item) => accum += ", " + item); 
    Console.WriteLine(output);
}

Here we are saying that we are going to aggregate args into a single string value. The lambda on the second line takes two arguments. The first is the accumulator and Aggregate passes it to each element ("threads" it through). The second parameter is the current item we are working with. The return value of our lambda is simply the concatenation of the current accumulated value, comma and space, and the current item. This return value becomes the accumulator for the next item. On the first execution, since we did not give it an explicit seed value, it just uses the first item. The final return value becomes the value that Aggregate returns to output. (Edited: Added DefaultIfEmpty -- this overload of Aggregate doesn't work on empty sequences.) Using the lambda provides nicer syntax than this equivalent code:

static void Main(string[] args)

    string output = args.DefaultIfEmpty("").Aggregate(joinCommaSpace); 
    Console.WriteLine(output);
}
static string joinCommaSpace(string a, string b)

    return a + ", " + b;
}

So why is this a good thing? Well, it goes back to the questions about the first set of code: how much effort is required to determine intent and correctness of a particular piece of code? In the imperative way, we need eight lines of code, with three distinct paths. Using a functional approach, we have three lines of code and only one code path. The only serious objection that I've heard is that this code is "unfamiliar". Well, sure, anything new might be unfamiliar, but that does not make it bad.

You wouldn't write your SQL code to make someone only familiar with C "comfortable" or "familiar", would you? You wouldn't write in C# the same way you'd write in C or BASIC. So why stick with outdated programming practices just because some "new" developer might get confused? Functional code is more concise, less error prone, and much more readable. Learning this style might take a couple of days, but it's an invaluable skill. At any rate, LINQ is built upon these concepts, so it will do people good to learn anyways.

Now, let's examine how to create our own functions to hide loops. This time, we're going to look at data access. A very common pattern in data access is creating a SqlDataReader, going through it, adding elements to a list. Like other patterns, overhead obscures the intent:

public static List<Person> GetAllPeople()
{
    // Setup command
    var comm = new SqlCommand("GetAllPeople");
    comm.CommandType = CommandType.StoredProcedure;

    // Setup connection
    using (var conn = new SqlConnection(Settings.ConnectionString)) {
        comm.Connection = conn; 
        conn.Open();

        // Loop and add people to our list
        using (var reader = comm.ExecuteReader()) {
            var people = new List<Person>();
            while (reader.Read()) {
                var p = new Person();
                p.Name = reader.GetString(0); 
                p.Age = reader.GetInt32(1); 
                people.Add(p); 
            }
            // Done
            return people; 
        }
    }
}

Yes, something as simple as reading a list of two-field type can take 14 lines of code. Let's do something about that. The only unique part is where we create a Person from the SqlDataReader. Outside of that, it's a very straightforward, but large, pattern. Refactoring the pattern, we get this:

public static List<T> ListFromReader<T>(string connectionString,
SqlCommand command, Func<SqlDataReader, T> code)
{
    // Setup connection
    using (var conn = new SqlConnection(connectionString)) {
        command.Connection = conn; 
        conn.Open();

        // Loop into list
        using (var reader = command.ExecuteReader()) {
            var list = new List<T>();
            while (reader.Read()) {
                // Here we call the supplied code to add the right item
                T item = code(reader); 
                list.Add(item); 
            }
            return list; 
        }
    }
}

Now our code to GetAllPeople is very simple:

public static List<Person> GetAllPeople2()
{
    var comm = new SqlCommand("GetAllPeople");
    comm.CommandType = CommandType.StoredProcedure;

    return ListFromReader(Settings.ConnectionString, comm, 
        reader => new Person {
            Name = reader.GetString(0), 
            Age = reader.GetInt32(1) 
        });
}

Just be looking at this code, we know all the data-API stuff is handled correctly. This approach is vastly superior to other common SQL "helpers", for example, an ExecuteReader method that gives us a SqlDataReader. First, we have no locals that need disposing or other cleanup: this function scopes the variables to our lambda. Second, we can focus on our actual logic (creating a Person) rather than dealing with loop conditions.

Edit: This is not specific to C# 3.0! You can definately achieve a lot of the same benefits in 2.0, except you need to replace the simple lambda syntax ( => ) with the much more verbose delegate (ArgType arg) { } (anonymous method) syntax. C# 3.0 just makes it much easier to write.

What other common loops do you encounter? I have a few more we'll cover in the next article. As always, comments, insults and suggestions are welcome.

Code
Thursday, August 16, 2007 4:36:01 PM UTC  #    Comments [21]  |  Trackback

Thursday, August 16, 2007 7:39:14 PM UTC
You cheated in your first example: most of the confusion in the initial code was to deal with not accidentally prepending ", " to the joined string. But your aggregate function *does* have the extra ", ". Can you write the actual desired functionality using an aggregate, and make it easier than the original example? I'm not having much luck finding a way.

(Of course, in my personal library I long ago wrote a join(separator, list) function that does exactly this, and I use it all the time. With extension functions, you could even make this appear as part of the string list, so the syntax would be a nice list.join(separator).)
apenwarr
Thursday, August 16, 2007 8:26:47 PM UTC
Nice examples! I like the look of the functional stuff ;)

apenwarr: actually, if you have a look at how the Aggregate function works, it does handle the first item in the list differently, resulting in correct behaviour: http://anonsvn.mono-project.com/viewcvs/trunk/mcs/class/System.Core/System.Linq/Enumerable.cs?view=markup (though AFAICT that implementation is missing a "counter++", and could be written more efficiently by only executing the "if" once, like the first example here does).
Thursday, August 16, 2007 10:54:35 PM UTC
apwnwarr: As Michael H pointed out, that overload of Aggregate (without supplying a seed value) uses the first element as the seed. There's still one bug in that empty sequences are not allowed for that overload of Aggregate. This is fixed by adding in a "DefaultIfEmpty" (fixed in the article).

You're right, this specific implementation of Aggregate can be done via a join function (the System.String.Join static method does what you describe). The point is that we don't need these very specific uses baked into the framework. Aggregate and other functions are generic, so they can operate on _any_ type, not just string. The manipulation isn't limited to addition:

var nums = new[] { 5f, 4f, 3f, 2f };
var output = nums.Aggregate((acc, i) => acc /= i);
Console.WriteLine(output);

Friday, August 17, 2007 12:43:56 AM UTC
Beautiful. Time and time again when dealing with database-related code I end up wanting to scream "Why does this have to be so complicated?!"...

It seems C# 3 is finally going to bring me to a point where I won't be going mental the next time I have to read stuff from a database. :)

Thanks for these articles.
Leonardo
Friday, August 17, 2007 6:54:17 AM UTC
> What other common loops do you encounter?

I encounter comma separating loops like this all the time. The last time was yesterday - and indeed the existing implementation was buggy (it tried to remove a trailing ',' even when there were zero items in the list).

However, I always replace these sort of lists with String.Join(arr, ', '). In the case yesterday, I built up a List<String> and then do a ToArray() on it to get something that String.Join will work with.

Nevertheless, I understand (and practice) your generalized point.
Friday, August 17, 2007 10:59:56 AM UTC
I just want to say that I love this series of articles! It's probably the best .NET articles I've ever read because of their clarity, nice syntax highlighting in the exmples and your good writing skills. I really hope you can follow up with more interesting topics in the future, written in the same style and format. Thanks for sharing this with the .NET community!

On the topic of C# 3.0, it's going to be so good. I hoped they would implement it more in the style they chose in C Omega, but it's still so much better than C# 2.0 that I can't even think of complaining. Linq and functional programming with lambda's and anonymous objects is going to be such a huge time-saver in data retrieving scenarios and will at the same time make the code easier to read, understand and debug.

PS: Can I make a little nit about how the comments are presented? Currently they intertwine a bit because there's nothing visually differentiating each comment from another. Some more margins on the bottom of each comment coupled with some borders and/or background coloring would do the trick.
Friday, August 17, 2007 6:29:21 PM UTC
very interesting and inspiring! thanks for sharing!
zulu
Friday, August 17, 2007 6:55:49 PM UTC
Oh, I can still think of complaining about the poor type inference C# 3.0 has :) But yes, it's way nicer than 2.0 so some somewhat satisfied. Probably the biggest annoyance is that C# 3.0 defines all these query keywords (select, where, from, etc.) but doesn't provide any way for us to make our own words. They should have fixed the language to allow such things to be declared in code... oh well.

Thanks for mentioning the comments: they should look a lot nicer now!
Friday, August 17, 2007 10:36:49 PM UTC
Yea, C# could definately have better type inference. About being able to def your own keywords; I don't know. You can stretch that pretty far in the wrong direction if you write code that show up on <a href="http://worsethanfailure.com/">WTF</a> on a regularly basis. ;-)

Anyway, the comments look much better now. Great work, great blog, great discussion!
Friday, August 17, 2007 10:37:53 PM UTC
Hey, the comment preview showed that link as a link and not as escaped markup! It tricked me! :-O
Tuesday, August 21, 2007 5:49:46 PM UTC
I enjoyed this series of articles. If you're interested in more functional programming in C#, check out my open source FP# project [1]. I've defined map, filter and fold functions over the System.Collections classes, and I provide implementations of other functional idioms (purely functional lists, etc.). I haven't released binaries yet, but the repository code builds and each functional idiom compiles to its own independently usable dll, ie. FP.List.dll, FP.Option.dll, etc., so it's not a large monolithic project. If there is interest, I will release binaries dlls of the stable code, but I'm still experimenting with other functional idioms (monads, etc.) so those won't be released any time soon.

[1] http://sourceforge.net/projects/fpsharp
Friday, August 31, 2007 12:32:00 AM UTC
Your first one is a bit long:

static void Main(string[] aArgs) {
string xOutput = "";
for (int i = 0; i < aArgs.Length; i++) {
xOutput += ((i > 0) ? ", " : "") + aArgs[i];
}
Console.WriteLine(xOutput);
}
Friday, August 31, 2007 2:23:57 AM UTC
Yes, another way to approach it is to do a check on each iteration. Doesn't help the imperative approach too much, though.
Friday, August 31, 2007 10:15:46 AM UTC
Hmm, I also posted a foreach variant which is even better, but it apepars to have disappeared?
Friday, August 31, 2007 10:17:09 AM UTC
Its not about being functional etc - if you show the for loop as is, it makes it look not only different but quite bit larger and more complex. It makes the comparison less proper than it should be.
Friday, August 31, 2007 10:19:46 AM UTC
Since I cant find the foreach one, here it is again:

static void Main(string[] aArgs) {
string xOutput = "";
foreach (var s in aArgs) {
xOutput += ((xOutput != "") ? ", " : "") + s;
}
Console.WriteLine(xOutput);
}
Friday, August 31, 2007 2:02:33 PM UTC
Well, actually, Aggregate does not check for initial values on each iteration. So, assigning out of the first element in the array and going from there is the most similar in terms of functionality.

But yes, it does make it more complex and since we only care about the result, probably a better comparison. I'll make a note of this in the article.
Tuesday, March 11, 2008 10:11:42 PM UTC
OMG People.. First example!

static void Main(string[] args)
{
Console.WriteLine(String.Join(", ", args));
}

I used that in .NET 2 for a long time already!

Even though I see sense in this series of articles, across the first three there's no much use.. you just defined the use of Func<>'s and lambdas.. And this example, as well as the one used in the previous can be easily rewritten as something a lot shorted and easyer.. and standard.. =/

No hard feelings :)
Wednesday, March 12, 2008 12:01:22 AM UTC
Gregory, you're right. For the specific case of having a string[] and going to a string, the Join function already has it built-in. But you can always make specific cases that have a single purpose useful solution. It doesn't work if you have, say, an enumerable of ints, or so on. You'd have to project to a string, then array them first.

I'm interested in the other cases where not using lambas becomes "shorter and easier".
Friday, July 18, 2008 2:34:40 PM UTC
Just a side comment about ListFromReader<T>() -- it really shouldn't return a List<T>, it should instead use iterators and return an IEnumerable<T>:

public static IEnumerable<T> GetItems<T>(string connectionString, SqlCommand command, Func<SqlDataReader, T> code) {
// Setup connection
using (var conn = new SqlConnection(connectionString)) {
command.Connection = conn;
conn.Open();
using (var reader = command.ExecuteReader()) {
while (reader.Read()) {
yield return code(reader);
}
}
}
}

The reason for this is memory usage -- List<T> can't handle large result sets (think millions of elements). If a List<T> is truly needed, there's always the C# 3.0 .ToList() extension method or the List<T>() constructor; either way, this function shouldn't require List<T> as the collection type.
Friday, July 18, 2008 6:17:05 PM UTC
Jonathan: You're completely correct. I just didn't want to introduce iterators on top of the functional bits.
Name
E-mail
Home page

Comment (HTML not allowed)  

Enter the code shown (prevents robots):

Live Comment Preview