[Giagnocavo]Michael::Write()

# Wednesday, August 29, 2007
Practical Functional C# - Part IV – Think in [Result]Sets

Don't skip parts I to III; they'll get you up to speed. I have $300 up for grabs if these articles are wrong in claiming that this style will greatly improve most C# apps:

    Practical Functional C# - Part I 
    Practical Functional C# - Part II 
    Practical Functional C# - Part III - Loops are Evil

Tell the compiler what you want, rather than how to do it. That's a key concept that can take you very far. However, as imperative programmers, we can find it a hard concept to adopt. We are so used to instructing the processor, step by step, how to do things, and only after we're all done checking that the end result is to our liking. Functional programming helps us change this.

Consider the following task: Write a function that checks a username against a few disallowed characters "!@#$%^&*". The normal C# implementation looks like this (null checks removed):

static bool IsBadUserName(string userName)
{
    var badChars = "!@#$%^&*";
    foreach (char c in badChars) {
        // Contains(char) comes from System.Linq (or from string itself on .NET Core 2.1+)
        if (userName.Contains(c)) return true;
    }
    return false;
}

It's not horrible; there's only one branch, but it does require a bit of thought to sort out. Now let's take an approach where we treat the string as a set of characters to manipulate all at once (requires a C# 3.0 compiler and System.Linq):

static bool IsBadUserName2(string userName)
{
    var badChars = "!@#$%^&*";
    return userName
        .Intersect(badChars)
        .Any();
}

We've reduced the function from a step-by-step loop to something that has only one path. The bigger benefit is that there's no need to evaluate end conditions and the like for correctness: the code says exactly what it does. "Are there any bad characters in the user name?" We don't have to worry about how it does what we asked; we just need to think in terms of what we want our result to be.
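Here is a small self-contained sketch that shows why the set view fits. The class name UserNameChecks and the helper SharedChars are illustrative and do not come from the article. Intersect treats both strings as sets of characters: each shared character comes back once, in the order it first appears in the user name. Any() then stops as soon as the first one shows up.

```csharp
using System;
using System.Linq;

static class UserNameChecks
{
    // The article's set-based check: is there any overlap between the two character sets?
    public static bool IsBadUserName2(string userName)
    {
        var badChars = "!@#$%^&*";
        return userName.Intersect(badChars).Any();
    }

    // Illustrative helper: Intersect has set semantics, so a bad character that
    // appears several times in userName is reported only once.
    public static string SharedChars(string userName)
    {
        return new string(userName.Intersect("!@#$%^&*").ToArray());
    }
}
```

For example, SharedChars("a!!b#!") gives "!#": the repeated '!' collapses into a single entry.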

A common function in functional languages is called "map". Map takes a list of something and turns it into a list of something else. For instance, if we had a list of integers ("ourInts"), we could turn them into squares by saying "map ourInts by multiplying each value by itself". LINQ calls map "Select". Here's a quick example:

var ourInts = new[] { 2, 5, 13 };
var squares = ourInts.Select(i => i * i);

squares will yield { 4, 25, 169 }. What use is this? Well, it is an extremely common pattern to take some set of data, filter it, modify it a bit, and return a new set of data. Here's an example: you have a variable, semicolonEmails, containing semicolon-delimited email addresses. You want to turn these into an array of .NET's MailAddress objects to use with some other code. The loop isn't very pretty:

var tempAddresses = new List<MailAddress>();
foreach (string s in semicolonEmails.Split(';')) {
    var ts = s.Trim();
    if (ts == "") continue;   // skip empty entries, e.g. from a trailing ';'
    tempAddresses.Add(new MailAddress(ts));
}
var myAddresses = tempAddresses.ToArray();

But consider the functional approach:

var myAddresses = semicolonEmails
    .Split(';')
    .Select(s => s.Trim())
    .Where(s => s != "")
    .Select(s => new MailAddress(s))
    .ToArray();

In one statement, we transform the data three times, as well as adding a filter to remove empty items.
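If you want to run the pipeline yourself, here is a minimal self-contained version wrapped in a method. The class and method names are illustrative. The article doesn't mention one caveat: MailAddress's constructor throws a FormatException on malformed input. This sketch therefore assumes every non-empty entry is a valid address.

```csharp
using System;
using System.Linq;
using System.Net.Mail;

static class EmailParsing
{
    // Split on ';', trim each piece, drop empty entries, then map each string to a MailAddress.
    // Assumes well-formed addresses: new MailAddress(...) throws FormatException otherwise.
    public static MailAddress[] ParseAddresses(string semicolonEmails)
    {
        return semicolonEmails
            .Split(';')
            .Select(s => s.Trim())
            .Where(s => s != "")
            .Select(s => new MailAddress(s))
            .ToArray();
    }
}
```

Calling ParseAddresses(" ann@example.com; ;bob@example.org ;") returns two addresses. The blank entry and the trailing empty entry are filtered out before any MailAddress is constructed.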

One place where LINQ falls flat on its face is when it comes to processing data. For some reason, there are no methods defined to do "ForEach" or "Process". (Even more interesting: List<T> does define a ForEach method.) Process is a great pattern: it performs some action on each item, then returns the original item. The code to define it is very simple and looks like this:

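using System;
using System.Collections.Generic;

// Extension methods must live in a non-nested static class.
static class Extensions
{
    // Runs f on each item as it is enumerated, then passes the item along unchanged.
    public static IEnumerable<T> Process<T>(this IEnumerable<T> source, Action<T> f)
    {
        foreach (var item in source) {
            f(item);
            yield return item;
        }
    }
}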
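One property of Process is worth seeing before you rely on it. Like any iterator written with yield return, it is lazy: the action runs only when something enumerates the result. The small sketch below counts the calls before and after ToList(). ProcessDemo is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ProcessExtensions
{
    // The article's Process: run an action on each item as it is enumerated,
    // then yield the item unchanged.
    public static IEnumerable<T> Process<T>(this IEnumerable<T> source, Action<T> f)
    {
        foreach (var item in source)
        {
            f(item);
            yield return item;
        }
    }
}

static class ProcessDemo
{
    // Reports how many times the action had run before and after forcing enumeration.
    public static (int BeforeToList, int AfterToList, int[] Items) Run()
    {
        int calls = 0;
        var pipeline = new[] { 1, 2, 3 }.Process(i => calls++);
        int before = calls;            // still 0: building the pipeline runs nothing
        var items = pipeline.ToList(); // enumeration is what drives the action
        return (before, calls, items.ToArray());
    }
}
```

Here BeforeToList is 0 and AfterToList is 3. This laziness is why the threading example needs ToList() between starting and joining the threads.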

How is this of use? Well, let's chain together some functional and imperative processing. For instance, write a program that runs some long process on every file passed in as an argument, each on a separate thread. If we take the purely imperative approach, our code looks like this:

static void Main(string[] args)
{
    var threads = new List<Thread>();
    foreach (var s in args) {
        Thread t = new Thread(startLongProcess); 
        t.Start(s); 
        threads.Add(t); 
    }
    foreach (var t in threads) {
        t.Join();
    }
}
static void startLongProcess(object data)
{
    SomeLongProcess((string)data);
}

Yes, we actually must declare a separate function just to invoke SomeLongProcess. Let's combine and use the functional approach now:

static void Main(string[] args)
{
    args
        .Select(s => new Thread(() => SomeLongProcess(s))) 
        .Process(t => t.Start())
        .ToList()
        .ForEach(t => t.Join());
}

Which way is going to be easier to edit and change around? I don't know about you, but for me, going from ~12 to ~5 lines, removing extra variables, useless functions and flow control structures: that's a hands-down win in my book.
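Here is a runnable version of the functional Main. It assumes a stand-in SomeLongProcess that sleeps briefly and records the file name; the stub, the Processed list, and the ThreadDemo class are all hypothetical. The ToList() call is what forces every thread to start before the first Join runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

static class ThreadDemo
{
    static readonly object Gate = new object();
    public static readonly List<string> Processed = new List<string>();

    // Hypothetical stand-in for the article's SomeLongProcess.
    static void SomeLongProcess(string file)
    {
        Thread.Sleep(50);                  // simulate slow work
        lock (Gate) { Processed.Add(file); }
    }

    // The article's Process extension (private is enough for this demo).
    static IEnumerable<T> Process<T>(this IEnumerable<T> source, Action<T> f)
    {
        foreach (var item in source)
        {
            f(item);
            yield return item;
        }
    }

    public static void Run(string[] args)
    {
        args
            .Select(s => new Thread(() => SomeLongProcess(s)))
            .Process(t => t.Start())   // start each thread as it is enumerated
            .ToList()                  // force every thread to start before joining
            .ForEach(t => t.Join());   // List<T>.ForEach: wait for all of them
    }
}
```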

As a side note, threading, in fact, is a space that is extremely ripe for functional styles. I'm willing to bet that ".NET 4" will include threading extensions that rely heavily on functional concepts. For instance, it's easy to create a method that allows us to replace the previous program with this:

args.Parallel(SomeLongProcess);

But I'll talk about that another day.
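Here is a minimal sketch of such a Parallel extension. It assumes one dedicated thread per item is acceptable, and the body is a guess at what the article has in mind rather than the author's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

static class ParallelExtensions
{
    // Hypothetical helper behind args.Parallel(SomeLongProcess):
    // one thread per item, all started before any is joined.
    public static void Parallel<T>(this IEnumerable<T> source, Action<T> action)
    {
        var threads = source
            .Select(item => new Thread(() => action(item)))
            .ToList();                  // create every thread up front
        threads.ForEach(t => t.Start());
        threads.ForEach(t => t.Join()); // block until all have finished
    }
}
```

In this sketch, an unhandled exception in any worker thread would bring down the process. A thread pool would also scale better than one thread per item. For what it's worth, .NET 4 did later ship Parallel.ForEach in System.Threading.Tasks.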

In the next article, I'm going to cover C#'s new inner functions capability and how that can be used to help build up more complicated function chains. I'd also like some feedback on which areas of C# programming you've run into that seem to require more code than necessary.

Posted in: Code
Wednesday, August 29, 2007 10:45:59 AM UTC | Comments [4]

Wednesday, August 29, 2007 2:37:35 PM UTC
These blog posts are incredibly useful. Seriously, your suggestions add beauty to boring code in ways that I didn't think were possible.

Keep up the good work, and write a book about functional programming for everyday tasks with C# 3.0, please :-)
Luís Pureza
Saturday, November 24, 2007 4:07:37 AM UTC
I have to say that these articles are great. Anything that helps us reduce the number of lines of code while retaining clarity is great in my book. To me, the area that requires the most code getting in the way of what I'm trying to do is validation.

For example take a look at the CreateUser function in SQLMembershipProvider.cs included in the ProviderToolkitSampleProviders http://download.microsoft.com/download/a/b/3/ab3c284b-dc9a-473d-b7e3-33bacfcc8e98/providertoolkitsamples.msi.

Look at the amount of validation done in that function. I'd like to see ways to use functional programming techniques to hide the validation and let the CreateUser function focus on creating the user.
Friday, July 25, 2008 2:20:29 PM UTC
Hey Michael,

Great blog. I've been exploring a lot of the same kinds of functional programming aspects in C# 3.0 that you're writing about.

One thing people might be confused by is your use of .ToList() to force everything up to that point in the query expression to execute before moving onto the next set of steps. It might seem unintuitive that this code won't do anything:

args
    .Select(s => new Thread(() => SomeLongProcess(s)))
    .Process(t => t.Start())
    .Process(t => t.Join());

The ToList call isn't very declarative, in the sense that it doesn't suggest the actual role of that method call in the expression.

The ForEach method of List<T> is so useful--it really was an oversight not to include it in IEnumerable<T>, so I created that extension method. But your Process function is even better, in the way that it uses yield return to return an IEnumerable<T>, and therefore allows you to continue chaining together expressions (List.ForEach, returning void, is a dead end). So here's my new ForEach method:

public static IEnumerable<T> ForEach<T>(this IEnumerable<T> source, Action<T> action)
{
    foreach (var item in source)
    {
        action(item);
        yield return item;
    }
}

I called it ForEach instead of Process for two reasons. 1. I'm already familiar with List.ForEach, and the name describes exactly how it's going to perform some action on each element in a collection. 2. The Process identifier is a perfect replacement for ToList to indicate declaratively that you want all previous parts of an expression to be evaluated before moving on.

public static IEnumerable<T> Process<T>(this IEnumerable<T> source)
{
    return new List<T>(source);
}

As a result, we can now write code like this (with a lambda statement thrown in to illustrate how you can set up a background thread appropriately):

args
    .Select(s => new Thread(() => SomeLongProcess(s)))
    .ForEach(t =>
    {
        t.IsBackground = true;
        t.Priority = ThreadPriority.BelowNormal;
        t.Start();
    })
    .Process()
    .ForEach(t => t.Join());

You could define multiple stages, each separated by a Process directive to ensure that one stage completes before the next begins.
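You can check the staging behaviour described above by logging events instead of starting threads. The sketch below uses the pass-through ForEach and the parameterless Process() from this comment; StageExtensions and StageDemo are hypothetical names. It records the order in which two stages run over the items 1 and 2. The first run has no barrier between the stages, and the second has Process() between them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class StageExtensions
{
    // Pass-through ForEach: act on each item lazily, then yield it.
    public static IEnumerable<T> ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (var item in source)
        {
            action(item);
            yield return item;
        }
    }

    // Parameterless Process: materialize everything so far (a stage barrier).
    public static IEnumerable<T> Process<T>(this IEnumerable<T> source)
    {
        return new List<T>(source);
    }
}

static class StageDemo
{
    // No barrier: the two stages interleave item by item.
    public static List<string> Interleaved()
    {
        var log = new List<string>();
        Enumerable.Range(1, 2)
            .ForEach(i => log.Add("start " + i))
            .ForEach(i => log.Add("join " + i))
            .Process();   // the only call that actually enumerates the chain
        return log;
    }

    // Process() between the stages: stage one finishes for every item first.
    public static List<string> Staged()
    {
        var log = new List<string>();
        Enumerable.Range(1, 2)
            .ForEach(i => log.Add("start " + i))
            .Process()
            .ForEach(i => log.Add("join " + i))
            .Process();
        return log;
    }
}
```

Interleaved() logs start 1, join 1, start 2, join 2. Staged() logs start 1, start 2, join 1, join 2. This is the difference between joining each thread right after starting it and starting them all first.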

I've come up with a pattern for spawning threads that encapsulates this even further, similar to what you've done with your WCF wrapper code. I wrote about this pattern on my blog at http://dvanderboom.wordpress.com/2008/06/23/new-spin-on-spawning-threads/.


BTW, your blog software has murdered the format of the code I've entered. I guess it doesn't believe in "significant whitespace". :(
Saturday, July 26, 2008 2:18:23 AM UTC
Hi Dan,

Sorry about the whitespace. Not sure if it's dasblog or my template (probably the template... I should add a <pre> somewhere).

The problem with calling that function ForEach is that if you DO use a List<T>, the instance method List<T>.ForEach takes priority, so the extension method never gets called (another strike against extension methods).

But yes, good ideas in general; I agree ToList may not be the best name for the step.
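The priority rule in the reply above is easy to demonstrate. The compiler considers extension methods only when no applicable instance method exists, so on a List<T> a pass-through ForEach extension is shadowed by List<T>.ForEach. The sketch below shows the shadowing and one workaround: calling AsEnumerable() so the static type becomes IEnumerable<T>. BindingDemo and PassThroughExtensions are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PassThroughExtensions
{
    // Pass-through ForEach, as proposed in the comment above.
    public static IEnumerable<T> ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (var item in source)
        {
            action(item);
            yield return item;
        }
    }
}

static class BindingDemo
{
    public static (int InstanceSum, int[] Chained, int ExtensionCalls) Run()
    {
        var list = new List<int> { 1, 2, 3 };

        // The instance method wins: this is List<T>.ForEach, which returns void,
        // so nothing can be chained after it.
        int instanceSum = 0;
        list.ForEach(i => instanceSum += i);

        // AsEnumerable() hides the instance method, so the extension binds and
        // returns a lazy sequence; ToArray() enumerates it.
        int extensionCalls = 0;
        var chained = list.AsEnumerable()
            .ForEach(i => extensionCalls++)
            .Select(i => i * 10)
            .ToArray();

        return (instanceSum, chained, extensionCalls);
    }
}
```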