Be sure to read these first two articles:
Practical Functional C# - Part I
Practical Functional C# - Part II
OK, let's start with a quick challenge. Write an accurate description of the following program, in English:
    static void Main(string[] args)
    {
        string output = "";
        if (args.Length > 0)
            output = args[0];
        if (args.Length > 1)
        {
            for (int i = 1; i < args.Length; i++)
            {
                output += ", " + args[i];
            }
        }
        Console.WriteLine(output);
    }
Well? The specification might be something like "a program that writes its arguments separated by a comma and space". But how quickly could you determine that from the code? How much additional time does it take to determine there aren't any bugs? Every time a maintenance developer comes across this code, they have to analyze it, work out the boundary conditions, verify the indexing, and so on. Everyone who reads this code pays a high tax. The saddest part is that this is an extremely common pattern.
Edit: As Chad Hower pointed out in the comments, you can remove the two if statements by performing a check inside the loop. That shortens it considerably and reduces some of the "tax" that has to be paid when you read it.
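Here is a minimal sketch of what that suggestion might look like. The class and method names (JoinSketch, JoinWithLoop) are hypothetical, and the method returns the string instead of writing it to the console so it is easier to exercise; the exact form of the code in the comments may have differed.

```csharp
using System;

public static class JoinSketch
{
    // Hypothetical helper: same behavior as the original Main,
    // but with a single check inside the loop instead of two ifs outside it.
    public static string JoinWithLoop(string[] args)
    {
        string output = "";
        for (int i = 0; i < args.Length; i++)
        {
            // Every item except the first gets a separator in front of it.
            if (i > 0)
                output += ", ";
            output += args[i];
        }
        return output;
    }
}
```

Calling JoinSketch.JoinWithLoop(new[] { "a", "b", "c" }) returns "a, b, c", and an empty array returns an empty string. It is shorter, but the reader still has to verify the index check by hand.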
"Take this set of values and aggregate them into a single value". How many pieces of code do exactly this, but obscure it behind a for loop? Functional languages realize this and provide functions called "Fold" or "Reduce". Such functions take an accumulator function, apply it to every element in the list, then return the accumulator value when finished. C# 3.0, courtesy of LINQ, provides an equivalent function, called "Aggregate":
    static void Main(string[] args)
    {
        string output = args.DefaultIfEmpty("").Aggregate(
            (accum, item) => accum + ", " + item);
        Console.WriteLine(output);
    }
Here we are saying that we are going to aggregate args into a single string value. The lambda on the second line takes two arguments. The first is the accumulator, which Aggregate "threads" through every element. The second is the current item we are working with. The return value of our lambda is simply the concatenation of the current accumulated value, a comma and space, and the current item. This return value becomes the accumulator for the next item. On the first execution, since we did not give it an explicit seed value, it just uses the first item as the accumulator. The final return value is what Aggregate returns to output. (Edited: Added DefaultIfEmpty: this overload of Aggregate throws an InvalidOperationException on an empty sequence.) Using the lambda provides nicer syntax than this equivalent code:
    static void Main(string[] args)
    {
        string output = args.DefaultIfEmpty("").Aggregate(joinCommaSpace);
        Console.WriteLine(output);
    }

    static string joinCommaSpace(string a, string b)
    {
        return a + ", " + b;
    }
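To make the "threading" of the accumulator concrete, here is a rough sketch of what a seedless fold does internally. FoldSketch and Fold are hypothetical names, and this is an illustration of the idea, not the actual LINQ source.

```csharp
using System;
using System.Collections.Generic;

public static class FoldSketch
{
    // Hypothetical hand-rolled equivalent of the seedless Aggregate overload.
    public static T Fold<T>(IEnumerable<T> source, Func<T, T, T> step)
    {
        using (IEnumerator<T> e = source.GetEnumerator())
        {
            // No seed was given, so an empty sequence has nothing to start from.
            if (!e.MoveNext())
                throw new InvalidOperationException("Sequence contains no elements");

            // The first item becomes the initial accumulator.
            T accum = e.Current;

            // Each remaining item is combined with the accumulator,
            // and the result becomes the accumulator for the next item.
            while (e.MoveNext())
                accum = step(accum, e.Current);

            return accum;
        }
    }
}
```

With this in place, FoldSketch.Fold(new[] { "a", "b", "c" }, (accum, item) => accum + ", " + item) returns "a, b, c". The loop is written once, inside the helper, and callers only supply the combining step.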
So why is this a good thing? Well, it goes back to the questions about the first set of code: how much effort is required to determine the intent and correctness of a particular piece of code? The imperative version needs eight lines of code, with three distinct paths. The functional version needs three lines of code and has only one code path. The only serious objection that I've heard is that this code is "unfamiliar". Well, sure, anything new might be unfamiliar, but that does not make it bad.
You wouldn't write your SQL code to make someone only familiar with C "comfortable", would you? You wouldn't write in C# the same way you'd write in C or BASIC. So why stick with outdated programming practices just because some "new" developer might get confused? Functional code is more concise, less error-prone, and much more readable. Learning this style might take a couple of days, but it's an invaluable skill. At any rate, LINQ is built upon these concepts, so it will do people good to learn them anyway.
Now, let's examine how to create our own functions to hide loops. This time, we're going to look at data access. A very common pattern in data access is creating a SqlDataReader, going through it, adding elements to a list. Like other patterns, overhead obscures the intent:
    public static List<Person> GetAllPeople()
    {
        // Set up command
        var comm = new SqlCommand("GetAllPeople");
        comm.CommandType = CommandType.StoredProcedure;

        // Set up connection
        using (var conn = new SqlConnection(Settings.ConnectionString))
        {
            comm.Connection = conn;
            conn.Open();

            // Loop and add people to our list
            using (var reader = comm.ExecuteReader())
            {
                var people = new List<Person>();
                while (reader.Read())
                {
                    var p = new Person();
                    p.Name = reader.GetString(0);
                    p.Age = reader.GetInt32(1);
                    people.Add(p);
                }
                // Done
                return people;
            }
        }
    }
Yes, something as simple as reading a list of a two-field type can take 14 lines of code. Let's do something about that. The only unique part is where we create a Person from the SqlDataReader. Outside of that, it's a very straightforward, but large, pattern. Refactoring the pattern, we get this:
    public static List<T> ListFromReader<T>(string connectionString, SqlCommand command, Func<SqlDataReader, T> code)
    {
        // Set up connection
        using (var conn = new SqlConnection(connectionString))
        {
            command.Connection = conn;
            conn.Open();

            // Loop into list
            using (var reader = command.ExecuteReader())
            {
                var list = new List<T>();
                while (reader.Read())
                {
                    // Here we call the supplied code to add the right item
                    T item = code(reader);
                    list.Add(item);
                }
                return list;
            }
        }
    }
Now our code for GetAllPeople is very simple:
    public static List<Person> GetAllPeople2()
    {
        var comm = new SqlCommand("GetAllPeople");
        comm.CommandType = CommandType.StoredProcedure;
        return ListFromReader(Settings.ConnectionString, comm,
            reader => new Person { Name = reader.GetString(0), Age = reader.GetInt32(1) });
    }
Just by looking at this code, we know all the data-API stuff is handled correctly. This approach is vastly superior to other common SQL "helpers", for example, an ExecuteReader method that gives us a SqlDataReader. First, we have no locals that need disposing or other cleanup: this function scopes the variables to our lambda. Second, we can focus on our actual logic (creating a Person) rather than dealing with loop conditions.
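If you want to try the pattern without a database, one possible variation is sketched below. It is an assumption on my part, not the helper from this article: instead of a connection string and SqlCommand, it takes a function that opens any IDataReader, so an in-memory DataTable can stand in for SQL Server. ReaderSketch, the openReader parameter and SamplePeople are all hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class ReaderSketch
{
    // Hypothetical variant of ListFromReader: the helper still owns the
    // reader's lifetime, and the caller only supplies the per-row mapping.
    public static List<T> ListFromReader<T>(Func<IDataReader> openReader, Func<IDataRecord, T> code)
    {
        using (IDataReader reader = openReader())
        {
            var list = new List<T>();
            while (reader.Read())
            {
                // Call the supplied code to build the item for this row.
                list.Add(code(reader));
            }
            return list;
        }
    }

    // Builds an in-memory reader with made-up sample rows,
    // so the sketch runs without a database.
    public static IDataReader SamplePeople()
    {
        var table = new DataTable();
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("Age", typeof(int));
        table.Rows.Add("Ada", 36);
        table.Rows.Add("Alan", 41);
        return table.CreateDataReader();
    }
}
```

A call such as ReaderSketch.ListFromReader(ReaderSketch.SamplePeople, r => r.GetString(0) + " (" + r.GetInt32(1) + ")") yields a list containing "Ada (36)" and "Alan (41)". As in the SQL version, the caller never touches a using block or a while loop.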
Edit: This is not specific to C# 3.0! You can definitely achieve a lot of the same benefits in 2.0, except you need to replace the simple lambda syntax ( => ) with the much more verbose delegate (ArgType arg) { } (anonymous method) syntax. C# 3.0 just makes it much easier to write.
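As a rough illustration of the 2.0 style, the sketch below passes an anonymous method where C# 3.0 would use a lambda. The Map helper and the class name are hypothetical; Converter<TInput, TOutput> is used as the delegate type because Func did not ship with .NET 2.0.

```csharp
using System;
using System.Collections.Generic;

public static class AnonymousMethodSketch
{
    // Hypothetical helper that hides the loop, written without C# 3.0 features.
    public static List<TOut> Map<TIn, TOut>(IEnumerable<TIn> source, Converter<TIn, TOut> code)
    {
        List<TOut> list = new List<TOut>();
        foreach (TIn item in source)
        {
            list.Add(code(item));
        }
        return list;
    }

    public static List<int> Lengths(string[] words)
    {
        // C# 2.0 anonymous method; in C# 3.0 this would be: word => word.Length
        return Map<string, int>(words, delegate (string word) { return word.Length; });
    }
}
```

The behavior is the same as with a lambda; only the call site is noisier.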
What other common loops do you encounter? I have a few more we'll cover in the next article. As always, comments, insults and suggestions are welcome.