Logo




Subscribe:
RSS 2.0 | Atom 1.0
Categories:

Sign In


[Giagnocavo]Michael::Write()

 Thursday, September 06, 2007
Complicated functions in LINQ to SQL

Rob Conery talks about geocoding with LINQ here. In his post, he provides some code using the Haversine formula to compute the distance between two points on earth. His function is declared as:

Func<double, double, double, double, double> CalcDistance = (lat1, lon1, lat2, lon2) => …

Further, this delegate is wrapped in a normal C# method. Now, read my previous post about calling functions in LINQ to SQL. Can you see where things are going to go wrong? That's right, a normal delegate or method can't be converted into SQL by LINQ, as the engine has nothing to work with. Only when our code is available as data can the LINQ to SQL engine do it's magic. Since there's some insinuation that LINQ to SQL just can't handle things like Haversine, I'll demonstrate how to do it.

If you want to use your own "complicated" functions with LINQ to SQL, you'll need to manually construct the predicate expression. It's not pretty, but, it does let you convert somewhat detailed functions, such as Haversine, inside of LINQ-to-SQL queries. This is not always the right approach: in some cases it'll be better to use a UDF or stored procedure. (If someone knows a native, better way, please let me know! TomasP's Expandable stuff looks cool, but I ran into some bugs on Beta 2. This is really something the compiler should help out with!).

To start off, we need to declare our function as an Expression Tree (I cannot vouch for the accuracy of this code; I'm merely demonstrating LINQ to SQL technique. For the geocoding details, read Rob's post.):

const double R = 6367;
const double RAD = Math.PI / 180;
static Expression<Func<double, double, double, double, double>> dist =
(lat1, lon1, lat2, lon2) =>
    R * 2 *
    (
        Math.Asin(Math.Min(1,Math.Sqrt(
            (
            Math.Pow(Math.Sin(((lat1 * RAD - lat2 * RAD)) / 2.0), 2.0) +
            Math.Cos(lat1 * RAD) * Math.Cos(lat2 * RAD) *
            Math.Pow(Math.Sin(((lon1 * RAD - lon2 * RAD)) / 2.0), 2.0) 
           
        ))) 
    );

OK, that was the easy part. We just had to wrap Expression<> around the type declaration and remove the method calls. But how do we pass this to our query? The Where method has this signature:

public static IQueryable<TSource> Where<TSource>(this IQueryable<TSource> source, Expression<Func<TSource, bool>> predicate);

Hence, we need to provide a predicate of Expression<Func<TSource, bool>> for it do its magic. To work with expressions, we need to start using the System.Linq.Expressions namespace. For more clarity, I aliased E to System.Linq.Expressions.Expression. Building expressions isn't particularly hard, it's just annoying and time consuming. C#'s lack of "symbolics", or a way to use the compiler to create expressions to reference properties, etc. means we have to use strings to do so. I never did claim it'd be pretty.

We start off with some data from elsewhere in our application. In my example, I'm just going to declare some locals:

double coolLat = 43.641852;
double coolLong = -79.387298;
double maxDist = 100;

The first Expression we need is a ParameterExpression to refer to the source item from the table (i.e., the parameter the Where method is going to give to us). In my code, my table is called Accounts, hence my object type is Account. To create the parameter expression:

var acctParam = E.Parameter(typeof(Account), "a");

Next, we need to be able to reference the fields in the table. With LINQ to SQL, these are properties on our object. We create them like this:

var acctLon = E.Property(acctParam, "Longitude");
var acctLat = E.Property(acctParam, "Latitude");

The secret sauce is creating the invoke to our dist expression. This is where all the work comes into play and includes all our complicated code in the LINQ to SQL query. Fortunately, after building up our arguments separately, it's not that hard:

var distCalc = E.Invoke(dist, E.Constant(coolLat), E.Constant(coolLong), acctLat, acctLon);

We have to use ConstantExpressions for our local variables. Using constant allows us to capture the value of those variables. Now we're ready to finish off, by adding a less than comparison and turning it all into a <TSource, bool> LambdaExpression:

var maxComp = E.LessThan(distCalc, E.Constant(maxDist));
var pred = E.Lambda<Func<Account, bool>>(maxComp, acctParam);

Our query can now look like this:

MembersDataContext dc = new MembersDataContext();
var q = dc.Accounts.Where(pred);
Console.WriteLine(q.ToString());

When we run it, we see that LINQ to SQL is quite capable of handling our little bit of math:

exec sp_executesql N'SELECT [t0].[AccountId], [t0].[Latitude], [t0].[Longitude]
FROM [dbo].[Accounts] AS [t0]
WHERE (@p0 * ASIN(
(CASE
WHEN @p1 < SQRT(POWER(SIN(((@p2 * @p3) - ([t0].[Latitude] * @p4)) / @p5), @p6) + (COS(@p7 * @p8) * COS([t0].[Latitude] * @p9) *
   
POWER(SIN(((@p10 * @p11) - ([t0].[Longitude] * @p12)) / @p13), @p14))) THEN @p1
ELSE SQRT(POWER(SIN(((@p2 * @p3) - ([t0].[Latitude] * @p4)) / @p5), @p6) + (COS(@p7 * @p8) * COS([t0].[Latitude] * @p9) *
    POWER(SIN(((@p10 * @p11) - ([t0].[Longitude] * @p12)) / @p13), @p14)))

END))) < @p15'
,N'@p0 float,@p1 float,@p2 float,@p3 float,@p4 float,@p5 float,@p6 float,@p7 float,@p8 float,@p9 float,
    @p10 float,@p11 float,@p12 float,@p13 float,@p14 float,@p15 float'
,@p0=12742,@p1=1,@p2=95.412000000000006,@p3=0.017453292519943295,
@p4=0.017453292519943295,@p5=2,@p6=2,@p7=95.412000000000006,@p8=0.017453292519943295,@p9=0.017453292519943295,@p10=102.63200000000001,
@p11=0.017453292519943295,@p12=0.017453292519943295,@p13=2,@p14=2,@p15=100

I still think there should be some kind of syntax so we could write it like we want to: (Where(a=>dist(coolLat, coolLong, a.Latitude, a.Longitude) > 10)). If anyone knows a built-in way to do it, please let me know. I'm sure it's just something simple I'm overlooking...

Code
Thursday, September 06, 2007 1:56:54 PM UTC  #    Comments [4]  |  Trackback

 Wednesday, September 05, 2007
Calling custom methods in LINQ-to-SQL

This was sparked by the issues raised by Rob Conery here. Basically, if you have some semi-complicated function that you need to apply to a LINQ-to-SQL query, how can you do it? This is somewhat covered by TomasP.NET: Building LINQ Queries at Runtime in C# and Joseph Albahari: Dynamically building LINQ expression predicates. I recommend those two articles; they're very good. I'm going to write a bit about it too. Some of it might be redundant, some of the ideas I took from those articles. The code is all mine.

So, let's say we want to get all accounts where the square root of the account ID is even. This will serve as our placeholder for a totally contrived example. Just calling our method in our LINQ query won't work because the LINQ-to-SQL code isn't going to know what to do with it. A method is just an opaque block of code with a name. Here's an example:

static void Main(string[] args) {
    MembersDataContext dc = new MembersDataContext();
    var q = dc.Accounts.Where(a => IsRightAccount(a));
    Console.WriteLine(q.ToString());
}
static bool IsRightAccount(Account a) {
    return Math.Sqrt(a.AccountId) % 2 == 0;
}

This code crashes with: Unhandled Exception: System.NotSupportedException: Method 'Boolean IsRightAccount(ConsoleApplication1.Account)' has no supported translation to SQL. Which should be expected, as LINQ to SQL cannot know what goes on inside that method and thus can't translate it.

Let's change things to make IsRightAccount be a Func delegate (from a lambda expression):

static Func<Account, bool> IsRightAccount = a => Math.Sqrt(a.AccountId) % 2 == 0;

Now we get: Unhandled Exception: System.NotImplementedException: The method or operation is not implemented. At System.Data.Linq.SqlClient.QueryConverter.VisitInvocation(InvocationExpression invoke). That's a somewhat strange exception, as I'd expect it to be a bit more helpful. At any rate, I'd expect it to crash, because that is just a delegate to a method. It's still an opaque block of code.

Enter the magic Tree of Expressions: Expression<T>. As I mentioned in my last post, Expression Trees provide "introspection" (reflection against code). In the examples above, the lambda that calls IsRightAccount for the Where clause actually turned into an InvocationExpression that represents a call to the delegate provided. Hence me saying that it is "opaque". What we need is to make sure that our code (our IsRightAccount calculation) is visible as data. When it's visible as data, then LINQ-to-SQL can go and say "Oh, you want to take the square root of the account ID, mod it by 2, and see if that's zero… now THAT I can do in SQL".

Declaring an Expression Tree is really simple. First, make sure you import System.Linq.Expressions if you don't want to fully qualify the name. Then, declare your tree just like any lambda Func, except this time make the type Expression<MyFunc>:

static Expression<Func<Account, bool>> IsRightAccount = a => Math.Sqrt(a.AccountId) % 2 == 0;

We will also change our Where clause to accommodate the fact that we are not calling a method:

var q = dc.Accounts.Where(IsRightAccount);

And presto! Our program now shows a SQL conversion:

SELECT [t0].[AccountId], [t0].[Email]
FROM [dbo].[Accounts] AS [t0]
WHERE (SQRT(CONVERT(Float,[t0].[AccountId])) % @p0) = @p1

Things get a bit more complicated when we try to stack expressions together. As far as I know, we must create the Expression manually (using the Expression static methods); the compiler won't help out. If anyone knows a built-in way around this, please let me know. Otherwise, see the links at the top of this article for more information and other workarounds.

This also applies if you have complex logic that doesn't directly map to a item -> bool predicate expression. In those cases, you can still encapsulate most of your code by using the compiler to generate the bulk of the Expression, and then just wrap it with a bit of hand-created expression. In my LINQ to the CRUD article, the code attached uses this approach to generate queries for the select/delete commands. Again, I will note that TomasP has written expansions so you can just write myCoolExpression.Expand(arg) rather than building everything by hand.

If you know of any other links or work done in this area, I'm very interested in seeing other approaches. Thanks!

Code
Wednesday, September 05, 2007 11:58:49 PM UTC  #    Comments [2]  |  Trackback

C# 3.0 and LINQ Misunderstandings

Apparently, there is some considerable confusion over all the new C# language features. People who I would hope are reasonably intelligent are completely misunderstanding some C# fundamentals. Agreed, a lot of the new concepts introduced to C# 3.0 are might seem relatively foreign to C# users. Microsoft's marketing related to LINQ doesn't help much either. I'm going to try to clarify the top few things I've seen. I'll reference the C# spec (http://msdn2.microsoft.com/en-us/library/ms364047(vs.80).aspx) so you can know I'm being accurate.

Myth: LINQ is just a data access technology

While data access is obviously a large part of LINQ (even the name stands for Language Integrated Query), you can do a lot more than just access data. Re-using a previous example:

    args
        .Select(s => new Thread(() => SomeLongProcess(s))) 
        .Process(t => t.Start())
        .ToList()
        .ForEach(t => t.Join());

There's no sign of data access there! The practical functional C# articles on my site go into more detail on this. Suffice to say, while a lot of new features were added to "make LINQ possible", much, much, more is possible than just creating queries. (Alternatively, you might chose to limit the meaning of LINQ, as I'd like to do. In this case, I wouldn't consider using lambdas, extensions, etc. as LINQ. Others may disagree. Marketing?)

Implicitly typed local variables (var keyword)

C# is still statically and strongly typed. But, there's a new feature that lets you declare local variables without specifying the type, if the type can be inferred from the initializing expression. From the spec:

var i = 5;
var s = "Hello";
var d = 1.0;
var numbers = new int[] {1, 2, 3};
var orders = new Dictionary<int,Order>();

The implicitly typed local variable declarations above are precisely equivalent to the following explicitly typed declarations:

int i = 5;
string s = "Hello";
double d = 1.0;
int[] numbers = new int[] {1, 2, 3};
Dictionary<int,Order> orders = new Dictionary<int,Order>();

Oddly enough, the C# spec doesn't even mention LINQ or anonymous types when it talks about "var" locals. Why is there confusion about this simple feature? Let's examine anonymous types:

C# anonymous types allow you to declare a type just by specifying its fields. From the spec: "C# 3.0 permits the new operator to be used with an anonymous object initializer to create an object of an anonymous type." For instance, the following code is a valid expression:

new { Name = "Michael" }

This produces an object of a new, anonymous, type, containing a single string property called Name. Hence, this code works:

Console.WriteLine(new { Name = "Michael" }.Name);

However, how can you assign such an object to a variable? Yes, that is the only "need" for the var keyword. There's no way to name the type, since it's anonymous. Regardless if you agree with anonymous types (versus a Tuple class), this is the place where you *need* to use var: assigning an anonymous type to a local.

As you may have noticed, I still haven't mentioned LINQ. Anonymous types are not LINQ specific. They are, however, particularly helpful for certain LINQ queries:

var topCustomers = MyDatabase.Customers.Where(c => c.GoldStar == true).Select(c => new {c.CustomerId, c.Name});

Because of this, people start to associate anonymous types only with LINQ queries and hence, var with LINQ only. The truth is that these features can be used anywhere.

What's the takeaway here? The var keyword simply allows the compiler to infer the type of the variable so you don't have to specify it. Nothing more, nothing less. Some people still want to explicitly annotate every single variable – hey, that's their choice. But don't be locked into this just because there was no option before. Me? I'll take more concise code any day!

As a side note (if this wasn't clear), the var keyword is NOT dynamic typing, just implicit typing (type inference).

Dynamic C#

Seems there's a lot of confusion about C# being dynamic. C# is not dynamically typed, as some seem to imply. I think perhaps some of the confusion comes from all the nice type inference that C# provides. Using the var keyword as shown above might make some people feel "oh, I'm saying var just like Javascript!". Adding to the confusion is the fact that C# is now a "semi"-functional language. For instance, from "why's (poignant) guide to ruby", we see this Ruby code:

5.times { print "Odelay!" }

C# allows us to write in a similar style:

5.Times(() => Console.WriteLine("Odelay!"));

The next Ruby example in that guide goes like this:

exit unless "restaurant".include? "aura"

In C#, we can write:

exit.Unless(() => "restaurant".Contains("aura"));

The Wikipedia article on Functional Programming lists a few features that dynamic languages usually have:

Eval

Sorta… you can create Expression<T> and execute them. On the other hand, you can't do anything like eval(myString) (which is just asking for runtime failure).

Higher-order functions

Definitely in there.

Runtime alteration of object or type system

No, not really. (I.e., maybe you can hack around with certain APIs to try to do some magic, but it's not a language feature.)

Functional Programming

Yes. But still, functions aren't really first class citizens…yet. Once we can start using method groups as Actions and Funcs, implicitly, then it'll get even better. This is an interesting presentation from Andrew Kennedy, Microsoft Research: C# is a functional programming language.

Closures

Yep, since anonymous methods were introduced in C# 2.0.

Continuations

Extremely limited, in the form of yield return.

Introspection

Not just reflection, but actually inspecting the actual code. C# 3.0 has this in the form of Expression<T> (see below).

Macros

No, not crappy C-style macros. Here, I'm thinking more like macros that'd let you create things like C# query comprehensions, *in source code*. (Which is what I was actually hoping when I saw the new query comprehension syntax… no such luck).

In summary, C# lets you gain a lot of benefits usually associated with dynamic programming, but without the nasty parts of dynamic typing.

A great paper on this subject is Static Typing Where Possible, Dynamic Typing When Needed: The End of the Cold War Between Programming Languages, by Erik Meijer and Peter Drayton of Microsoft.

Myth: Extension methods add methods to a class

This is a tricky one, since extension methods appear to be exactly that. This myth is also somewhat perpetuated by the C# spec: "In effect, extension methods make it possible to extend existing types and constructed types with additional methods." But, right before that, the real explanation is given: "Extension methods are static methods that can be invoked using instance method syntax."

Essentially, Extension methods allow us to use infix notation with certain methods. This explains the line from the spec "in effect". A more helpful way to think of this is by thinking of the "." operator as an overloaded operator that also allows passing the first operand given to it as the first argument to specially marked methods.

An alternative* would be to define an operator like the F# pipeline operator (|>). In C#, this would let us write stuff like:

customers |> Seq.Where(c => c.Name == "Michael")

That doesn't look like an improvement. BUT, we no longer need to mark methods in a special way. We can just use them:

myArray |> Array.BinarySearch("s")

Why do we need infix notation anyways? Well, the normal prefix notation can be difficult to read:

Select(Where(customers, c => c.Cool == true), c => c.Name)
Array.BinarySearch(items, "S")

Extension methods just make those functions easier to pipeline. That's all folks. Think of them like this, and save yourself a headache about what "extending a type" means.

*My guess as to why extension methods are done they way they are is because it could confuse people if you have something like "item |> Stuff.SomeMethod("X")" where SomeMethod returns a function. Or, where you have "item |> Stuff.SomeMethod("X").SomethingElse("y"). I'm still annoyed that I can't use infix semantics where *I* want, but oh well.

Lambda expressions and Expression<T>

Spec: "Lambda expressions provide a more concise, functional syntax for writing anonymous methods.". Spec: "Expression trees permit lambda expressions to be represented as data structures instead of executable code. A lambda expression that is convertible to a delegate type D is also convertible to an expression tree of type System.Query.Expression<D>."

So, "lambda expressions" can either be just code (i.e., directly executable IL) OR they can get turned into a data structure, Expression<T>.

Adding to the confusion, a lambda expression can contain just an expression (i=> i + 1), or it can be a block of statements ( i => {Write(i); i++; Write(i); return i+1;} ). However, a lambda expression with a statement block body cannot become an Expression<T>. (As far as I know, VB's lambdas only allow for expression bodies, not blocks.)

An example:

Expression<Func<int, int>> inc = i => i + 1;

Is equivalent to:

ParameterExpression param_i = Expression.Parameter(typeof(int), "i");
var inc2 = Expression.Lambda<Func<int, int>>(
    Expression.Add(
        param_i, 
    Expression.Constant(1, typeof(int))),
param_i);

At runtime, you can then go inspect the actual code and decide what to do with it. This is exactly the premise for LINQ to SQL. When you create a LINQ-to-SQL query, it is turned into an expression like shown above. Then the LINQ-to-SQL APIs inspect and convert that expression tree into SQL statements.

Here, I can understand the confusion. The word "expression" is used in three distinct manners. Rightfully, Expression Trees should be referred to as Expression or Expression<T>, which could help clear up some of the confusion. Additionally, it doesn't help that lambdas have these different conversion rules (although working around it could be ugly, possibly).

Are there any other features that you've seen misused or you've had questions about? Let me know! I love comments, insults, and suggestions.

Want to correct me on something? Go right ahead! But, if you're going to say something like "C# doesn't have type inference", please make sure to either be an expert on the matter or be able to quote an authoritative source or show a proof. Thanks!

Code
Wednesday, September 05, 2007 3:01:18 AM UTC  #    Comments [10]  |  Trackback

 Wednesday, August 29, 2007
Practical Functional C# - Part IV – Think in [Result]Sets

Don't skip parts I to III to get you up to speed. I have $300 up for grabs if these articles are incorrect in stating that it will greatly improve most C# apps:

    Practical Functional C# - Part I 
    Practical Functional C# - Part II 
    Practical Functional C# - Part III - Loops are Evil

Tell the compiler what you want, rather than how to do it. That's a key concept that can take you very far. However, as imperative programmers, this can sometimes be a hard concept. We are so used to instructing the processor, step-by-step, how do to things, and then only after we're all done, making sure the end effect is to our liking. Functional programming helps us change this.

Consider the following task: Write a function that checks a username against a few disallowed characters "!@#$%^&*". The normal C# implementation looks like this (null checks removed):

static bool IsBadUserName(string userName)
{
    var badChars = "!@#$%^&*";
    foreach (char c in badChars) {
        if (userName.Contains(c)) return true
    }
    return false;
}

It's not horrible; there's only one branch, but it does require a bit of thought to sort out. Now let's take an approach where we deal with string as a set of characters to manipulate at once (requires C# 3.0 compiler):

static bool IsBadUserName2(string userName){ 
    var badChars = "!@#$%^&*";
    return userName
        .Intersect(badChars) 
        .Any();
}

We've reduced the function from a step-by-step loop into something that has only path. The bigger benefit is that there's no need to evaluate end conditions and so on for correctness: The code says exactly what it does. "Are there any bad characters in the user name?" We don't have to worry how it does what we asked, we just need to think in terms of what we want our result to be.

A common function in functional languages is called "map". Map takes a list of something and turns it into a list of something else. For instance, if we had a list of integers ("ourInts"), we could turn them into squares by saying "map ourInts by multiplying each value with itself". In C# (LINQ), they called map "Select". Here's a quick example:

var ourInts = new[] { 2, 5, 13 };
var squares = ourInts.Select(i => i * i);

Squares will contain the list { 4, 25, 169 }. What use is this? Well, it is an extremely common pattern to take some set of data, filter it, modify it a bit, and return a new set of data. Here's an example: You have a variable from containing semicolon-delimited email addresses. You want to turn these into an array of .NET's MailAddress objects to use with some other code. The loop isn't very pretty:

var tempAddresses = new List<MailAddress>();
foreach (string s in semicolonEmails.Split(';')) {
    var ts = s.Trim();
    if (ts == "") continue
    tempAddresses.Add(new MailAddress(ts)); 
}
var myAddresses = tempAddresses.ToArray();

But consider the functional approach:

var myAddresses = semicolonEmails
    .Split(';'
    .Select(s => s.Trim())
    .Where(s => s != ""
    .Select(s => new MailAddress(s))
    .ToArray();

In one statement, we transform the data three times, as well as add filter to remove empty items.

One place where LINQ falls flat on its face is when it comes to processing data. For some reason, there are no methods defined to do "ForEach" or "Process". (Even more interesting: List<T> does define these methods.) Process is a great pattern: on each item, it performs some action, then returns the original item. The code to define it is very simple and looks like this:

static IEnumerable<T> Process<T>(this IEnumerable<T> source, Action<T> f)
{
    foreach (var item in source) {
        f(item); 
        yield return item; 
    }
}

How is this of use? Well, let's chain together some functional and imperative processing. For instance, write a program that does some long process on all files passed in as arguments – on separate threads. If we take the purely imperative approach, our code looks like this:

static void Main(string[] args)
{
    var threads = new List<Thread>();
    foreach (var s in args) {
        Thread t = new Thread(startLongProcess); 
        t.Start(s); 
        threads.Add(t); 
    }
    foreach (var t in threads) {
        t.Join();
    }
}
static void startLongProcess(object data)
{
    SomeLongProcess((string)data);
}

Yes, we actually must declare a separate function just to invoke SomeLongProcess. Let's combine and use the functional approach now:

static void Main(string[] args)
{
    args
        .Select(s => new Thread(() => SomeLongProcess(s))) 
        .Process(t => t.Start())
        .ToList()
        .ForEach(t => t.Join());
}

Which way is going to be easier to edit and change around? I don't know about you, but for me, going from ~12 to ~5 lines, removing extra variables, useless functions and flow control structures: that's a hands-down win in my book.

As a side note, threading, in fact, is a space that is extremely ripe for functional styles. I'm willing to bet that ".NET 4" will include threading extensions that rely heavily on functional concepts. For instance, it's easy to create a method that allows us to replace the previous program with this:

args.Parallel(SomeLongProcess);

But I'll talk about that another day.

In the next article, I'm going to cover C#'s new inner functions capability and how that can be used to help build up more complicated function chains. I'd also like some feedback on which kinds of areas of C# programming you've run into that seem to require more code than necessary.

Code
Wednesday, August 29, 2007 10:45:59 AM UTC  #    Comments [2]  |  Trackback

 Sunday, August 26, 2007
A LINQ to the CRUD

A question many people have run into is: How does CRUD fit into LINQ-to-SQL? While LINQ-to-SQL (I'll abbreviate as DLINQ) provides a very fast and easy way for us to start querying our data, it doesn't handle updates as beautifully. It is particularly noticeable when you are doing multi-tier and hence cannot share a DataContext.

Let's consider an example database called "MembersDatabase" which has a table called "Accounts". Accounts has an int primary key, and a varchar Email field. We use the DLINQ designer and create the dbml that generates a class called MembersDataContext. How does our app-tier code look? I'll give you a hint, it starts with "ug" and rhymes with "nasty":

// Select
int someId = 123; // Passed from another tier
Account someAccount; // Can't use implicit typing -- no anonymous types
using (var dc = new MembersDataContext()) {
    someAccount = dc.Accounts.SingleOrDefault(a => a.AccountId == someId);
}

// Insert
var myAccount = new Account();
myAccount.Email = "me@contoso.com";
using (var dc = new MembersDataContext()) {
    dc.Accounts.Add(myAccount); 
    dc.SubmitChanges();
}

// Update
int myId = 1; // Id and email passed from another tier
string newEmail = "cool@cool.com";
var changedAccount = new Account();
changedAccount.AccountId = myId;
changedAccount.Email = newEmail;
using (var dc = new MembersDataContext()) {
    dc.Accounts.Attach(changedAccount, true); 
    dc.SubmitChanges();
}

// Delete
int idToKill = 2; // Passed from another tier
using (var dc = new MembersDataContext()) {
    using (var txScope = new System.Transactions.TransactionScope()) {
        var acc = dc.Accounts.SingleOrDefault(a => a.AccountId == idToKill); 
        if (acc == null) throw new ChangeConflictException("Row not found."); 
        dc.Accounts.Remove(acc); 
        dc.SubmitChanges();
        txScope.Complete();
    }
}

It's not horrible; it's certainly better than anything before it. But, we can do better. I created some helper classes to do so (see attached file).

First, DatabaseBase<TContext>. This holds our tables, and provides DataContext helper functions Use and Query. Second, TableBase<TItem, TKey>. This actually provides our CRUD methods. I don't think anyone is overly interested in the implementation details (comment if I'm wrong), so here's how you declare your CRUD types:

class MembersDatabase : DatabaseBase<MembersDataContext>
{
    public static readonly TableBase<Account, int> Accounts 
        = CreateTable(dc => dc.Accounts, a => a.AccountId);
}

That's it.

You create a new class to serve as your "database" class. All that's required here is to inherit from DatabaseBase.

Next, for each table, simply create a new field via CreateTable. The first parameter is a lambda function that selects the right table off the DataContext. The second parameter is a lambda expression that selects the primary key. Not much too it.

So, how does our previous chunk of code look with this small helper library?

// Select
int someId = 123; // Passed from another tier
var someAccount = MembersDatabase.Accounts.SelectByKey(someId);

// Insert
var myAccount = new Account();
myAccount.Email = "me@contoso.com";
MembersDatabase.Accounts.Insert(myAccount);

// Update
int myId = 1; // Id and email passed from another tier
string newEmail = "me@me.com";
var changedAccount = new Account();
changedAccount.AccountId = myId;
changedAccount.Email = newEmail;
MembersDatabase.Accounts.Update(changedAccount);

// Delete
int idToKill = 2; // Passed from another tier
MembersDatabase.Accounts.Delete(idToKill);

That's about 40% less code. It’s far more straightforward, being a single block.

To start using this code, just drop DatabaseBase.cs into your project. It adds DatabaseBase to System.Data.Linq. Then subclass as shown above, and you're on your way to LINQ updating bliss. What do you think?

DatabaseBase.cs (5.65 KB)

P.S. At any rate, I should get points for the Zelda pun, right?

Update: I forgot to mention, you'll want to turn UpdateCheck to Never on your columns in the LINQ-to-SQL designer.
Update: The code (including the Tuple class) for the .NET 3.5 RTM release is here: http://www.atrevido.net/blog/2008/06/26/LINQ+To+The+CRUD+RTM.aspx

Code
Sunday, August 26, 2007 8:30:32 PM UTC  #    Comments [9]  |  Trackback

 Wednesday, August 22, 2007
Reason #52 against Visual Basic (Nothing in Visual Basic)

What people do in their own time in the privacy of their homes is none of my business. However, when they mess with reading documentation, then it crosses the line and becomes annoying. How many times do VB developers need to be told that a null is "Nothing"? Consider this snippet from MSDN:

-------
The CreateUser method will return a null reference (Nothing in Visual Basic) if password is an empty string or a null reference (Nothing in Visual Basic), username is an empty string or a null reference (Nothing in Visual Basic) or contains a comma (,), passwordQuestion is not a null reference (Nothing in Visual Basic) and contains an empty string, or passwordAnswer is not a null reference (Nothing in Visual Basic) and contains an empty string.
-------

Five times in one paragraph! I know null type systems are annoying and lead to errors, but that seems a bit excessive. Seriously though, it'd make more sense to make VB developers learn a few words once, rather than having to mess up documentation just in case they get confused.

Code | Humour
Wednesday, August 22, 2007 2:49:34 PM UTC  #    Comments [5]  |  Trackback

 Thursday, August 16, 2007
Practical Functional C# - Part III – Loops are Evil

Be sure to read these first two articles:

    Practical Functional C# - Part I

    Practical Functional C# - Part II

OK let's start with a quick challenge. Write an accurate description of the following program, in English:

static void Main(string[] args)
{
    string output = "";
    if (args.Length > 0) output = args[0]; 
    if (args.Length > 1) {
        for (int i = 1; i < args.Length; i++) {
            output += ", " + args[i]; 
        }
    }
    Console.WriteLine(output);
}

Well? The specification might be something like "a program that writes its arguments separated by a comma and space". But how quickly could you determine that from the code? How much additional time does it take to determine there aren't any bugs? Every time a maintenance developer comes across this code, they have to analyze this code, determine the boundary conditions correctly, verify the indexing, and so on. Every time someone reads this code, she must pay a high tax. The saddest part is that this is an extremely common pattern.

Edit: As Chad Hower pointed out in the comments that you can remove the two if statements by performing a check inside the loop. That shortens it considerably and reduces some of the "tax" that has to be paid when you read it.

"Take this set of values and aggregate them into a single value". How many pieces of code do exactly this, but obscure it behind a for loop? Functional languages realize this and provide functions called "Fold" or "Reduce". Such functions take an accumulator function, apply it to every element in the list, then return the accumulator value when finished. C# 3.0, courtesy of LINQ, provides an equivalent function, called "Aggregate":

static void Main(string[] args)
{
    string output = args.DefaultIfEmpty("").Aggregate(
        (accum, item) => accum += ", " + item); 
    Console.WriteLine(output);
}

Here we are saying that we are going to aggregate args into a single string value. The lambda on the second line takes two arguments. The first is the accumulator and Aggregate passes it to each element ("threads" it through). The second parameter is the current item we are working with. The return value of our lambda is simply the concatenation of the current accumulated value, comma and space, and the current item. This return value becomes the accumulator for the next item. On the first execution, since we did not give it an explicit seed value, it just uses the first item. The final return value becomes the value that Aggregate returns to output. (Edited: Added DefaultIfEmpty -- this overload of Aggregate doesn't work on empty sequences.) Using the lambda provides nicer syntax than this equivalent code:

static void Main(string[] args)

    string output = args.DefaultIfEmpty("").Aggregate(joinCommaSpace); 
    Console.WriteLine(output);
}
static string joinCommaSpace(string a, string b)

    return a + ", " + b;
}

So why is this a good thing? Well, it goes back to the questions about the first set of code: how much effort is required to determine intent and correctness of a particular piece of code? In the imperative way, we need eight lines of code, with three distinct paths. Using a functional approach, we have three lines of code and only one code path. The only serious objection that I've heard is that this code is "unfamiliar". Well, sure, anything new might be unfamiliar, but that does not make it bad.

You wouldn't write your SQL code to make someone only familiar with C "comfortable" or "familiar", would you? You wouldn't write in C# the same way you'd write in C or BASIC. So why stick with outdated programming practices just because some "new" developer might get confused? Functional code is more concise, less error prone, and much more readable. Learning this style might take a couple of days, but it's an invaluable skill. At any rate, LINQ is built upon these concepts, so it will do people good to learn anyways.

Now, let's examine how to create our own functions to hide loops. This time, we're going to look at data access. A very common pattern in data access is creating a SqlDataReader, going through it, adding elements to a list. Like other patterns, overhead obscures the intent:

public static List<Person> GetAllPeople()
{
    // Setup command
    var comm = new SqlCommand("GetAllPeople");
    comm.CommandType = CommandType.StoredProcedure;

    // Setup connection
    using (var conn = new SqlConnection(Settings.ConnectionString)) {
        comm.Connection = conn; 
        conn.Open();

        // Loop and add people to our list
        using (var reader = comm.ExecuteReader()) {
            var people = new List<Person>();
            while (reader.Read()) {
                var p = new Person();
                p.Name = reader.GetString(0); 
                p.Age = reader.GetInt32(1); 
                people.Add(p); 
            }
            // Done
            return people; 
        }
    }
}

Yes, something as simple as reading a list of two-field type can take 14 lines of code. Let's do something about that. The only unique part is where we create a Person from the SqlDataReader. Outside of that, it's a very straightforward, but large, pattern. Refactoring the pattern, we get this:

public static List<T> ListFromReader<T>(string connectionString,
SqlCommand command, Func<SqlDataReader, T> code)
{
    // Setup connection
    using (var conn = new SqlConnection(connectionString)) {
        command.Connection = conn; 
        conn.Open();

        // Loop into list
        using (var reader = command.ExecuteReader()) {
            var list = new List<T>();
            while (reader.Read()) {
                // Here we call the supplied code to add the right item
                T item = code(reader); 
                list.Add(item); 
            }
            return list; 
        }
    }
}

Now our code to GetAllPeople is very simple:

public static List<Person> GetAllPeople2()
{
    var comm = new SqlCommand("GetAllPeople");
    comm.CommandType = CommandType.StoredProcedure;

    return ListFromReader(Settings.ConnectionString, comm, 
        reader => new Person {
            Name = reader.GetString(0), 
            Age = reader.GetInt32(1) 
        });
}

Just be looking at this code, we know all the data-API stuff is handled correctly. This approach is vastly superior to other common SQL "helpers", for example, an ExecuteReader method that gives us a SqlDataReader. First, we have no locals that need disposing or other cleanup: this function scopes the variables to our lambda. Second, we can focus on our actual logic (creating a Person) rather than dealing with loop conditions.

Edit: This is not specific to C# 3.0! You can definately achieve a lot of the same benefits in 2.0, except you need to replace the simple lambda syntax ( => ) with the much more verbose delegate (ArgType arg) { } (anonymous method) syntax. C# 3.0 just makes it much easier to write.

What other common loops do you encounter? I have a few more we'll cover in the next article. As always, comments, insults and suggestions are welcome.

Code
Thursday, August 16, 2007 4:36:01 PM UTC  #    Comments [19]  |  Trackback

 Monday, August 13, 2007
Practical Functional C# - Part II

Previous article: Practical Functional C# - Part I

Last time I demonstrated how to replicate the using keyword as a function that takes a function. In this article, I'm going to show some real-world cases where you'll see a major improvement by taking a more functional approach. I'm going to use WCF as an example space.

WCF allows us to define services as normal C# interfaces. We then use a factory to create a proxy for an interface, specifying the URI, binding types (HTTP, binary, message queue), and other options. However, our interest is how this actually looks from the client side; how we actually make calls.

Since a WCF call is actually invoking code on another machine, any number of bad things can happen to our client channel, resulting in exceptions. When this happens, we have to Abort the channel. But if things complete successfully, we just need to Close the channel. Of course, that's not easy enough: if the Close fails, we then need to Abort the channel anyways. Confused? Well, check out this code snippet that uses an interface called ICalculator to add two numbers:

void WcfExample()
{
    int a = 1; 
    int b = 2; 
    int sum; 
    var chanFactory = GetCachedFactory<ICalculator>();
    ICalculator calc = chanFactory.CreateChannel();
    bool error = true
    try {
        sum = calc.Add(a, b); 
        ((IClientChannel)calc).Close();
        error = false
    }
    finally {
        if (error) {
            ((IClientChannel)calc).Abort();
        }
    }
    Console.WriteLine(sum);
}

Ouch! Out of 17 lines of code, only five relate to our problem. The rest (70%) is pure, ugly overhead. This is a much more complicated pattern than the using pattern, and from experience, I can tell you it is error-prone. Fortunately, this pattern decomposes nicely. The only two unique things are the name of the interface and the statement that acts on the interface. Here is one way you might go about writing this generically:

TReturn UseService<TChannel, TReturn>(Func<TChannel, TReturn> code)
{
    var chanFactory = GetCachedFactory<TChannel>();
    TChannel channel = chanFactory.CreateChannel();
    bool error = true
    try {
        TReturn result = code(channel); 
        ((IClientChannel)channel).Close();
        error = false
        return result; 
    }
    finally {
        if (error) {
            ((IClientChannel)channel).Abort();
        }
    }
}

This is exactly like the previous code, but we've substituted TChannel and a function parameter named code instead of our actual types. The type of function we want is Func<TChannel, TReturn>. You can think of this as saying: "transforms a TChannel into a TReturn". Now, look at the beauty this allows on the client side:

void WcfExample2()
{
    int a = 1; 
    int b = 2; 
    int sum = UseService((ICalculator calc) => 
        calc.Add(a, b)); 
    Console.WriteLine(sum);
}

Presto! Our overhead went from 12 lines to zero. If that didn't sell you on the power of functions, I suggest finding another career. Now, there is a lambda there, but all it is saying is "Here is a function that takes one parameter of type ICalculator and returns the value of Add(a, b)." The C# compiler automatically infers the return type (int).

Some of you mentioned still using C# 2.0. Two things: First, you can use the 3.0 compiler and run on 2.0 because these examples don't reference any new assemblies. Second, you can achieve the same thing using anonymous methods, but they just look a bit uglier:

void WcfExample3()
{
    // C# 2.0 example with anonymous methods
    int a = 1; 
    int b = 2; 
    int sum = UseService<ICalculator, int>(delegate(ICalculator calc) {
        return calc.Add(a, b); 
    });
    Console.WriteLine(sum);
}

By this time, I hope you are thinking about places in your code where you can extract a larger pattern and use functions instead. The key concept to understand here is that your source code should reflect your solution. You should not have extra code sitting around just for the sake of appeasing the platform or meeting some "design pattern". Let the platform and libraries take care of the details, and let your code focus on actually solving problems. As always, I welcome your comments, suggestions, and insults, so please let me know what you think!

Next up, we'll take a look at loops, see why they are harmful, and then go about fixing them.

Code
Monday, August 13, 2007 11:07:26 PM UTC  #    Comments [8]  |  Trackback

 Sunday, August 12, 2007
Practical Functional C# - Part I

Edit: This first article might be a bit dense at times. However, I promise, if you stick through and read at least until Parts II and III (Loops are Evil) you will see major benefits that will totally transform your code (even C# 2.0 code!). In fact, to the first 3 people that can show me that these practices will NOT result in better code in many enterprise apps, I'll give $100 each. Post a blog comment or email me (mgg AT telefinity dot com).

Redundancy in source code is a common degeneration. But for the C# programmer, her weapons to eliminate it have been unwieldy at best. She has been able to eliminate simple, blocky, patterns, but truly writing reusable code at a fine level has not been easy.

This series will demonstrate how you can take advantage of functional programming (FP) in your work, today. The first task is to dispense with the notion that functional programming is difficult and strange. Most FP articles start with a recursive definition of the Fibonacci sequence and then talk about currying functions. Here, we're going to start off with something that every C# programmer has used many times. I can guarantee that after you absorb this series of articles, you'll be writing more concise, less buggy, more powerful software. I've seen many C# modules get cut down drastically in size. I've seen new C# code written that does multithreading and it was written correctly on the first go. I want you to see these things too.

Refactoring out common blocks of code is familiar. Probably every developer has written a helper method to initialize a database command or prepare a commonly used object. Unfortunately, refactoring only tends to happen to contiguous blocks. We do not see refactoring of complex patterns that do not fit into neat blocks. Consider the following visualizations of a program. The red blocks are the unique parts of the program, and the grey ones represent common statements.


                                             becomes

It is easy to refactor the first sequence: move the common parts into another function and call it. However, the second pattern poses quite a problem. The individual "common" blocks are too simple to move into another method. The call to the refactor method would be the same as the method itself! How can we refactor it correctly?

Well, C# itself contains keywords that refactor some of these patterns. Consider the using keyword. The following two methods are nearly equivalent:

void usingPatternExample()
{
    string result;
    StreamReader sr = new StreamReader(filename);
    try {
        result = sr.ReadToEnd();
    }
    finally {
        if (sr != null) sr.Dispose();
    }
}

void
usingKeywordExample()
{
    string result; 
    using (StreamReader sr = new StreamReader(filename)) {
        result = sr.ReadToEnd();
    }
}

I think it is obvious why everyone uses the using keyword over writing out this long init-try-something-finally-dispose pattern repeatedly. When we read code that uses the using keyword, the intent is instantly clear. We don't need to go validate the pattern to make sure it does what we think it does, we can just see "using" and