[Giagnocavo]Michael::Write()

# Wednesday, September 05, 2007
C# 3.0 and LINQ Misunderstandings

Apparently, there is considerable confusion over all the new C# language features. People I would hope are reasonably intelligent are completely misunderstanding some C# fundamentals. Granted, a lot of the concepts introduced in C# 3.0 might seem relatively foreign to C# users, and Microsoft's marketing around LINQ doesn't help much either. I'm going to try to clarify the top few misunderstandings I've seen. I'll reference the C# spec (http://msdn2.microsoft.com/en-us/library/ms364047(vs.80).aspx) so you can verify that I'm being accurate.

Myth: LINQ is just a data access technology

While data access is obviously a large part of LINQ (even the name stands for Language Integrated Query), you can do a lot more than just access data. Re-using a previous example:

    args
        .Select(s => new Thread(() => SomeLongProcess(s))) 
        .Process(t => t.Start())
        .ToList()
        .ForEach(t => t.Join());
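Note that `Process` is not a standard LINQ operator; it is presumably a custom extension from the earlier example. Here is a minimal sketch of what such a helper might look like, assuming it should run an action on each element and pass the element through unchanged. The body is an assumption, not the original code.

```csharp
using System;
using System.Collections.Generic;

public static class SequenceExtensions
{
    // Hypothetical helper matching the name used above.
    // Because this is an iterator, nothing runs until the sequence is enumerated:
    // in the example above, ToList() is what actually starts each thread.
    public static IEnumerable<T> Process<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (T item in source)
        {
            action(item);      // side effect (e.g. t.Start())
            yield return item; // pass the element through unchanged
        }
    }
}
```

Since `ToList()` materializes every started thread before `ForEach` runs, all threads are started before the first `Join()` is called.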

There's no sign of data access there! The practical functional C# articles on my site go into more detail on this. Suffice it to say that while a lot of new features were added to "make LINQ possible", much, much more is possible than just creating queries. (Alternatively, you might choose to limit the meaning of LINQ, as I'd like to do. In that case, I wouldn't consider using lambdas, extension methods, etc. to be LINQ. Others may disagree. Marketing?)

Implicitly typed local variables (var keyword)

C# is still statically and strongly typed. But, there's a new feature that lets you declare local variables without specifying the type, if the type can be inferred from the initializing expression. From the spec:

var i = 5;
var s = "Hello";
var d = 1.0;
var numbers = new int[] {1, 2, 3};
var orders = new Dictionary<int,Order>();

The implicitly typed local variable declarations above are precisely equivalent to the following explicitly typed declarations:

int i = 5;
string s = "Hello";
double d = 1.0;
int[] numbers = new int[] {1, 2, 3};
Dictionary<int,Order> orders = new Dictionary<int,Order>();

Oddly enough, the C# spec doesn't even mention LINQ or anonymous types when it talks about "var" locals. Why is there confusion about this simple feature? Let's examine anonymous types:

C# anonymous types allow you to declare a type just by specifying its properties. From the spec: "C# 3.0 permits the new operator to be used with an anonymous object initializer to create an object of an anonymous type." For instance, the following code is a valid expression:

new { Name = "Michael" }

This produces an object of a new anonymous type containing a single string property called Name. Hence, this code works:

Console.WriteLine(new { Name = "Michael" }.Name);

However, how can you assign such an object to a variable? That is the only real "need" for the var keyword: there's no way to name the type, since it's anonymous. Regardless of whether you like anonymous types (versus a Tuple class), this is the one place where you *need* to use var: assigning an anonymous type to a local.
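To make this concrete, here is a small sketch with no LINQ in sight. The class and method names are made up for illustration. It stores an anonymous type in a var local and also shows that the compiler-generated type compares by property values.

```csharp
using System;

public static class AnonymousTypeDemo
{
    public static string Describe()
    {
        // The compiler generates a class with a read-only string property Name.
        // There is no type name we could write here instead of var.
        var person = new { Name = "Michael" };

        // Anonymous types get compiler-generated Equals/GetHashCode based on
        // their property values, so these two instances compare equal.
        var samePerson = new { Name = "Michael" };
        bool equal = person.Equals(samePerson);

        return person.Name + " " + equal;
    }
}
```

`person.Name` is checked at compile time like any other property; var only saves us from writing a type name that doesn't exist.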

As you may have noticed, I still haven't mentioned LINQ. Anonymous types are not LINQ specific. They are, however, particularly helpful for certain LINQ queries:

var topCustomers = MyDatabase.Customers.Where(c => c.GoldStar == true).Select(c => new {c.CustomerId, c.Name});

Because of this, people start to associate anonymous types only with LINQ queries and hence, var with LINQ only. The truth is that these features can be used anywhere.

What's the takeaway here? The var keyword simply allows the compiler to infer the type of the variable so you don't have to specify it. Nothing more, nothing less. Some people still want to explicitly annotate every single variable – hey, that's their choice. But don't be locked into this just because there was no option before. Me? I'll take more concise code any day!

As a side note (if this wasn't clear), the var keyword is NOT dynamic typing, just implicit typing (type inference).

Dynamic C#

There seems to be a lot of confusion about C# being dynamic. C# is not dynamically typed, as some seem to imply. I think some of the confusion comes from all the nice type inference that C# provides. Using the var keyword as shown above might make some people feel "oh, I'm saying var just like JavaScript!". Adding to the confusion is the fact that C# is now a "semi"-functional language. For instance, in "why's (poignant) guide to ruby", we see this Ruby code:

5.times { print "Odelay!" }

C# allows us to write in a similar style:

5.Times(() => Console.WriteLine("Odelay!"));
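The .NET Framework has no `Times` method, so this line only compiles once you define one yourself. A minimal sketch of such a hypothetical extension method on int:

```csharp
using System;

public static class IntExtensions
{
    // Hypothetical helper: not part of the .NET Framework.
    // Invokes `action` exactly `count` times.
    public static void Times(this int count, Action action)
    {
        for (int i = 0; i < count; i++)
        {
            action();
        }
    }
}
```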

The next Ruby example in that guide goes like this:

exit unless "restaurant".include? "aura"

In C#, we can write:

exit.Unless(() => "restaurant".Contains("aura"));
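Again, this needs a user-defined extension method. The sketch below assumes `exit` is an Action delegate (for instance, one that ends the program). Both `Unless` and that reading of `exit` are assumptions made for illustration.

```csharp
using System;

public static class ActionExtensions
{
    // Hypothetical helper: not part of the .NET Framework.
    // Runs `action` only when `condition` returns false, like Ruby's `unless` modifier.
    public static void Unless(this Action action, Func<bool> condition)
    {
        if (!condition())
        {
            action();
        }
    }
}
```

Extension methods work on delegate-typed variables such as `exit`, but not on a bare lambda, which has no type until it is converted.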

The Wikipedia article on dynamic programming languages lists a few features that dynamic languages usually have:

Eval

Sort of. You can build Expression<T> trees at runtime, compile them, and execute them. On the other hand, you can't do anything like eval(myString) (which is just asking for runtime failure).
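A small sketch of that limited "eval", using the real System.Linq.Expressions factory methods. The demo class name is made up. The code is assembled as a tree at runtime and then compiled into a delegate, with no string parsing involved.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionEvalDemo
{
    public static int BuildAndRun(int x)
    {
        // Build the tree for (n => n * 2) by hand at runtime.
        ParameterExpression n = Expression.Parameter(typeof(int), "n");
        Expression<Func<int, int>> doubler =
            Expression.Lambda<Func<int, int>>(
                Expression.Multiply(n, Expression.Constant(2)),
                n);

        // Compile() turns the tree into an ordinary delegate.
        Func<int, int> compiled = doubler.Compile();
        return compiled(x);
    }
}
```

Unlike eval(myString), a malformed tree (say, multiplying an int by a string) is rejected when the tree is built, not when some text is parsed.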

Higher-order functions

Definitely in there.
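For instance, here is a hypothetical `Compose` helper, written here for illustration. It takes two functions and returns a new one.

```csharp
using System;

public static class FunctionalDemo
{
    // A higher-order function: takes functions as arguments and returns a function.
    public static Func<A, C> Compose<A, B, C>(Func<A, B> f, Func<B, C> g)
    {
        return x => g(f(x));
    }
}
```

C# 3.0 can't infer `A` from lambdas with implicitly typed parameters, so a call such as `Compose<int, int, string>(x => x * 2, y => "n=" + y)` needs explicit type arguments.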

Runtime alteration of object or type system

No, not really. (I.e., maybe you can hack around with certain APIs to try to do some magic, but it's not a language feature.)

Functional Programming

Yes. But still, functions aren't really first-class citizens…yet. Once we can use method groups implicitly as Actions and Funcs, it'll get even better. Andrew Kennedy of Microsoft Research has an interesting presentation on this: C# is a functional programming language.

Closures

Yep, since anonymous methods were introduced in C# 2.0.
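A minimal closure sketch, with a made-up class name. The lambda below captures the local variable `count`, and that variable outlives the method call. In C# 2.0 the same thing would be written as `delegate { return ++count; }`.

```csharp
using System;

public static class ClosureDemo
{
    // Returns a counter function that captures the local variable `count`.
    // Each call to the returned delegate increments that same captured variable.
    public static Func<int> MakeCounter()
    {
        int count = 0;
        return () => ++count;
    }
}
```

Each call to `MakeCounter()` creates a fresh `count`, so two counters never share state.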

Continuations

Extremely limited, in the form of yield return.
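To show what "limited" means: the compiler turns an iterator method into a state machine, and each `MoveNext()` resumes right after the previous `yield return` with its locals intact. Execution only moves forward and can't be captured or re-entered. A hypothetical example:

```csharp
using System.Collections.Generic;

public static class IteratorDemo
{
    // An infinite sequence: execution pauses at each yield return and resumes
    // there on the next MoveNext(), with a and b preserved between calls.
    public static IEnumerable<int> Fibonacci()
    {
        int a = 0, b = 1;
        while (true)
        {
            yield return a;
            int next = a + b;
            a = b;
            b = next;
        }
    }
}
```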

Introspection

Not just reflection, but actually inspecting the code itself. C# 3.0 has this in the form of Expression<T> (see below).

Macros

No, not crappy C-style macros. Here, I'm thinking of macros that would let you create things like C# query comprehensions *in source code*. (That is what I was hoping for when I saw the new query comprehension syntax… no such luck.)

In summary, C# lets you gain a lot of the benefits usually associated with dynamic languages, but without the nasty parts of dynamic typing.

A great paper on this subject is Static Typing Where Possible, Dynamic Typing When Needed: The End of the Cold War Between Programming Languages, by Erik Meijer and Peter Drayton of Microsoft.

Myth: Extension methods add methods to a class

This is a tricky one, since extension methods appear to be exactly that. This myth is also somewhat perpetuated by the C# spec: "In effect, extension methods make it possible to extend existing types and constructed types with additional methods." But, right before that, the real explanation is given: "Extension methods are static methods that can be invoked using instance method syntax."

Essentially, extension methods let us use infix notation with certain methods, which explains the spec's phrase "in effect". A more helpful way to think of this is to treat the "." operator as overloaded: it can also pass its left operand as the first argument to specially marked methods.
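A quick way to see that nothing is added to the type is a hypothetical extension method, made up for this example, that tolerates a null receiver. A real instance call on a null string would throw a NullReferenceException. This one doesn't, because the compiler rewrites `s.IsBlank()` into the static call `StringExtensions.IsBlank(s)`.

```csharp
public static class StringExtensions
{
    // The `this` modifier on the first parameter is what makes this an extension method.
    // Hypothetical helper for illustration; not part of the .NET Framework.
    public static bool IsBlank(this string s)
    {
        return s == null || s.Trim().Length == 0;
    }
}
```

So `missing.IsBlank()` with `string missing = null;` simply returns true: it is an ordinary static method call with nicer syntax.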

An alternative* would be to define an operator like the F# pipeline operator (|>). In C#, this would let us write stuff like:

customers |> Seq.Where(c => c.Name == "Michael")

That doesn't look like an improvement. BUT, we no longer need to mark methods in a special way. We can just use them:

myArray |> Array.BinarySearch("s")

Why do we need infix notation anyway? Well, the normal prefix notation can be difficult to read:

Select(Where(customers, c => c.Cool == true), c => c.Name)
Array.BinarySearch(items, "S")

Extension methods just make those functions easier to pipeline. That's all folks. Think of them like this, and save yourself a headache about what "extending a type" means.
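You can approximate |> in C# 3.0 today with one generic extension method. The sketch below is an assumption of mine; the name `Pipe` is hypothetical. It pipes any value into any function, so an unmarked method like Array.BinarySearch can be used in a chain, at the cost of an extra lambda.

```csharp
using System;

public static class PipeExtensions
{
    // Hypothetical helper: behaves like F#'s |> by applying f to value.
    // Because it is generic over T, it is available on every type.
    public static TResult Pipe<T, TResult>(this T value, Func<T, TResult> f)
    {
        return f(value);
    }
}
```

For example, `new[] { "a", "s", "z" }.Pipe(a => Array.BinarySearch(a, "s"))` finds the index of "s" without Array.BinarySearch being marked as an extension method.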

*My guess as to why extension methods are done the way they are is that it could confuse people if you had something like "item |> Stuff.SomeMethod("X")" where SomeMethod returns a function. Or where you have "item |> Stuff.SomeMethod("X").SomethingElse("y")". I'm still annoyed that I can't use infix semantics where *I* want, but oh well.

Lambda expressions and Expression<T>

Spec: "Lambda expressions provide a more concise, functional syntax for writing anonymous methods." Spec: "Expression trees permit lambda expressions to be represented as data structures instead of executable code. A lambda expression that is convertible to a delegate type D is also convertible to an expression tree of type System.Query.Expression<D>." (The spec predates the final release; in .NET Framework 3.5 the type is System.Linq.Expressions.Expression<TDelegate>.)

So, "lambda expressions" can either be just code (i.e., directly executable IL) OR they can get turned into a data structure, Expression<T>.

Adding to the confusion, a lambda expression can contain just an expression (i=> i + 1), or it can be a block of statements ( i => {Write(i); i++; Write(i); return i+1;} ). However, a lambda expression with a statement block body cannot become an Expression<T>. (As far as I know, VB's lambdas only allow for expression bodies, not blocks.)

An example:

Expression<Func<int, int>> inc = i => i + 1;

Is equivalent to:

ParameterExpression param_i = Expression.Parameter(typeof(int), "i");
var inc2 = Expression.Lambda<Func<int, int>>(
    Expression.Add(
        param_i,
        Expression.Constant(1, typeof(int))),
    param_i);

At runtime, you can then inspect the actual code and decide what to do with it. This is exactly the premise of LINQ to SQL. When you create a LINQ to SQL query, it is turned into an expression tree like the one shown above. The LINQ to SQL APIs then inspect that expression tree and convert it into SQL statements.
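Here is a toy version of that inspection step. LINQ to SQL does far more, and the demo class name is made up. The lambda is converted to a tree, its body is read as data, and the same tree is then compiled and run.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionInspectionDemo
{
    public static string Describe()
    {
        Expression<Func<int, int>> inc = i => i + 1;

        // The body is data: a BinaryExpression node whose NodeType is Add,
        // with the parameter on the left and the constant on the right.
        var body = (BinaryExpression)inc.Body;
        string shape = body.NodeType + "(" + body.Left + ", " + body.Right + ")";

        // The same tree can still be compiled and executed.
        int result = inc.Compile()(41);

        return shape + " = " + result;
    }
}
```

A query provider walks nodes like these and emits SQL (or anything else) instead of calling Compile().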

Here, I can understand the confusion. The word "expression" is used in three distinct senses: the lambda expression itself, an expression body (as opposed to a statement block), and an expression tree. Ideally, expression trees would be referred to as Expression or Expression<T>, which could help clear up some of the confusion. It also doesn't help that lambdas have these different conversion rules (although working around that could, possibly, be ugly).

Are there any other features that you've seen misused or you've had questions about? Let me know! I love comments, insults, and suggestions.

Want to correct me on something? Go right ahead! But, if you're going to say something like "C# doesn't have type inference", please make sure to either be an expert on the matter or be able to quote an authoritative source or show a proof. Thanks!

Code
Wednesday, September 05, 2007 3:01:18 AM UTC | Comments [10]

Wednesday, September 05, 2007 4:55:05 AM UTC
Man, your blog is kicking massive amounts of ass lately. I just regret that I can't fully understand a few things you mention (you lost me on the pipeline thing, for example), but it's always exciting to see a new post from you.

By the way, this reminded me that I once tried reading about F# and it was just too different to follow. If you could help us mortals understand concepts from F#-like languages gently, with simple, useful examples, that would be great.

Keep up the good work!
Eric
Wednesday, September 05, 2007 5:25:31 AM UTC
Well, thanks for the positive feedback. As for pipelining, that's EXACTLY what I want to hear. I probably shouldn't have dropped it in, but I just wanted to point out that "extension methods" are just a special case of "playing with functions" and aren't that special or "extensions" -- they can be achieved in other ways, with a more general solution.

Learning a functional language coming from imperative thinking can be really hard. I remember staring at some examples for a while thinking "what the heck is this and how does it work". Finally, when I "got it", my eyes went really wide as I realised how powerful the concept was.

I'll try real quick to explain the pipeline operator in F#. If you like this kind of stuff, I'll write some posts on it. Anyway, in F#, functions are just values. Imagine that every method in C# was automatically a delegate -- it's sort of like that. So, passing a function itself as an argument is just like passing any other value, and we can define functions that operate on other functions ("higher-order functions"). Hence, we can write something like:

let (|>) x f = f x

This just "flips" two values around. However, think of f as a function. Now, instead of writing:

printf "%s" "foo"

we can write

"foo" |> printf "%s"

The first parameter is "piped" into the function supplied.

The similarity to C# is that extension methods do the same thing:

Enumerable.ToList(myArray);

is equal to:

myArray.ToList();

which is theoretically equal to:

myArray |> Enumerable.ToList();

As you can see, all that's happening is that we're supplying the first argument *before* the method (hence infix notation). Whether we use . or |>, it's the same idea. The benefits of |> are that A: we can use *any* method and B: we can scope our methods. I imagine that once people start using extension methods, a lot of projects are going to have a myriad of extensions in them. Since there's no way to scope or qualify extension methods, I think this might get messy.
Wednesday, September 05, 2007 6:17:15 AM UTC
Hello Michael :-)
First of all, thanks for your fantastic blog entries!

About extension methods, and the F# pipeline operator... IMHO, the C# syntax has the advantage of being more concise, because it allows you to omit the name of the type where the method is defined.

I mean, with the dot operator you just use the type of the value on the left of the dot. If the compiler finds a suitable extension method, fine, otherwise you get an error.

In a hypothetical F#-like syntax, you'd have to write the type name explicitly, which clutters the code a bit (like "myArray |> Enumerable.ToList()" instead of "myArray.ToList()").

I agree, on the other hand, that the F# pipeline operator is "more general".

My 2c...

Ciao,
Massi
Wednesday, September 05, 2007 12:14:07 PM UTC
method = @conf['action']
if ! self.respond_to? method
  @logger.error "Unknown method #{method}"
else
  self.send method
end
Wednesday, September 05, 2007 2:15:47 PM UTC
Something that would be nice to see discussed is the new initializer syntax, which you touched on briefly with the anonymous type example, but can also be used for objects:

var a = new FooType (42) { FirstProperty = 1, SecondProperty = "adams" };

on lists:

var list = new List<int> { 1, 2, 3, 4, 5 };

and (apparently) on dictionaries:

var map = new Dictionary<string, string> {{"key1", "value1"}, {"key2", "value2"}};

(from http://blogs.msdn.com/wriju/archive/2007/04/13/c-3-0-enhancements-collection-initializers.aspx)

I'd really like to know more about dictionary initializers, and if/how they interact with anonymous types.

In particular, I've been doing lots of Perl recently, and initializing hashes is common:

my $xml_conversion = {
"/some/xpath/expression" => { dest => "/target/xpath/expression", other => 1 },
"/some/other/xpath" => { dest => "/target/other/expr", other => 2 },
};

This data-driven style of development is quite nice (something else that might be useful to cover; data-driven development is also mentioned in The Practice of Programming and Code Complete). Is such a thing easily mappable to C# 3 w/o lots of syntactic type overhead, e.g.

var xml_conversion = {
{ "/some/xpath/expression", new { Dest = "/target/1", Other = 1 } },
{ "/some/other/xpath", new { Dest = "/target/2", Other = 2 } },
};

i.e. have `xml_conversion' be a Dictionary<string, __AnonymousType__>, and have The Right Thing happen? For that matter, could this be used as a class member initializer?
Wednesday, September 05, 2007 2:31:25 PM UTC
@Massi: Yes you're correct that it's "more concise" in some cases to use . syntax rather than some theoretical pipeline operator. On the other hand, having to use prefix notation for Array.BinarySearch doesn't seem that great :\. I don't think anyone really has enough data to see how this will end up. But, I just have a feeling that there will be a myriad extension methods defined for all sorts of types, and by not being able to properly scope them, we lose out. Perhaps if we had to explicitly import the extensions we wanted... maybe? I also feel in general, a lot of work was done just for the "sake of LINQ" rather than thinking about things in general. Anonymous types, extension methods, usage of var, etc. -- it just feels like C# is so close to completely surpassing many other languages in all areas. My personal hope is that for C# 4, we'll see a lot of things "fleshed out" and that'll be the "near perfect" release :).

@malcontent: Did you have the right page?

@Jonathan: Data-driven is definitely a big thing. When you think of functions as data, then it gets really interesting. As far as initializers go, they're interesting, but the only collection type you can initialize implicitly is an array (via new[]). As far as I know, there's no syntax to declare a list or dictionary *without specifying the types*. The reason this sucks is that C# doesn't have type wildcard parameters. So, in the anonymous type dictionary example, we'd *want* to write something to the effect of:

var f = new Dictionary<?,?> { {"foo", new {Name = "cool", Bar = "Baz"}} };

Type wildcards have a lot of use outside of this kind of declaration too...
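One C# 3.0 workaround, sketched here as an assumption rather than anything official, leans on new[] and Enumerable.ToDictionary. Build an implicitly typed array of anonymous objects, then let ToDictionary infer Dictionary<string, (anonymous type)>. The demo class, method, and `Source` property are hypothetical. The XPath values come from Jonathan's example.

```csharp
using System;
using System.Linq;

public static class AnonymousDictionaryDemo
{
    public static string Lookup(string xpath)
    {
        // new[] infers its element type, so an array of anonymous objects is allowed...
        var rows = new[]
        {
            new { Source = "/some/xpath/expression", Dest = "/target/1", Other = 1 },
            new { Source = "/some/other/xpath",      Dest = "/target/2", Other = 2 },
        };

        // ...and ToDictionary infers both the key type and the anonymous value type.
        var xmlConversion = rows.ToDictionary(r => r.Source, r => new { r.Dest, r.Other });

        var entry = xmlConversion[xpath];
        return entry.Dest + " " + entry.Other;
    }
}
```

This only works for locals. A class member can't be declared this way, because var isn't allowed on fields and the anonymous type has no name to write instead.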

Anyway, thanks for the suggestions; I'll write some more about these features. And I think I'll do a post on what I'd love to see in C# 4 (or is it too early?). A lot depends on everyone who likes this style of programming making it clear to Microsoft that we want more of this. Microsoft has to deal with a lot of idiots, so making sure "people can't get confused" seems to be high on the list. If more people let MS know that "we're adults and can handle ourselves", maybe they won't mind unleashing even more power ;).
Wednesday, September 05, 2007 2:52:27 PM UTC
"Perhaps if we had to explicitly import the extensions we wanted... maybe?"

We _do_, sort of. You have to have an explicit `using Namespace' declaration, and extension methods are searched for within all types within the specified namespaces.

So this isn't a wonderfully explicit "explicit import" -- you can limit only to the namespace level -- but at least there is _some_ "opt-in" behavior, instead of trying to grab every extension method that exists across every possible namespace/type...
Wednesday, September 05, 2007 3:00:48 PM UTC
As for desirable C# 4.0 features, I'd like to see two things:

- Generic Constraints requiring that the type parameter implement a specific (usually static) method;
- A decent functional macro system.

Generic Type-Method Constraints would allow using overloaded operators within generic methods:

T Add<T>(T a, T b) where T : .op_Addition(T,T) { return a + b; }

Or a generic conversion function:

class Convert {
public static T FromString<T>(string s) where T : .Parse(string) { return T.Parse(s); }
}

As for a macro system, I'd just like to be able to generate C# source code within C#, so that I can do something akin to C's "X Macros":

// table.h
MACRO(a, "a-desc")
MACRO(b, "b-desc")

// impl.c
#include <stddef.h>  /* NULL */

enum values {
#define MACRO(a,b) PREFIX_ ## a,
#include "table.h"
#undef MACRO
};

const char*
get_description (enum values v)
{
    static const char* descs[] = {
#define MACRO(a,b) b,
#include "table.h"
#undef MACRO
    };
    /* valid indices are 0 .. count-1, so the upper bound is exclusive */
    if (v >= 0 && v < (sizeof(descs)/sizeof(descs[0]))) return descs[v];
    return NULL;
}

i.e. better support for data-oriented programming. :-)
Saturday, November 24, 2007 2:52:22 PM UTC
For the examples you mention in C#

5.Times(() => Console.WriteLine("Odelay!"));

and

exit.Unless(() => "restaurant".Contains("aura"));

Are those using extension methods in C#? In Ruby are those out of the box?
Matt T
Saturday, November 24, 2007 3:10:20 PM UTC
Yes, in C# 3.0 those require adding your own extension methods. For minor things like this, I don't think it makes much difference that they're not in the box. I'm not sure it's even smart to use those same idioms in C#; I was just illustrating what can be done.