[Giagnocavo]Michael::Write()

# Saturday, August 16, 2008
Reference cells in F# sequences
I saw this stack implementation (http://blogs.msdn.com/jaredpar/archive/2008/08/15/immutablestack-in-f.aspx) and wanted to see how I'd approach it:
 
#light
open System

type 'a ImmutableStack =
    | Empty
    | Stack of 'a * 'a ImmutableStack

    member x.Peek() = match x with | Stack (v, _) -> Some v | _ -> None
    member x.Pop() = match x with | Stack (_, s) -> s | _ -> failwith "empty stack"
    member x.Push a = Stack(a, x)
    member x.IsEmpty = x = Empty
(Well, I'd probably just use a list in most cases.) But interestingly, he has an All function, which returns a sequence of the entire stack. Chris suggested using sequence expressions and then recursively calling the function:
 
member x.All =
    match data with
        | Empty -> Seq.empty
        | Value (v,n) ->
            seq {
                yield v
                yield! n.All }
 
This is pretty nice as far as syntax goes. For some reason, I wanted to see how it'd look without the recursion (which needs to generate a new enumerator, unless the compiler is doing some wicked awesome stuff, which wouldn't surprise me). Here are some of the things I came up with:
 
member x.All1 =
    let until f = Seq.generate (fun () -> ref x) f (fun _ -> ())
    until (fun cur -> match (!cur) with
                          | Stack(v, s) -> cur := s
                                           Some v
                          | _ -> None)
 
Seq.generate continues calling the function until None is returned, so this provides the loop termination. Another similar approach:
 
member x.All2 =
    let until f = Seq.generate (fun () -> ref x) f (fun _ -> ())
    until (fun cur -> try (!cur).Peek()
                      finally if not (!cur).IsEmpty then cur := (!cur).Pop())

Since Peek already returns 'a option, we can use it directly. The problem is that we need to update cur to cur.Pop and then return a value. The try/finally works, but the whole deal still doesn't seem very elegant. Sequence expressions allow us to yield:
 
member x.All3 =
    let cur = ref x
    { while not (!cur).IsEmpty do
          yield (!cur).Peek()
          do cur := (!cur).Pop() }
    |> Seq.choose(fun x -> x)

I dislike this one. Because we're not doing the pattern match ourselves, we end up yielding 'a option. But None is never valid because we guard on IsEmpty. This makes us stick a Seq.choose on with an identity function to strip off the Some. [Side note, is there no built-in identity function?] Bringing the match into the seq fixes the issue, but the code is pretty long:

member x.All4 =
    let cur = ref x
    { while not (!cur).IsEmpty do
          match (!cur).Peek() with
              | None -> do ()
              | Some v -> do cur := (!cur).Pop()
                          yield v }

I think what bothers me the most here is that the while and match are redundant. The None case will never be matched because we guard on IsEmpty. Moving that into a separate function gives:

member x.All5 =
    let v = function Stack(vl, _) -> vl | _ -> failwith "dont call on empty"
    let cur = ref x
    { while (!cur) <> Empty do
          yield v (!cur)
          do cur := (!cur).Pop() }

I prefer All5 to All4, but still think All1 or All2 are nicer. (Chris's original is best as far as I can tell.) How else can this be done?

Edit: I lost sight of one of the principles of functional programming: composition. Here's a simple update to All5 that, IMHO, vastly improves it:

member x.All6 =
    let cur = ref x
    let toVal = function Stack(v, _) -> v | _ -> failwith "dont call on empty"
    let next () = try toVal (!cur) finally cur := (!cur).Pop()
    { while (!cur) <> Empty do yield next () }
 

Every line builds up and you don't have to keep its details "active" in your mind. This makes the sequence expression (which is the driver of the algorithm) easy to verify. And again, to clarify, in a real system I'd probably use what Chris suggested since it's the nicest syntax. The other versions are just to see what it looks like if we toss recursion and use a reference cell.
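For reference, the Seq.generate contract that All1 and All2 lean on can be sketched in isolation. This is a minimal sketch assuming the (opener, generator, closer) shape used in the versions above; countdown is my own illustrative example, not from any of the implementations:

```fsharp
// Seq.generate's contract as used above: the opener builds fresh
// per-enumerator state, the generator is called repeatedly until it
// returns None, and the closer runs when enumeration finishes.
let countdown n =
    Seq.generate
        (fun () -> ref n)              // opener: state per enumerator
        (fun r ->
            if !r > 0 then
                let v = !r
                r := v - 1
                Some v                 // yield the next item
            else None)                 // None terminates the sequence
        (fun _ -> ())                  // closer: nothing to dispose
```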

Code | FSharp
Saturday, August 16, 2008 4:09:49 AM UTC  #    Comments [0]  |  Trackback

# Friday, August 15, 2008
Typing string IDs

I just read these two posts:
http://blogs.msdn.com/simonince/archive/2008/08/15/strongly-typed-primitives.aspx
http://www.thejoyofcode.com/Avoiding_Primitive_Obsession_to_tip_developers_into_the_pit_of_success.aspx

And that reminded me about something we recently did. One system we're working on uses a lot of string identifiers for many different types of objects. There are many, many of these stored and passed around, so keeping things efficient was of high concern. 

The downside of string IDs (really, using any common type as an ID) is that it's legal to pass any primitive of the same type. Strings and integers abound, both as IDs of other classes and in general use. So it's not unimaginable that someone could pass the wrong parameter somewhere. This could lead to runtime crashes or unexpected results (if the ID is actually a real record of another class of object). Finally, using common types for IDs reduces usability. The signature "public void Delete(int id)" leaves a lot to be desired.

We wanted to hit all these issues while keeping things simple. There are times when untyped data needs to be converted, and this should be easy and clear. We wanted to avoid having to define new types whenever we had new classes of objects to identify. It is also customer-visible code, so C# is used.

Using a reference type was unacceptable, because it'd add at least 12 bytes overhead (I think more on x64). Using a struct fixes this, in addition to dealing with silly nullability issues. [If a type can be null, it should always be explicit. C#'s "references types can be null" makes this hard.]

The end result was quite simple. Wrap a string in a structure so equality and hashing pass through. But take advantage of the wrapper to remove case/cultural sensitivity (since in many systems, data IDs are not case sensitive). Provide explicit conversions so you can easily convert to and from strings, but never by accident. (If the conversions were implicit, you'd be back at the starting point.) Finally, add a generic parameter that is never used. The generic parameter gives you distinct types without having to define them. Now the APIs can look like:

    public void Delete(Id<Product> id)...

    Dictionary<Id<Group>, List<Id<User>>> members...

When you do have hardcoded IDs, as the blog entries I mentioned do, you can convert easily: (Id<User>)"Admin". Nulls are treated as empty, all the time (empty may be a valid value anyways).

When a truly optional ID is needed, use nullable types: "Id<Whatever>?". This fully captures how values are handled. This is vastly better than "It's a reference type, so maybe null is allowed. Or maybe null will crash. Empty string might be considered null, or maybe empty string means optional." With explicit nullability, the type system says it all.

The best part is that there should be pretty much no overhead. I'd expect the equality functions to be inlined, and there's no memory overhead, since the struct is simply a string reference.


Here's the class:
public struct Id<T> : IEquatable<Id<T>>
{
    public Id(string name) {
        this.name = name ?? "";
    }
    readonly string name;
    
    public static explicit operator string(Id<T> x) { return x.name ?? ""; }
    public static explicit operator Id<T>(string s) { return new Id<T>(s); }

    public override bool Equals(object obj) {
        return
            !(obj is Id<T>) ? false :
            ((Id<T>)obj) == this;
    }

    public bool Equals(Id<T> other) {
        return other == this;
    }

    public override int GetHashCode() {
        return (name ?? "").GetHashCode();
    }
    public override string ToString() {
        return name ?? "";
    }
    public static bool operator ==(Id<T> a, Id<T> b) {
        return StringComparer.InvariantCultureIgnoreCase.Compare(a.name, b.name) == 0;
    }
    public static bool operator !=(Id<T> a, Id<T> b) {
        return !(a == b);
    }
}
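A quick usage sketch, assuming the Id&lt;T&gt; struct above is in scope; Product and User are hypothetical marker types I've made up for illustration:

```csharp
using System;

// Marker types: never instantiated, they only distinguish ID types.
class Product { }
class User { }

class Demo
{
    static void Main()
    {
        var a = (Id<Product>)"ABC-1";
        var b = (Id<Product>)"abc-1";

        Console.WriteLine(a == b);          // equality ignores case
        // Id<User> u = a;                  // compile error: distinct types

        Id<User>? optional = null;          // a truly optional ID via Nullable<T>
        Console.WriteLine(optional.HasValue);

        Console.WriteLine((string)a);       // explicit conversion back to string
    }
}
```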
I'd be interested in seeing a more generic, yet very simple, solution: one that doesn't rely on the underlying type to be string, but still provides all the same functionality. I don't think it's possible, since there's no way to get a generic constraint that'd allow similar handling of "string" and "int?". Additionally, structs can't inherit, so you'd end up using "Id<string, Product>" everywhere, which is far from elegant.
Code
Friday, August 15, 2008 7:39:11 PM UTC  #    Comments [8]  |  Trackback

# Thursday, June 26, 2008
I underestimated the power of query comprehensions

In my last post, I said I'd write the sample in C# to compare to F#. Well, I grossly underestimated the power of query comprehensions. The C# version is almost the same length (formatting differences, mainly). I'm surprised and impressed. (Or maybe I'm writing F# like I'd write C#.) Edit: I think maybe sequence expressions could cut it down a bit...

...But... C# still can't do discriminated unions efficiently or effectively ;).


// crudcreatecompare.cs: Generates LINQ CRUD table fields using the horribly named DatabaseBase code
//
// Tables look like: [Table(Name="dbo.Accounts")]
// Columns look like this:
//  [Column(Storage="_AccountName", DbType="VarChar(128) NOT NULL", CanBeNull=false, IsPrimaryKey=true)]
//  [DataMember(Order=1)] // Exists if serialization is turned on; used to order key parameters
// Emits:
//  public static readonly TableHelper<Account, String> Accounts =
//      CreateTable(dc => dc.Accounts, a => a.AccountName);

using System;
using System.Collections.Generic;
using System.Data.Linq.Mapping;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Runtime.Serialization;

class Program
{
    static void Main()
    {
        Console.WriteLine(
            new Program().generate(
            "C:\\yourlinq.dll"));
    }

    string generate(string asmpath)
    {
        var asm = Assembly.LoadFrom(asmpath);
        var lines = from t in asm.GetExportedTypes()
                    let ta = getAttr<TableAttribute>(t)
                    where ta != null
                    let name = pluralize(ta.Name.Replace("dbo.", ""))
                    orderby name
                    select genTable(t, name);
        return joinStrings("", lines);
    }

    string genTable(Type t, string tableName)
    {
        var keyProps =
            from p in t.GetProperties()
            let c = getAttr<ColumnAttribute>(p)
            where c != null && c.IsPrimaryKey
            let dm = getAttr<DataMemberAttribute>(p)
            orderby dm == null ? 0 : dm.Order
            select p;
        var tw = new StringWriter();
        tw.WriteLine("public static readonly TableHelper<{0}, {1}> {2} =",
            t.Name,
            joinStrings(", ", keyProps.Select(p => p.PropertyType.Name)),
            tableName);
        tw.WriteLine("\tCreateTable(dc => dc.{0}, {1});",
            tableName,
            joinStrings(", ", keyProps.Select(p => "a => a." + p.Name)));
        tw.WriteLine();
        return tw.ToString();
    }

    string joinStrings(string sep, IEnumerable<string> items)
    {
        return string.Join(sep, items.ToArray());
    }

    string pluralize(string s)
    {
        return s.EndsWith("s") ? s : s + "s";
    }

    T getAttr<T>(ICustomAttributeProvider icap)
    {
        var a = icap.GetCustomAttributes(typeof(T), true);
        return a.Length == 0 ? default(T) : (T)a[0];
    }
}

Code | FSharp
Thursday, June 26, 2008 6:15:10 AM UTC  #    Comments [0]  |  Trackback

LINQ to the CRUD CreateTable generator in F#

With the DatabaseBase and TableHelper classes, you still have to generate a CreateTable field per table. Why do it by hand? I wrote an F# script to generate the statements. I must say, I'm loving F# more than I had hoped. The more I learn, the better it gets. I've noticed (even in C#) that using a functional style generally means fewer errors. This script worked without bugs the first time (i.e., as soon as it compiled), which is pretty cool (granted, it's not big, but I'm sure if I had tons of for loops, I woulda messed up somewhere). Maybe this weekend I'll try writing it in C# just to compare (pretty sure it'll be more than 59 lines!). And I'll preempt comments about readability: yes, it may be difficult to read if you don't know F#, but more on that later...

I'd love feedback as to making it more "functional"; years of imperative programming don't die quickly. Also, I'm not very happy with the definition of chooseAttr, but I can't seem to get it to infer the type I want otherwise.

Anyways, here's the code. You'll need to specify the references when compiling: -r "C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.5\System.Data.Linq.dll" -r "C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.0\System.Runtime.Serialization.dll"


// crudcreatetable.fsx: Generates LINQ CRUD table fields using the horribly named DatabaseBase code
//
// Tables look like: [Table(Name="dbo.Accounts")]
// Columns look like this:
//  [Column(Storage="_AccountName", DbType="VarChar(128) NOT NULL", CanBeNull=false, IsPrimaryKey=true)]
//  [DataMember(Order=1)] // Exists if serialization is turned on; used to order key parameters
// Emits:
//  public static readonly TableHelper<Account, String> Accounts =
//      CreateTable(dc => dc.Accounts, a => a.AccountName);

#light
open System
open System.Reflection
open System.Data.Linq.Mapping
open System.Runtime.Serialization

let getAttr<'target, 'a when 'a :> ICustomAttributeProvider> (ty : 'a) =
    match List.of_array (ty.GetCustomAttributes(typeof<'target>, true)) with
        | a::_ -> Some (a :?> 'target)
        | [] -> None

let chooseAttr<'target, 'a when 'a :> ICustomAttributeProvider> (ty : 'a) =
    match getAttr<'target,_> ty with
        | Some(a) -> Some(ty,a)
        | None -> None

let joinStrings (sep:string) items = items |> Seq.fold1 (fun acc x -> acc + sep + x)
let pluralize (name:string) = if name.EndsWith("s") then name else name + "s"

let generate(asmpath:string) =
    let genTable (t:Type, tableName) =
        let keyProps =
            t.GetProperties()
            |> Seq.choose (chooseAttr<ColumnAttribute,_>)
            |> Seq.filter(fun (p,c) -> c.IsPrimaryKey)
            |> Seq.map(fun (p,_) -> p, getAttr<DataMemberAttribute, _> p)
            |> Seq.orderBy(function | _,Some(dm) -> dm.Order | _ -> 0)
            |> Seq.map (fun (p,_) -> p)
        let tw = new IO.StringWriter()
        let pn fmt = Printf.twprintfn tw fmt
        pn "public static readonly TableHelper<%s, %s> %s ="
            t.Name
            (joinStrings ", " (keyProps |> Seq.map (fun p -> p.PropertyType.Name)))
            tableName
        pn "\tCreateTable(dc => dc.%s, %s);"
            tableName
            (joinStrings ", " (keyProps |> Seq.map (fun p -> "a => a." + p.Name)))
        pn ""
        tw.ToString()

    let asm = Assembly.LoadFrom asmpath // Don't use ReflectionOnly 'cause it won't resolve dependencies
    asm.GetExportedTypes ()
        |> Seq.choose (chooseAttr<TableAttribute,_>)
        |> Seq.map (fun (t,ta) -> t, ta.Name.Replace("dbo.", "") |> pluralize)
        |> Seq.orderBy (fun (t,_) -> t.Name)
        |> Seq.map genTable
        |> Seq.fold1 (+)

generate "C:\\yourlinq.dll"
    |> Console.WriteLine

FSharp
Thursday, June 26, 2008 5:13:44 AM UTC  #    Comments [0]  |  Trackback

LINQ to the CRUD RTM

[Reposting because it appears to have been deleted somehow.]

A bit ago, I posted some info on doing CRUD operations using LINQ: http://www.atrevido.net/blog/2007/08/26/A+LINQ+To+The+CRUD.aspx

This is a lightweight way (no codegen at all, only 2 lines of code per table) to get some CRUD stuff with LINQ. It's not the most efficient or fantastic way of doing things, but it works fine in the several projects we've used it in so far. And we get to use C# 3's expression trees, which is a fantastic and under-exploited feature. At any rate, it does show that doing disconnected work with LINQ is trivial.

The code I posted was for Beta 2 and no longer works. I've since added a few new features, but the basic idea remains the same as before. I'm just posting the new code as I've gotten a few emails and one comment about the old code no longer working.

DatabaseBase.cs (10.47 KB)

While working on it in a real project, I started using our Tuple struct, so you'll need that too:
Tuple.cs (2.8 KB)

As Scott Peterson pointed out, this class doesn't implement IEquatable<T>. Just add it and call the == operator.

As always I welcome any criticism.

Misc. Technology
Thursday, June 26, 2008 4:56:24 AM UTC  #    Comments [0]  |  Trackback

# Monday, March 31, 2008
The reason VB.NET is truly a second class citizen

No support for anonymous methods/lambda statements. I first realised this when someone commented on one of my functional C# postings, asking "how can I write this in VB?" Turns out VB has no support for anonymous methods. Ouch.

This lack of capability rules out so much functionality that I honestly cannot imagine writing anything significant today using VB.NET. But with LINQ and F# making such big splashes, I have high hopes for the future of functional programming on the .NET languages, and I'm sure VB.NET will get itself fixed up.

Really, dropping anonymous methods in favour of stuff like XML literals? Maybe that's why it's listed as first time or casual programming ;). (No, I jest. Scheme is far better for first time programming.)

Code
Monday, March 31, 2008 9:55:56 PM UTC  #    Comments [8]  |  Trackback

Empty WCF services references - no code generated

Usually I use WCF just by referencing the contracts directly and using my own generic helper methods. Unfortunately, I had to make use of the built-in async patterns that the codegen can provide. Hence, I ended up using the "Add Service Reference" dialog, which notices (unlike svcutil on the command line) that since I'm requesting async, it'll need to generate my interfaces even though they're already referenced. Maybe in .NET 4 we'll have a new async pattern that'll leverage expression trees to rid us of the rather ugly FooAsync/BeginFoo patterns.

Things were working just fine until I added a reference to a new service. Adding the service reference didn't work; it failed silently. It added all the schema files to the project, but the Reference.cs file was empty. If I told it NOT to use referenced assemblies, then it generated everything just fine.

After switching to the command line and using svcutil, I got these errors:
Error: Cannot import wsdl:portType
Detail: An exception was thrown while running a WSDL import extension: System.ServiceModel.Description.DataContractSerializerMessageContractImporter
Error: Referenced type 'MyApp.Management.CoolFunction.CoolThingRule, MyApp.Management.Contracts, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null' with data contract name 'CoolThingRule' in namespace 'http://schemas.MyApp.com/management/v2/00' cannot be used since it does not match imported DataContract. Need to exclude this type from referenced types.
XPath to Error Source: //wsdl:definitions[@targetNamespace='http://schemas.MyApp.com/management/v2/00']/wsdl:portType[@name='ICoolThingManagement']

I regenerated without the referenced code and figured out that the problem was that one type referenced an enum which had some values without the [EnumMember] attribute (although it had a DataContractAttribute on the enum itself). Because of this, WCF wanted to represent the enum with a string and thus was incompatible. Adding the EnumMemberAttribute to all the values sorted out this issue.
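A hedged sketch of the fix. The type name and namespace come from the error message above; the member names are made up for illustration. Marking only the enum with [DataContract] is not enough -- every value needs [EnumMember], or WCF falls back to representing the enum as a string and the referenced type no longer matches the imported DataContract:

```csharp
using System.Runtime.Serialization;

// With [EnumMember] on every value, WCF serializes this as a proper enum
// instead of a string, so the referenced type matches the imported contract.
[DataContract(Name = "CoolThingRule",
              Namespace = "http://schemas.MyApp.com/management/v2/00")]
public enum CoolThingRule
{
    [EnumMember] Allow,      // hypothetical members
    [EnumMember] Deny,
    [EnumMember] AuditOnly
}
```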

Code
Monday, March 31, 2008 9:06:51 AM UTC  #    Comments [5]  |  Trackback

# Thursday, March 27, 2008
Hacking SOAP Faults into Silverlight 2 Beta 1

Silverlight 2 is pretty nice. Compared to dealing with the nightmare of HTML, CSS, AJAX and whatever else, it's quite divine. Of particular interest is that Silverlight 2 has a mini Windows Communication Foundation stack that can do basic SOAP work (and some "Web 2.0" type things too I think). While the SL WCF stack works pretty well overall, it doesn't support SOAP faults.

First off, SOAP faults usually cause the server to return an HTTP 500. For whatever reason, this isn't handled correctly between the browser and Silverlight, and results in an UnexpectedHttpResponseCode. OK, so go into the Global.asax and in Application_EndRequest change 500s to 200s. [Yes, this requires that you run WCF with AspNetCompatibility.]

Some progress. Silverlight now gives a more useful exception:

Error: System.Runtime.Serialization.SerializationException: OperationFormatter encountered an invalid Message body. Expected to find node type 'Element' with name 'MyFunctionResponse' and namespace 'http://schemas.contoso.com/coolservice/v2/00'. Found node type 'Element' with name 's:Fault' and namespace 'http://schemas.xmlsoap.org/soap/envelope/'

Hmm, it's getting the fault, but can't seem to handle it. I tried adding typed faults to my WCF contract, but the SL WCF stack doesn't have the FaultContractAttribute. It appears as if SL cannot read faults at all.

Enter HackFaults. That message provides two pieces of data from the SOAP message: the element name and the namespace. We can hijack these to pass our fault information, then extract it from the exception message. H to the A to the C-K-Y.

First off, we need to get access to our WCF fault. We can do this with a System.ServiceModel.Dispatcher.IErrorHandler. I've attached the entire error handler code to this article, but basically you add an attribute to your WCF service to wire up the error handler. If you're doing WCF work, you probably want this anyways for logging purposes and so on. The real meat of the error handler is this little function:

public void ProvideFault(Exception error, MessageVersion version, ref Message fault)
{
    // HackFaults store the exception in the httpcontext
    var context = System.Web.HttpContext.Current;
    if (context != null) {
        // Only works in AspNetCompat mode
        if (!context.Request.Browser.Crawler && context.Request.Browser.EcmaScriptVersion.Major > 0) {
            // This should rule out non-browsers
            context.Items.Add("HackFault", error);
        }
    }
}

I'm not sure if there's a better way to differentiate between browsers and real SOAP stacks. If there is, let me know. Now, in our Global.asax, we want to pick this information up and replace the actual SOAP fault with our own hackfault:

protected void Application_EndRequest(object sender, EventArgs e)
{
    // In all fairness, I was drinking eiswein at the time
    if (Context.Items.Contains("HackFault")) {
        Context.Response.ContentType = "text/xml";
        Context.Response.StatusCode = 200;
        Context.Response.ClearContent();

        var hackEx = Context.Items["HackFault"] as Exception;
        var hackFaultXml = string.Format(
            "<s:Envelope xmlns:s=\"http://schemas.xmlsoap.org/soap/envelope/\"><s:Body><{0} xmlns=\"{1}\" /></s:Body></s:Envelope>",
            hackEx.GetType().ToString(), Context.Server.HtmlEncode(hackEx.Message));
        Context.Response.Output.Write(hackFaultXml);
    }
}

Now we have a message that has the bits of data we care about in the right places for the Silverlight exception to be ready for parsing. The SL-side parsing code is vile but easy:

public static string ExtractHackFaultMessage(this Exception ex)
{
    // HackFaults generate these Exceptions and messages:
    // Without debug:
    //   Could not connect to server: System.Runtime.Serialization.SerializationException: [SFxInvalidMessageBody]
    //   Arguments:FaultException,Something silly
    // With debug:
    //   Could not connect to server: System.Runtime.Serialization.SerializationException:
    //   OperationFormatter encountered an invalid Message body. Expected to find node type 'Element' with
    //   name 'SomeOperationResponse' and namespace 'http://schemas.cool.com/yea/v2/00'.
    //   Found node type 'Element' with name 'FaultException' and namespace 'Something silly'
    if (!(ex is System.Runtime.Serialization.SerializationException)) return null;
    var msg = ex.Message;
    string regexp;
    if (msg.Contains("[SFxInvalidMessageBody]")) {
        regexp = @"Arguments:(.*),(.*)";
    } else if (msg.Contains("OperationFormatter encountered an invalid Message body")) {
        regexp = @"Found node type 'Element' with name '(.*)' and namespace '(.*)'";
    } else {
        // Nope, it's something else
        return null;
    }
    var match = System.Text.RegularExpressions.Regex.Match(msg, regexp);
    if (match.Groups.Count != 3) return null; // Not expected
    var exType = match.Groups[1].Value.Trim();
    var exMsg = match.Groups[2].Value.Trim();
    if (exType == "System.ServiceModel.FaultException") return exMsg;
    else return exType + ": " + exMsg;
}

This little function will give me the fault information (and not mention FaultException if that's what it is) or null if it can't extract it. When dealing with errors, I can show an error like this: "Error: " + (ex.ExtractHackFaultMessage() ?? ex.ToString())

This gives me at least rudimentary error reporting from the server back to Silverlight. If this can be improved or fixed up, please tell me. (Typed fault exceptions would be nice, but then SL can't codegen the contracts.) At any rate, it works as a cheap hack until Silverlight Beta 2 (they gotta fix it by then, right? Right?).

HttpErrorHandler.cs.txt (2.49 KB)
Code
Thursday, March 27, 2008 7:19:36 AM UTC  #    Comments [1]  |  Trackback

# Wednesday, March 12, 2008
Medical and General Security

Nothing surprising

I've been waiting for this: http://www.nytimes.com/2008/03/12/business/12heart-web.html?_r=2&ref=technology&oref=slogin&oref=slogin

Certain pacemakers (Medtronic in this case) are easy to reprogram without any useful authentication. The result is that an attacker can kill someone remotely by modifying their pacemaker.

This certainly will not be the last time something like this happens. The response from Medtronic is idiotic:

"To our knowledge there has not been a single reported incident of such an event in more than 30 years of device telemetry use, which includes millions of implants worldwide"

It's funny seeing industries that typically have little to no security requirements in their products get rudely awakened. Another vendor, St. Jude, says something equally scary:

"used “proprietary techniques” to protect the security of its implants and had not heard of any unauthorized or illegal manipulation of them"

Who wants to bet there's some globally shared key at work? At any rate, we expected this kind of stuff because too many people can't think clearly about security (I'll be writing about [the lack of] VoIP security soon).

A growing problem

How should these devices be secured in the first place? I'm not talking specifically about pacemakers, but about all sorts of implants and enhancements that we will have during the next years, using security technology available today.

First, they need to be remotely monitored. This is relatively easy to secure, as the risk is considerably less: information disclosure. For example, if each monitoring device had to have its public key explicitly trusted for a particular patient, that'd be pretty easy. In the case that a key was disclosed (say, by capturing and attacking a monitoring device), the only access gained is read-only.

Further reducing the risk, it's possible that the amount of effort required for such an attack exceeds the value of the information gained. For example, if an attacker can access a target's house, they could simply steal identification and request that medical records be sent to them.

More important is the editing of configuration. How do we determine who has access? In theory, we want any qualified medical professional to be able to change configuration in case of an emergency. Without a global network connected to the device, the device has no way to validate credentials, particularly revocation. Additionally, even assuming that every device has access to a global database, there would be too many authorised users to ensure security. (Just like large government databases.)

Is this a threat? Some people may think this is a far-fetched idea. Certainly today this is not a widespread fear. It may be a neat way to carry off an attack against a single target, but I doubt it'd be effective for major attacks. But how long will it be until a large percentage of the population carries some kind of embedded device? Pacemakers, medicine delivery systems, vision implants, hearing, digestive -- the list goes on.

The bottom line is that humans will carry more embedded technology, and this technology must be secure *and* accessible. A system where losing your private key means surgery is not usable.

The easy solution

As far as I can tell, the only solid way to ensure security with today's technology is to add a hard link. In order for anyone to modify configuration, the configuration device must establish itself over a physical connection. This ensures no remote attacks are possible. This would take away little to no convenience -- before editing yourself, you'd have to let them physically connect a device to you.

The same could be done for remote devices. Let's say your doctor wants to adjust your body remotely. You'd simply key the remote device[s] to your doctor, and key yourself to the remote device[s]. You've established a chain of trust that's easy to clear and recreate later. There is no global database, simply yourself and devices you touch.

This mimics what you have in the real world: You trust your doctor after you establish a relationship with him. You can then call him on the phone and you trust his advice to take more or less of the medicine.

A quick note on the details: The medical devices themselves don't need hard lines to the hard configuration interface. Indeed, your "hard link" could be a special device keyed to yourself. However, embedding this device into the body means you won't lose it and it'll be readily accessible to medical teams, even if you're unconscious.

To protect against damage to the hard link device, I suppose a backup key could be authorized. You could then store it safely yourself (as in, in a bank's safety deposit box, not in the database of the device manufacturer).

The general solution

However, this only secures us as much as we can trust the authentication. But it still relies on manual revocation and trust editing. It may be acceptable to Verisign when they accidentally issue a certificate in Microsoft's name to an attacker, but it is not acceptable for humans. Specifically, in a short vulnerability window, you could die.

The real solution, and one that we're going to need eventually across all technology, is intelligence. Specifically, a machine intelligence that determines if what is happening or what is requested is dangerous. This is the only way that we will have security moving forward.

This kind of intelligence is what we use to protect ourselves now. If the water comes out glowing green, we decide we won't trust it, even though we do trust (in general) our public water system. If you see your doctor and he recommends moving from 5mg to 500mg of Xanax a day, you'll immediately revoke his trust.

Attacks will adopt this kind of intelligence. A hacker uses a vulnerability to gain access and then attack other systems from there. How long will it be until attacking programs themselves replace the work done by the hacker?

Our software and machines will have to adopt this kind of intelligence to thwart such attacks. It will no longer be "oh, sorry you got hit by malicious code from clicking on a hyperlink, please reinstall your OS". As long as humans can be killed by the devices in use, the stakes are too high for even tiny vulnerability windows.

 

Security
Wednesday, March 12, 2008 11:56:44 PM UTC  #    Comments [0]  |  Trackback

# Thursday, February 28, 2008
Writing events with System.Diagnostics.Eventing

... or, how the hell to use Vista and 2008's new ETW stuff with managed code. And, introducing ecmanaged: A decent way to do all this stuff.

Quick ETW Overview

Actually, the real ETW overview is here: http://msdn.microsoft.com/msdnmag/issues/07/04/ETW/ <-- This is some of the best overview and documentation on it (the other good stuff is the ecmangen documentation in the Windows SDK bin folder). The MSDN stuff is terribly confusing for the most part. Or maybe I'm too spoiled by how easy it is to find stuff in the BCL. My overview is on what you have to do to make things work in .NET.

ETW is a real pain to use with .NET. Even so, ETW starts off looking really promising. You define everything in a nice XML manifest file, and everything is based off that. But wait, everything? Shouldn't the manifest be the end-all? Yea, that'd make logical sense. No, you have to run some tools from the Windows SDK. First you run MC, which generates a .h header file. Managed devs are groaning now -- why the hell should something as general as event tracing be language specific? The .h file contains the processed event descriptors, ready for C consumption.

It worsens: MC also generates a resource script. You have to compile that with RC, and it'll create a Win32 .res resource. Then you compile that into your binary (the C# compiler has the /win32res option). Then you go back and edit your XML manifest to make sure it points to the final binary. Wait, what? Yes. The resources that MC generates for RC contain all the messages that are in your XML manifest. Someone thought it was a really cute idea to make the Event Viewer not only read all the data from your manifest, but also go look it up in some binary resources.
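Concretely, the dance looks something like this (a sketch only; the file names here -- MyProvider.man, Program.cs -- are hypothetical, and the SDK tools must be on your path):

```cmd
rem Process the manifest: generates MyProvider.h and MyProvider.rc
mc.exe MyProvider.man

rem Compile the resource script into a Win32 .res file
rc.exe MyProvider.rc

rem Embed the compiled resource into the managed binary
csc.exe /win32res:MyProvider.res Program.cs

rem Finally, edit the manifest so resourceFileName/messageFileName
rem point at the built binary, then install it with wevtutil
```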

Actually, this probably made sense to someone on the Windows team since I'm guessing they already have tools to go and localise Win32 resources or something. Unfortunately, it sucks and makes no sense for anyone NOT in their particular position. Now, I hope I'm wrong (I really, really want to be wrong), but I think there's no way to force the message strings to just stay in the XML file and be read from there. 

Finally, things get easy again. Just run "wevtutil install-manifest Some.man" (wevtutil is in system32). In fact, this utility is so user friendly, it even lets you type "im" instead of "install-manifest". At this point, assuming the other steps went well, your provider shows up in Event Viewer.

ECManGen

But wait, how do I actually make that manifest? This part is almost the easiest. In the Windows SDK, there's a lovely little tool called ECManGen. Just fire it up, and go to town adding Providers, Channels, Templates, and Events.

Providers are the main things that show up in your Event Viewer, such as MyApp-FooProduct-LameComp. Channels separate Admin/Operational/Debug and others. Templates are an argument list for Events. If you have, say, a bunch of events that take the same kinds of parameters, you can share templates among them (I find it helpful to create a "SingleStringTemplate".) It's very straightforward.
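For reference, a minimal manifest might look something like this. This is a sketch from memory -- the provider name, GUID, and file paths are all made up, and the authoritative schema is what ECManGen emits and the SDK documents:

```xml
<instrumentationManifest xmlns="http://schemas.microsoft.com/win/2004/08/events">
  <instrumentation>
    <events>
      <provider name="MyApp-FooProduct-LameComp"
                guid="{00000000-0000-0000-0000-000000000001}"
                symbol="LameCompProvider"
                resourceFileName="C:\MyApp\MyApp.exe"
                messageFileName="C:\MyApp\MyApp.exe">
        <channels>
          <channel name="MyApp-FooProduct-LameComp/Operational"
                   type="Operational" enabled="true"/>
        </channels>
        <templates>
          <template tid="SingleStringTemplate">
            <data name="Message" inType="win:UnicodeString"/>
          </template>
        </templates>
        <events>
          <event value="1" level="win:Informational"
                 template="SingleStringTemplate"
                 channel="MyApp-FooProduct-LameComp/Operational"/>
        </events>
      </provider>
    </events>
  </instrumentation>
</instrumentationManifest>
```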

*Note: I can't actually get Admin channels to work. If I create an event and stick it in an Admin channel and set its level to Informational, MC complains (as does ECManGen) that the level has to be Critical, Error, or Informational. Uh, OK. Instead, just use Operational.

Except... ECManGen is a free utility. (Free? Perhaps not, seeing as the annual MS bill for a 4-person dev team is around $20,000 (counting just MSDN) -- but it's well worth it.) Part of the docs say: "NOTE: For the Manifest Generator Tool to function correctly, the file winmeta.xml (which contains pre-defined metadata values) must either be in the Include directory of the SDK or in the same directory as the Manifest Generator tool." OK, easy enough. Except... it doesn't work that easily. The only way I got it to work was to copy the xml file over to the same directory, *and* start ECManGen from that directory.

Oh yea, ECManGen won't open your manifest file if you pass it as an argument, so forget about cute VS integration. Just Google ecmangen and go rate up the bugs on Connect :).

Going Managed

OK, so you're not living in the last century and use decent tools -- how does this map to C#? First off, you create an EventProvider with the right Guid (the one from your manifest). Then you create an EventDescriptor for each event, matching up all the little parameters (the MSDN docs for EventDescriptor have more details). Finally, you can call WriteEvent, passing the EventDescriptor *by ref* for some reason (no, I can't figure out why).

Oh yea, and you have to hook up that Win32 resource to your C# project, so if you needed another resource (like another app manifest?), you'll have to go deal with merging them and all that hassle. And, don't forget to make sure the parameters you pass into the object[] array of WriteEvent line up with what your manifest has. And also, the .NET API won't even handle the Boolean->BOOL (4 byte) silliness for you.
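Put together, the manual approach looks roughly like this. This is a sketch: the GUID and the EventDescriptor numbers are placeholders that have to match your manifest exactly, and it only works on Vista/2008 with the manifest installed:

```csharp
using System;
using System.Diagnostics.Eventing;

static class LameCompEvents
{
    // Must match the provider GUID in the manifest (placeholder here)
    static readonly EventProvider provider =
        new EventProvider(new Guid("00000000-0000-0000-0000-000000000001"));

    // id, version, channel, level, opcode, task, keywords --
    // every value must mirror the event definition in the manifest
    static EventDescriptor somethingHappened =
        new EventDescriptor(1, 0, 16, 4, 0, 0, 0);

    public static void WriteSomethingHappened(string message, bool ok)
    {
        // WriteEvent takes the descriptor by ref; BOOL is 4 bytes,
        // so pass an Int32 instead of a Boolean
        provider.WriteEvent(ref somethingHappened, message, ok ? 1 : 0);
    }
}
```

Note how every argument is untyped and positional -- change the manifest and nothing warns you the code is now wrong. This is exactly the tedium ecmanaged generates away.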

In summary, it's a lot of boring, error-prone work, and you'll have to repeat it every time you edit your manifest. Yuck. Maybe it's just easier to use the old event log stuff and forget about all this fancy ETW stuff.

However, the ETW features are pretty cool. As is ecmangen (well, for the most part). I couldn't tear myself away from the promises the new ETW stuff offered, so I sat down and wrote a hack tool to do everything for me. The goal is to have an XML file, then convert it right into a DLL I can use.

ecmanaged does exactly this. Sorta. It doesn't deal with complicated scenarios or performance counters, but handles straightforward event logging with templates in an easy and type-safe fashion.

Using ecmanaged

Usage: ecmanaged ecgen <manifest> /out:<target assembly> /namespace:<target namespace> /class:<target class>
Usage: ecmanaged msglocate <manifest> <messageFileName>
Usage: ecmanaged install <manifest> <targetManifest> <messageFileName>
Usage: ecmanaged uninstall <targetManifest>

Call ecgen with a manifest, and it'll go build an assembly with the wrappers for calling your provider events. It'll stick the Win32 resource on the assembly too, so it's ready to be used.

Call msglocate with the manifest and the message binary file, and it'll go fix the manifest to point to the right place.

Call install, and it'll copy the manifest to another location, fix the msg location, and call wevtutil on it. It copies because I'm assuming your original manifest is under source control, and hence shouldn't be fixed up.

Uninstall just calls wevtutil uninstall-manifest.

Yey! Now you can just make a batch file to call this on your manifest and run it every time you change your manifest. What I do is stick this in a folder in my project called "Eventing". I check the resulting binary into source control (just like a 3rd party DLL). The downside is making sure you always update the DLL if you modify the manifest. I'm sure somehow msbuild can come to the rescue (and autoupdate when the manifest changes, so Intellisense keeps working), but this works for me.

I welcome all comments/questions/insults, but please, no death threats. Download: ecmanaged.exe (45 KB)

P.S. No, I don't hate Windows or the devs on it, although I hate unmanaged code (it IS 2008, innit?). Perhaps I'm just annoyed I have to make a Saving Throw versus Hang every time I click Send in Outlook 2007 (yes, patched up, even with Vista SP1... which made it worse). But I still think this whole ETW toolchain is terribly unpolished.

Code
Thursday, February 28, 2008 9:12:25 PM UTC  #    Comments [1]  |  Trackback

# Monday, December 10, 2007
Access is denied when starting Windows Event Log on Vista

I ran into a strange problem today. The Windows Event Log service would not start, stating an error code 5 "Access is denied". This doesn't make any sense as running the services MSC requires elevated permissions and I'm an administrator. That error message alone didn't give me too much of a clue, especially because I don't know much about the event log service. Also, as the event log service was down, troubleshooting it required a bit more work than usual since I couldn't just turn to the event log.

Fortunately, Vista's ETW (Event Tracing for Windows) provided an easy solution. At a command prompt, I ran "logman query providers". This shows a list of all the installed tracing providers on the system. The interesting one in this case is "Microsoft-Windows-Eventlog". Using this information I could start a trace and generate a report. The report indicated that access was denied creating the System channel. It mentioned a path it was trying to use: %SystemRoot%\system32\winevt\logs\System.evtx
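The trace itself can be captured with the in-box tools. A sketch of roughly what I did (the session name and output paths are arbitrary):

```cmd
rem List the installed providers and find the event log one
logman query providers

rem Start a trace session for that provider
logman start eventlog-debug -p Microsoft-Windows-Eventlog -o trace.etl -ets

rem ...reproduce the failed service start, then stop the session
logman stop eventlog-debug -ets

rem Turn the binary trace into a readable report
tracerpt trace.etl -o report.xml
```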

I checked the permissions and they looked OK (SYSTEM had full control). So I renamed the System log and gave Everyone permissions on the folder. Then I started the Event Log service and it worked fine, creating a new System.evtx. Checking the new file's permissions showed that the Event Log service runs as Local System, so apparently that's the access it requires. After resetting the permissions, everything seems to be working fine.

I'm posting this since when I searched for a solution, I found several people asking and no answers. This could be the solution for other people. I'd like to hear if anyone knows what might have caused this mess in the first place.

Misc. Technology
Monday, December 10, 2007 3:18:17 PM UTC  #    Comments [4]  |  Trackback

# Thursday, September 06, 2007
Complicated functions in LINQ to SQL

Rob Conery talks about geocoding with LINQ here. In his post, he provides some code using the Haversine formula to compute the distance between two points on earth. His function is declared as:

Func<double, double, double, double, double> CalcDistance = (lat1, lon1, lat2, lon2) => …

Further, this delegate is wrapped in a normal C# method. Now, read my previous post about calling functions in LINQ to SQL. Can you see where things are going to go wrong? That's right, a normal delegate or method can't be converted into SQL by LINQ, as the engine has nothing to work with. Only when our code is available as data can the LINQ to SQL engine do its magic. Since there's some insinuation that LINQ to SQL just can't handle things like Haversine, I'll demonstrate how to do it.

If you want to use your own "complicated" functions with LINQ to SQL, you'll need to manually construct the predicate expression. It's not pretty, but, it does let you convert somewhat detailed functions, such as Haversine, inside of LINQ-to-SQL queries. This is not always the right approach: in some cases it'll be better to use a UDF or stored procedure. (If someone knows a native, better way, please let me know! TomasP's Expandable stuff looks cool, but I ran into some bugs on Beta 2. This is really something the compiler should help out with!).

To start off, we need to declare our function as an Expression Tree (I cannot vouch for the accuracy of this code; I'm merely demonstrating LINQ to SQL technique. For the geocoding details, read Rob's post.):

const double R = 6367;
const double RAD = Math.PI / 180;
static Expression<Func<double, double, double, double, double>> dist =
(lat1, lon1, lat2, lon2) =>
    R * 2 *
    (
        Math.Asin(Math.Min(1,Math.Sqrt(
            (
            Math.Pow(Math.Sin(((lat1 * RAD - lat2 * RAD)) / 2.0), 2.0) +
            Math.Cos(lat1 * RAD) * Math.Cos(lat2 * RAD) *
            Math.Pow(Math.Sin(((lon1 * RAD - lon2 * RAD)) / 2.0), 2.0) 
           
        ))) 
    );

OK, that was the easy part. We just had to wrap Expression<> around the type declaration and remove the method calls. But how do we pass this to our query? The Where method has this signature:

public static IQueryable<TSource> Where<TSource>(this IQueryable<TSource> source, Expression<Func<TSource, bool>> predicate);

Hence, we need to provide a predicate of Expression<Func<TSource, bool>> for it to do its magic. To work with expressions, we need to start using the System.Linq.Expressions namespace. For clarity, I aliased E to System.Linq.Expressions.Expression. Building expressions isn't particularly hard, it's just annoying and time consuming. C#'s lack of "symbolics", or a way to use the compiler to create expressions referencing properties and so on, means we have to use strings to do it. I never did claim it'd be pretty.

We start off with some data from elsewhere in our application. In my example, I'm just going to declare some locals:

double coolLat = 43.641852;
double coolLong = -79.387298;
double maxDist = 100;

The first Expression we need is a ParameterExpression to refer to the source item from the table (i.e., the parameter the Where method is going to give to us). In my code, my table is called Accounts, hence my object type is Account. To create the parameter expression:

var acctParam = E.Parameter(typeof(Account), "a");

Next, we need to be able to reference the fields in the table. With LINQ to SQL, these are properties on our object. We create them like this:

var acctLon = E.Property(acctParam, "Longitude");
var acctLat = E.Property(acctParam, "Latitude");

The secret sauce is creating the invoke to our dist expression. This is where all the work comes into play and includes all our complicated code in the LINQ to SQL query. Fortunately, after building up our arguments separately, it's not that hard:

var distCalc = E.Invoke(dist, E.Constant(coolLat), E.Constant(coolLong), acctLat, acctLon);

We have to use ConstantExpressions for our local variables. Using constant allows us to capture the value of those variables. Now we're ready to finish off, by adding a less than comparison and turning it all into a <TSource, bool> LambdaExpression:

var maxComp = E.LessThan(distCalc, E.Constant(maxDist));
var pred = E.Lambda<Func<Account, bool>>(maxComp, acctParam);
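Before pointing this at a database, the hand-built predicate can be sanity-checked in memory by compiling it into a plain delegate. Here's the whole thing as a self-contained sketch (Account is a stand-in for the LINQ to SQL generated class; the sample coordinates are arbitrary):

```csharp
using System;
using System.Linq.Expressions;
using E = System.Linq.Expressions.Expression;

class Account
{
    public double Latitude { get; set; }
    public double Longitude { get; set; }
}

class Program
{
    const double R = 6367;
    const double RAD = Math.PI / 180;

    // Same Haversine expression tree as above
    static Expression<Func<double, double, double, double, double>> dist =
        (lat1, lon1, lat2, lon2) =>
            R * 2 * Math.Asin(Math.Min(1, Math.Sqrt(
                Math.Pow(Math.Sin((lat1 * RAD - lat2 * RAD) / 2.0), 2.0) +
                Math.Cos(lat1 * RAD) * Math.Cos(lat2 * RAD) *
                Math.Pow(Math.Sin((lon1 * RAD - lon2 * RAD) / 2.0), 2.0))));

    static void Main()
    {
        double coolLat = 43.641852, coolLong = -79.387298, maxDist = 100;

        // Build the predicate exactly as in the post
        var acctParam = E.Parameter(typeof(Account), "a");
        var acctLat = E.Property(acctParam, "Latitude");
        var acctLon = E.Property(acctParam, "Longitude");
        var distCalc = E.Invoke(dist,
            E.Constant(coolLat), E.Constant(coolLong), acctLat, acctLon);
        var maxComp = E.LessThan(distCalc, E.Constant(maxDist));
        var pred = E.Lambda<Func<Account, bool>>(maxComp, acctParam);

        // Compile to a delegate and evaluate against in-memory objects
        var check = pred.Compile();
        Console.WriteLine(check(new Account { Latitude = 43.65, Longitude = -79.38 }));
        Console.WriteLine(check(new Account { Latitude = 51.50, Longitude = -0.12 }));
    }
}
```

A point a few hundred meters away prints True; a point thousands of kilometers away prints False. The exact same expression tree then goes to Where() for the SQL translation.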

Our query can now look like this:

MembersDataContext dc = new MembersDataContext();
var q = dc.Accounts.Where(pred);
Console.WriteLine(q.ToString());

When we run it, we see that LINQ to SQL is quite capable of handling our little bit of math:

exec sp_executesql N'SELECT [t0].[AccountId], [t0].[Latitude], [t0].[Longitude]
FROM [dbo].[Accounts] AS [t0]
WHERE (@p0 * ASIN(
(CASE
WHEN @p1 < SQRT(POWER(SIN(((@p2 * @p3) - ([t0].[Latitude] * @p4)) / @p5), @p6) + (COS(@p7 * @p8) * COS([t0].[Latitude] * @p9) *
   
POWER(SIN(((@p10 * @p11) - ([t0].[Longitude] * @p12)) / @p13), @p14))) THEN @p1
ELSE SQRT(POWER(SIN(((@p2 * @p3) - ([t0].[Latitude] * @p4)) / @p5), @p6) + (COS(@p7 * @p8) * COS([t0].[Latitude] * @p9) *
    POWER(SIN(((@p10 * @p11) - ([t0].[Longitude] * @p12)) / @p13), @p14)))

END))) < @p15'
,N'@p0 float,@p1 float,@p2 float,@p3 float,@p4 float,@p5 float,@p6 float,@p7 float,@p8 float,@p9 float,
    @p10 float,@p11 float,@p12 float,@p13 float,@p14 float,@p15 float'
,@p0=12742,@p1=1,@p2=95.412000000000006,@p3=0.017453292519943295,
@p4=0.017453292519943295,@p5=2,@p6=2,@p7=95.412000000000006,@p8=0.017453292519943295,@p9=0.017453292519943295,@p10=102.63200000000001,
@p11=0.017453292519943295,@p12=0.017453292519943295,@p13=2,@p14=2,@p15=100

I still think there should be some kind of syntax so we could write it like we want to: Where(a => dist(coolLat, coolLong, a.Latitude, a.Longitude) < maxDist). If anyone knows a built-in way to do it, please let me know. I'm sure it's just something simple I'm overlooking...

Code
Thursday, September 06, 2007 1:56:54 PM UTC  #    Comments [4]  |  Trackback

# Wednesday, September 05, 2007
Calling custom methods in LINQ-to-SQL

This was sparked by the issues raised by Rob Conery here. Basically, if you have some semi-complicated function that you need to apply to a LINQ-to-SQL query, how can you do it? This is somewhat covered by TomasP.NET: Building LINQ Queries at Runtime in C# and Joseph Albahari: Dynamically building LINQ expression predicates. I recommend those two articles; they're very good. I'm going to write a bit about it too. Some of it might be redundant, some of the ideas I took from those articles. The code is all mine.

So, let's say we want to get all accounts where the square root of the account ID is even. This will serve as our placeholder for a totally contrived example. Just calling our method in our LINQ query won't work because the LINQ-to-SQL code isn't going to know what to do with it. A method is just an opaque block of code with a name. Here's an example:

static void Main(string[] args) {
    MembersDataContext dc = new MembersDataContext();
    var q = dc.Accounts.Where(a => IsRightAccount(a));
    Console.WriteLine(q.ToString());
}
static bool IsRightAccount(Account a) {
    return Math.Sqrt(a.AccountId) % 2 == 0;
}

This code crashes with: Unhandled Exception: System.NotSupportedException: Method 'Boolean IsRightAccount(ConsoleApplication1.Account)' has no supported translation to SQL. Which should be expected, as LINQ to SQL cannot know what goes on inside that method and thus can't translate it.

Let's change things to make IsRightAccount be a Func delegate (from a lambda expression):

static Func<Account, bool> IsRightAccount = a => Math.Sqrt(a.AccountId) % 2 == 0;

Now we get: Unhandled Exception: System.NotImplementedException: The method or operation is not implemented. At System.Data.Linq.SqlClient.QueryConverter.VisitInvocation(InvocationExpression invoke). That's a somewhat strange exception, as I'd expect it to be a bit more helpful. At any rate, I'd expect it to crash, because that is just a delegate to a method. It's still an opaque block of code.

Enter the magic Tree of Expressions: Expression<T>. As I mentioned in my last post, Expression Trees provide "introspection" (reflection against code). In the examples above, the lambda that calls IsRightAccount for the Where clause actually turned into an InvocationExpression that represents a call to the delegate provided. Hence me saying that it is "opaque". What we need is to make sure that our code (our IsRightAccount calculation) is visible as data. When it's visible as data, then LINQ-to-SQL can go and say "Oh, you want to take the square root of the account ID, mod it by 2, and see if that's zero… now THAT I can do in SQL".

Declaring an Expression Tree is really simple. First, make sure you import System.Linq.Expressions if you don't want to fully qualify the name. Then, declare your tree just like any lambda Func, except this time make the type Expression<MyFunc>:

static Expression<Func<Account, bool>> IsRightAccount = a => Math.Sqrt(a.AccountId) % 2 == 0;

We will also change our Where clause to accommodate the fact that we are not calling a method:

var q = dc.Accounts.Where(IsRightAccount);

And presto! Our program now shows a SQL conversion:

SELECT [t0].[AccountId], [t0].[Email]
FROM [dbo].[Accounts] AS [t0]
WHERE (SQRT(CONVERT(Float,[t0].[AccountId])) % @p0) = @p1
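What makes this work is that the expression really is data: you can inspect it and still execute it. A minimal self-contained sketch (Account here is a stand-in for the generated class):

```csharp
using System;
using System.Linq.Expressions;

class Account { public int AccountId { get; set; } }

class Program
{
    static Expression<Func<Account, bool>> IsRightAccount =
        a => Math.Sqrt(a.AccountId) % 2 == 0;

    static void Main()
    {
        // The code is visible as a tree, which is what LINQ to SQL walks...
        Console.WriteLine(IsRightAccount.Body);

        // ...and it can still be compiled back into runnable IL
        var f = IsRightAccount.Compile();
        Console.WriteLine(f(new Account { AccountId = 16 })); // Sqrt(16)=4, even: True
        Console.WriteLine(f(new Account { AccountId = 9 }));  // Sqrt(9)=3, odd: False
    }
}
```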

Things get a bit more complicated when we try to stack expressions together. As far as I know, we must create the Expression manually (using the Expression static methods); the compiler won't help out. If anyone knows a built-in way around this, please let me know. Otherwise, see the links at the top of this article for more information and other workarounds.

This also applies if you have complex logic that doesn't directly map to an item -> bool predicate expression. In those cases, you can still encapsulate most of your code by using the compiler to generate the bulk of the Expression, and then just wrap it with a bit of hand-created expression. In my LINQ to the CRUD article, the attached code uses this approach to generate queries for the select/delete commands. Again, I will note that TomasP has written expansions so you can just write myCoolExpression.Expand(arg) rather than building everything by hand.

If you know of any other links or work done in this area, I'm very interested in seeing other approaches. Thanks!

Code
Wednesday, September 05, 2007 11:58:49 PM UTC  #    Comments [2]  |  Trackback

C# 3.0 and LINQ Misunderstandings

Apparently, there is some considerable confusion over all the new C# language features. People who I would hope are reasonably intelligent are completely misunderstanding some C# fundamentals. Agreed, a lot of the new concepts introduced in C# 3.0 might seem relatively foreign to C# users. Microsoft's marketing related to LINQ doesn't help much either. I'm going to try to clarify the top few things I've seen. I'll reference the C# spec (http://msdn2.microsoft.com/en-us/library/ms364047(vs.80).aspx) so you can know I'm being accurate.

Myth: LINQ is just a data access technology

While data access is obviously a large part of LINQ (even the name stands for Language Integrated Query), you can do a lot more than just access data. Re-using a previous example:

    args
        .Select(s => new Thread(() => SomeLongProcess(s))) 
        .Process(t => t.Start())
        .ToList()
        .ForEach(t => t.Join());

There's no sign of data access there! The practical functional C# articles on my site go into more detail on this. Suffice to say, while a lot of new features were added to "make LINQ possible", much, much more is possible than just creating queries. (Alternatively, you might choose to limit the meaning of LINQ, as I'd like to do. In that case, I wouldn't consider using lambdas, extensions, etc. as LINQ. Others may disagree. Marketing?)

Implicitly typed local variables (var keyword)

C# is still statically and strongly typed. But, there's a new feature that lets you declare local variables without specifying the type, if the type can be inferred from the initializing expression. From the spec:

var i = 5;
var s = "Hello";
var d = 1.0;
var numbers = new int[] {1, 2, 3};
var orders = new Dictionary<int,Order>();

The implicitly typed local variable declarations above are precisely equivalent to the following explicitly typed declarations:

int i = 5;
string s = "Hello";
double d = 1.0;
int[] numbers = new int[] {1, 2, 3};
Dictionary<int,Order> orders = new Dictionary<int,Order>();

Oddly enough, the C# spec doesn't even mention LINQ or anonymous types when it talks about "var" locals. Why is there confusion about this simple feature? Let's examine anonymous types:

C# anonymous types allow you to declare a type just by specifying its fields. From the spec: "C# 3.0 permits the new operator to be used with an anonymous object initializer to create an object of an anonymous type." For instance, the following code is a valid expression:

new { Name = "Michael" }

This produces an object of a new, anonymous, type, containing a single string property called Name. Hence, this code works:

Console.WriteLine(new { Name = "Michael" }.Name);

However, how can you assign such an object to a variable? Yes, that is the only "need" for the var keyword. There's no way to name the type, since it's anonymous. Regardless of whether you agree with anonymous types (versus a Tuple class), this is the place where you *need* to use var: assigning an anonymous type to a local.

As you may have noticed, I still haven't mentioned LINQ. Anonymous types are not LINQ specific. They are, however, particularly helpful for certain LINQ queries:

var topCustomers = MyDatabase.Customers.Where(c => c.GoldStar == true).Select(c => new {c.CustomerId, c.Name});

Because of this, people start to associate anonymous types only with LINQ queries and hence, var with LINQ only. The truth is that these features can be used anywhere.

What's the takeaway here? The var keyword simply allows the compiler to infer the type of the variable so you don't have to specify it. Nothing more, nothing less. Some people still want to explicitly annotate every single variable – hey, that's their choice. But don't be locked into this just because there was no option before. Me? I'll take more concise code any day!

As a side note (if this wasn't clear), the var keyword is NOT dynamic typing, just implicit typing (type inference).

Dynamic C#

Seems there's a lot of confusion about C# being dynamic. C# is not dynamically typed, as some seem to imply. I think perhaps some of the confusion comes from all the nice type inference that C# provides. Using the var keyword as shown above might make some people feel "oh, I'm saying var just like Javascript!". Adding to the confusion is the fact that C# is now a "semi"-functional language. For instance, from "why's (poignant) guide to ruby", we see this Ruby code:

5.times { print "Odelay!" }

C# allows us to write in a similar style:

5.Times(() => Console.WriteLine("Odelay!"));
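That line only works because of an extension method on int, not anything built into C#. A minimal sketch (Times is my own name here, not a BCL method):

```csharp
using System;

static class IntExtensions
{
    // Run the given action n times, Ruby-style
    public static void Times(this int n, Action action)
    {
        for (int i = 0; i < n; i++)
            action();
    }
}

class Program
{
    static void Main()
    {
        3.Times(() => Console.Write("Odelay!")); // prints Odelay!Odelay!Odelay!
    }
}
```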

The next Ruby example in that guide goes like this:

exit unless "restaurant".include? "aura"

In C#, we can write:

exit.Unless(() => "restaurant".Contains("aura"));

The Wikipedia article on Functional Programming lists a few features that dynamic languages usually have:

Eval

Sorta… you can create Expression<T> and execute them. On the other hand, you can't do anything like eval(myString) (which is just asking for runtime failure).

Higher-order functions

Definitely in there.

Runtime alteration of object or type system

No, not really. (I.e., maybe you can hack around with certain APIs to try to do some magic, but it's not a language feature.)

Functional Programming

Yes. But still, functions aren't really first class citizens…yet. Once we can start using method groups as Actions and Funcs, implicitly, then it'll get even better. This is an interesting presentation from Andrew Kennedy, Microsoft Research: C# is a functional programming language.

Closures

Yep, since anonymous methods were introduced in C# 2.0.

Continuations

Extremely limited, in the form of yield return.
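An iterator method suspends at each yield return and resumes when the caller asks for the next item, which is the closest C# gets. A tiny sketch:

```csharp
using System;
using System.Collections.Generic;

class Program
{
    // Execution pauses at each yield and picks up again on the next MoveNext
    static IEnumerable<int> Countdown(int from)
    {
        while (from > 0)
            yield return from--;
    }

    static void Main()
    {
        foreach (var i in Countdown(3))
            Console.Write(i + " "); // prints 3 2 1
    }
}
```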

Introspection

Not just reflection, but actually inspecting the actual code. C# 3.0 has this in the form of Expression<T> (see below).

Macros

No, not crappy C-style macros. Here, I'm thinking more like macros that'd let you create things like C# query comprehensions, *in source code*. (Which is what I was actually hoping when I saw the new query comprehension syntax… no such luck).

In summary, C# lets you gain a lot of benefits usually associated with dynamic programming, but without the nasty parts of dynamic typing.

A great paper on this subject is Static Typing Where Possible, Dynamic Typing When Needed: The End of the Cold War Between Programming Languages, by Erik Meijer and Peter Drayton of Microsoft.

Myth: Extension methods add methods to a class

This is a tricky one, since extension methods appear to be exactly that. This myth is also somewhat perpetuated by the C# spec: "In effect, extension methods make it possible to extend existing types and constructed types with additional methods." But, right before that, the real explanation is given: "Extension methods are static methods that can be invoked using instance method syntax."

Essentially, Extension methods allow us to use infix notation with certain methods. This explains the line from the spec "in effect". A more helpful way to think of this is by thinking of the "." operator as an overloaded operator that also allows passing the first operand given to it as the first argument to specially marked methods.

An alternative* would be to define an operator like the F# pipeline operator (|>). In C#, this would let us write stuff like:

customers |> Seq.Where(c => c.Name == "Michael")

That doesn't look like an improvement. BUT, we no longer need to mark methods in a special way. We can just use them:

myArray |> Array.BinarySearch("s")

Why do we need infix notation anyways? Well, the normal prefix notation can be difficult to read:

Select(Where(customers, c => c.Cool == true), c => c.Name)
Array.BinarySearch(items, "S")

Extension methods just make those functions easier to pipeline. That's all folks. Think of them like this, and save yourself a headache about what "extending a type" means.
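A quick way to convince yourself: every extension call is just sugar for a static call with the receiver as the first argument. Both forms below compile to the same thing:

```csharp
using System;
using System.Linq;

static class Demo
{
    static void Main()
    {
        var nums = new[] { 1, 2, 3 };

        // Instance (infix) syntax...
        var a = nums.Where(n => n > 1).Sum();

        // ...is exactly the same static call, spelled out
        var b = Enumerable.Sum(Enumerable.Where(nums, n => n > 1));

        Console.WriteLine(a == b); // True (both are 5)
    }
}
```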

*My guess as to why extension methods are done the way they are is that it could confuse people if you have something like "item |> Stuff.SomeMethod("X")" where SomeMethod returns a function, or where you have "item |> Stuff.SomeMethod("X").SomethingElse("y")". I'm still annoyed that I can't use infix semantics where *I* want, but oh well.

Lambda expressions and Expression<T>

Spec: "Lambda expressions provide a more concise, functional syntax for writing anonymous methods.". Spec: "Expression trees permit lambda expressions to be represented as data structures instead of executable code. A lambda expression that is convertible to a delegate type D is also convertible to an expression tree of type System.Query.Expression<D>."

So, "lambda expressions" can either be just code (i.e., directly executable IL) OR they can get turned into a data structure, Expression<T>.

Adding to the confusion, a lambda expression can contain just an expression (i=> i + 1), or it can be a block of statements ( i => {Write(i); i++; Write(i); return i+1;} ). However, a lambda expression with a statement block body cannot become an Expression<T>. (As far as I know, VB's lambdas only allow for expression bodies, not blocks.)

An example:

Expression<Func<int, int>> inc = i => i + 1;

Is equivalent to:

ParameterExpression param_i = Expression.Parameter(typeof(int), "i");
var inc2 = Expression.Lambda<Func<int, int>>(
    Expression.Add(
        param_i, 
    Expression.Constant(1, typeof(int))),
param_i);
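Both forms really are the same function. Compiling each tree and invoking it shows they behave identically (the snippets above, made self-contained):

```csharp
using System;
using System.Linq.Expressions;

class Program
{
    static void Main()
    {
        // Compiler-built tree
        Expression<Func<int, int>> inc = i => i + 1;

        // Hand-built equivalent
        ParameterExpression param_i = Expression.Parameter(typeof(int), "i");
        var inc2 = Expression.Lambda<Func<int, int>>(
            Expression.Add(param_i, Expression.Constant(1, typeof(int))),
            param_i);

        Console.WriteLine(inc.Compile()(41));  // 42
        Console.WriteLine(inc2.Compile()(41)); // 42
    }
}
```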

At runtime, you can then go inspect the actual code and decide what to do with it. This is exactly the premise for LINQ to SQL. When you create a LINQ-to-SQL query, it is turned into an expression like shown above. Then the LINQ-to-SQL APIs inspect and convert that expression tree into SQL statements.

Here, I can understand the confusion. The word "expression" is used in three distinct manners. Rightfully, Expression Trees should be referred to as Expression or Expression<T>, which could help clear up some of the confusion. Additionally, it doesn't help that lambdas have these different conversion rules (although working around it could be ugly, possibly).

Are there any other features that you've seen misused or you've had questions about? Let me know! I love comments, insults, and suggestions.

Want to correct me on something? Go right ahead! But, if you're going to say something like "C# doesn't have type inference", please make sure to either be an expert on the matter or be able to quote an authoritative source or show a proof. Thanks!

Code
Wednesday, September 05, 2007 3:01:18 AM UTC  #    Comments [10]  |  Trackback

# Wednesday, August 29, 2007
Practical Functional C# - Part IV – Think in [Result]Sets

Don't skip parts I to III; they'll get you up to speed. I have $300 up for grabs if these articles are incorrect in stating that this will greatly improve most C# apps:

    Practical Functional C# - Part I 
    Practical Functional C# - Part II 
    Practical Functional C# - Part III - Loops are Evil

Tell the compiler what you want, rather than how to do it. That's a key concept that can take you very far. However, for imperative programmers, it can sometimes be a hard concept. We are so used to instructing the processor, step by step, how to do things, and then only after we're all done, making sure the end effect is to our liking. Functional programming helps us change this.

Consider the following task: Write a function that checks a username against a few disallowed characters "!@#$%^&*". The normal C# implementation looks like this (null checks removed):

static bool IsBadUserName(string userName)
{
    var badChars = "!@#$%^&*";
    foreach (char c in badChars) {
        if (userName.Contains(c)) return true;
    }
    return false;
}

It's not horrible; there's only one branch, but it does require a bit of thought to sort out. Now let's take an approach where we treat the string as a set of characters to manipulate all at once (requires the C# 3.0 compiler):

static bool IsBadUserName2(string userName)
{
    var badChars = "!@#$%^&*";
    return userName
        .Intersect(badChars)
        .Any();
}

We've reduced the function from a step-by-step loop into something that has only one path. The bigger benefit is that there's no need to evaluate end conditions and so on for correctness: the code says exactly what it does. "Are there any bad characters in the user name?" We don't have to worry about how it does what we asked; we just need to think in terms of what we want our result to be.
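As a quick sanity check (my example, not from the article), the set-based version behaves just like the loop: Intersect treats both strings as sequences of characters, and Any answers as soon as a single overlap is found:

```csharp
using System;
using System.Linq;

class UserNameDemo
{
    static bool IsBadUserName2(string userName)
    {
        var badChars = "!@#$%^&*";
        return userName
            .Intersect(badChars)
            .Any();
    }

    static void Main()
    {
        Console.WriteLine(IsBadUserName2("alice")); // False
        Console.WriteLine(IsBadUserName2("bob!"));  // True
    }
}
```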

A common function in functional languages is called "map". Map takes a list of something and turns it into a list of something else. For instance, if we had a list of integers ("ourInts"), we could turn them into squares by saying "map ourInts by multiplying each value by itself". In C# (LINQ), map is called "Select". Here's a quick example:

var ourInts = new[] { 2, 5, 13 };
var squares = ourInts.Select(i => i * i);

Squares will contain the list { 4, 25, 169 }. What use is this? Well, it is an extremely common pattern to take some set of data, filter it, modify it a bit, and return a new set of data. Here's an example: you have a string variable containing semicolon-delimited email addresses. You want to turn these into an array of .NET's MailAddress objects to use with some other code. The loop isn't very pretty:

var tempAddresses = new List<MailAddress>();
foreach (string s in semicolonEmails.Split(';')) {
    var ts = s.Trim();
    if (ts == "") continue;
    tempAddresses.Add(new MailAddress(ts)); 
}
var myAddresses = tempAddresses.ToArray();

But consider the functional approach:

var myAddresses = semicolonEmails
    .Split(';')
    .Select(s => s.Trim())
    .Where(s => s != "")
    .Select(s => new MailAddress(s))
    .ToArray();

In one statement, we transform the data three times, as well as add a filter to remove empty items.

One place where LINQ falls flat on its face is when it comes to processing data. For some reason, there are no methods defined to do "ForEach" or "Process". (Even more interesting: List<T> does define ForEach.) Process is a great pattern: for each item, it performs some action, then returns the original item. The code to define it is very simple and looks like this:

static IEnumerable<T> Process<T>(this IEnumerable<T> source, Action<T> f)
{
    foreach (var item in source) {
        f(item); 
        yield return item; 
    }
}
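One caveat worth calling out (my observation, not from the article): because Process is an iterator, it is lazy like the rest of LINQ. The action doesn't run until something enumerates the sequence. A quick sketch:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Extensions
{
    // Same Process as above: perform an action on each item, pass the item through.
    public static IEnumerable<T> Process<T>(this IEnumerable<T> source, Action<T> f)
    {
        foreach (var item in source) {
            f(item);
            yield return item;
        }
    }
}

class LazyDemo
{
    static void Main()
    {
        var log = new List<int>();
        var pipeline = new[] { 1, 2, 3 }.Process(log.Add);

        // Building the pipeline runs nothing:
        Console.WriteLine(log.Count); // 0

        // Enumerating it triggers the actions:
        pipeline.ToList();
        Console.WriteLine(log.Count); // 3
    }
}
```

This is why the threading example below ends its chain with ToList before ForEach: the ToList forces the Process side effects (starting the threads) to actually happen.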

How is this of use? Well, let's chain together some functional and imperative processing. For instance, write a program that does some long process on all files passed in as arguments – on separate threads. If we take the purely imperative approach, our code looks like this:

static void Main(string[] args)
{
    var threads = new List<Thread>();
    foreach (var s in args) {
        Thread t = new Thread(startLongProcess); 
        t.Start(s); 
        threads.Add(t); 
    }
    foreach (var t in threads) {
        t.Join();
    }
}
static void startLongProcess(object data)
{
    SomeLongProcess((string)data);
}

Yes, we actually must declare a separate function just to invoke SomeLongProcess. Let's combine and use the functional approach now:

static void Main(string[] args)
{
    args
        .Select(s => new Thread(() => SomeLongProcess(s))) 
        .Process(t => t.Start())
        .ToList()
        .ForEach(t => t.Join());
}

Which way is going to be easier to edit and change around? I don't know about you, but for me, going from ~12 to ~5 lines, removing extra variables, useless functions and flow control structures: that's a hands-down win in my book.

As a side note, threading, in fact, is a space that is extremely ripe for functional styles. I'm willing to bet that ".NET 4" will include threading extensions that rely heavily on functional concepts. For instance, it's easy to create a method that allows us to replace the previous program with this:

args.Parallel(SomeLongProcess);

But I'll talk about that another day.
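In the meantime, here's a minimal sketch of one way such a helper could be built from the same pieces as the example above (the name Parallel and its signature are my guess, not a real API):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

static class ParallelExtensions
{
    // Hypothetical helper: run f on each item on its own thread, then wait for all.
    public static void Parallel<T>(this IEnumerable<T> source, Action<T> f)
    {
        var threads = source
            .Select(item => new Thread(() => f(item)))
            .ToList();
        threads.ForEach(t => t.Start());
        threads.ForEach(t => t.Join());
    }
}

class ParallelDemo
{
    static void Main(string[] args)
    {
        args.Parallel(s => Console.WriteLine("processing " + s));
    }
}
```

It's just the earlier Main, folded into a reusable extension method.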

In the next article, I'm going to cover C#'s new inner functions capability and how that can be used to help build up more complicated function chains. I'd also like some feedback on which kinds of areas of C# programming you've run into that seem to require more code than necessary.

Code
Wednesday, August 29, 2007 10:45:59 AM UTC  #    Comments [4]  |  Trackback