[Giagnocavo]Michael::Write()

# Friday, May 07, 2010
F# Unit Testing with Visual Studio 2010

In Visual Studio 2010, you can now get your F# unit tests loaded by the IDE. First, create your F# test project. This is just a normal library that references Microsoft.VisualStudio.QualityTools.UnitTestFramework and contains your test classes. The only change you need to make is to go into the project configuration and set the output folder to "bin" for both Debug and Release (instead of bin\debug).

Next, create a C# Test Project, and delete the code file. Then add an existing item, and navigate to the bin directory of your test project output. Select the DLL and Add as Link.

Finally, right click the solution, and change the project dependencies so the C# test project depends on your F# test project.

That's it. Now the IDE will pick up your F# tests and allow you to manage, run, and debug them right from VS. You may have to restart the IDE after setting all this up for it to work smoothly.
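For reference, a test class in the F# project might look like the following minimal sketch. The namespace, type, and method names are made up for illustration. The attributes and Assert come from the Microsoft.VisualStudio.QualityTools.UnitTestFramework assembly mentioned above, so this only compiles with that reference in place.

```fsharp
namespace SampleTests   // hypothetical namespace

open Microsoft.VisualStudio.TestTools.UnitTesting

// The test runner looks for public classes marked [<TestClass>]
// and runs their public methods marked [<TestMethod>].
[<TestClass>]
type ArithmeticTests() =

    [<TestMethod>]
    member this.AdditionWorks() =
        // Explicit type argument keeps overload resolution unambiguous.
        Assert.AreEqual<int>(4, 2 + 2)

    [<TestMethod>]
    member this.ReverseTwiceIsIdentity() =
        let xs = [1; 2; 3]
        Assert.IsTrue((List.rev (List.rev xs) = xs))
```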

Code | FSharp
Friday, May 07, 2010 4:57:57 AM UTC  #    Comments [2]  |  Trackback

# Thursday, May 06, 2010
Command line tool for DPAPI

Using the data protection API, I found I needed to be able to generate values by hand that my apps can work with later. So I wrote a small command line utility to help out with that process.

Example:

dpapicmd [/user] [/decrypt] [/utf8] [/entropy:<entropy>] {/clipboard|<text>}
Text is read as Base64 bytes unless /utf8 is used. Decrypted input and encrypted output are always Base64.

C:\>dpapicmd /utf8 /entropy:dogstuff "I'm barking mad."
AQAAANCMnd8BFdERjHoAwE/Cl+sBAAAAOE510Ds5PkGpG2g7PxkgXwQAAAACAAAAAAAQZgAAAAEAACAAAACSClIQpWDawT26jRrsFr/HauG2NjUw963fPKH+AcXqlwAAAAAOgAAAAAIAACAAAABwxg
Osbkh6TfdTUzaiEGgKJ/ohL91VGHIpRDrBeR7wvyAAAADyvUh1W+bmzUFago/LybBvmmQD96x2vCOiJgpPxTNItEAAAAAmVJ3DoTnIAfUIsO2ea8hqs6Rpp77gvDn77XAzXfACRY3IGT7BjicYJ7Og
NQhzsrHCybBK0DRQchrPK5+XT8TR

C:\>dpapicmd /utf8 /dec /entropy:dogstuff AQAAANCMnd8BFdERjHoAwE/Cl+sBAAAAOE510Ds5PkGpG2g7PxkgXwQAAAACAAAAAAAQZgAAAAEAACAAAACSClIQpWDawT26jRrsFr/HauG2
NjUw963fPKH+AcXqlwAAAAAOgAAAAAIAACAAAABwxgOsbkh6TfdTUzaiEGgKJ/ohL91VGHIpRDrBeR7wvyAAAADyvUh1W+bmzUFago/LybBvmmQD96x2vCOiJgpPxTNItEAAAAAmVJ3DoTnIAfUIsO
2ea8hqs6Rpp77gvDn77XAzXfACRY3IGT7BjicYJ7OgNQhzsrHCybBK0DRQchrPK5+XT8TR

I'm barking mad.
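For anyone who wants to consume such values from F#, here is a rough sketch of the equivalent round trip using the .NET ProtectedData class. It is not the tool's source. The function names are hypothetical, and it assumes the entropy string is UTF-8 encoded and that machine scope is used when /user is absent. DPAPI is Windows-only, and on .NET Framework a script needs a reference to System.Security.

```fsharp
open System
open System.Security.Cryptography
open System.Text

// Hypothetical helper: encrypt UTF-8 text with DPAPI, return Base64.
let protect (entropy : string) (text : string) : string =
    let data = Encoding.UTF8.GetBytes(text)
    let salt = Encoding.UTF8.GetBytes(entropy)   // assumption: entropy is UTF-8
    // Assumption: LocalMachine scope stands in for "no /user flag".
    let cipher = ProtectedData.Protect(data, salt, DataProtectionScope.LocalMachine)
    Convert.ToBase64String(cipher)

// Hypothetical helper: decode Base64, decrypt with DPAPI, return UTF-8 text.
let unprotect (entropy : string) (base64 : string) : string =
    let cipher = Convert.FromBase64String(base64)
    let salt = Encoding.UTF8.GetBytes(entropy)
    let data = ProtectedData.Unprotect(cipher, salt, DataProtectionScope.LocalMachine)
    Encoding.UTF8.GetString(data)

let roundTrip = unprotect "dogstuff" (protect "dogstuff" "I'm barking mad.")
printfn "%s" roundTrip
```

The same entropy value has to be supplied on both sides; with a different one, Unprotect throws a CryptographicException.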

Download: dpapicmd.exe (7.5 KB) (Requires .NET 2.0 because I'm too lazy to write it in C.)

And horrible hacky source: dpapicmd.cs.txt (4.25 KB)

 

Code | Security
Thursday, May 06, 2010 5:53:33 PM UTC  #    Comments [0]  |  Trackback

# Wednesday, April 28, 2010
FluentNHibernate FSharp update for RTM
I've updated the code we use for FluentNHibernate with F# - using the final versions of both products. It's just some code to fix up F# quotations into LINQ expressions that FluentNHibernate can work with, and type extensions to make it easy to consume from F#.

The FluentNHibernate RTM zip has all the other binaries you need to get started.

I'm building against .NET 4, but the code should work on 2.0 as well.

FluentNHibernate.FSharp.dll (38.5 KB)

FluentNHibernate.FSharp.zip (6.95 KB) Source

And here's the FluentNHibernate sample first project using FluentNHibernate.FSharp:

FirstProject.zip (7.66 KB)

Here's an example of what some mapping code looks like:

type ProductMap() as m = inherit ClassMapQ<Product>() do
    let x = m.DefaultX
    m.Not.LazyLoad()
    (m.IdQ <@ x.Id @>).Done
    (m.MapQ <@ x.Name @>).Done
    (m.MapQ <@ x.Price @>).Done
    (m.HasManyToManyQ <@ seq x.StoresStockedIn @>)
        .LazyLoad()
        .Cascade.All()
        .Inverse()
        .Table("StoreProduct")
        .Done


Code | FSharp
Wednesday, April 28, 2010 9:56:22 PM UTC  #    Comments [0]  |  Trackback

# Wednesday, April 01, 2009
Using Fluent NHibernate with F#

Fluent NHibernate is a nice way to be able to use NHibernate without having to deal with all that unchecked XML. This morning I decided to find out how well it works with F#. Things went relatively smoothly. I've converted some code samples from the Fluent NHibernate First Project. I suggest having that open to fill in any gaps. I've also included the full project code and DB script at the end of this article.

If you're not familiar at all with Fluent NHibernate, basically it takes advantage of lambda expressions as Expression<T> to provide a somewhat strongly typechecked reflection system. Thus, instead of having attributes with hard-coded strings, or XML files, you have expressions that target the properties of objects you wish to map. The Fluent NHibernate library then takes care of hooking it up to NHibernate, and away you go. Something like that anyways.

Classes

So, first, we define the "entities" like the C# project does. Here's the first little pain. F# doesn't really support C#'s idea of "automatic properties". You can have vals on a class, which act like fields (although, they are implemented as properties). Or, you can manually specify them, like you would in earlier versions of C# which didn't have the auto-gen-a-field-for-me.

F# doesn't encourage uninitialized values. If you have uninitialized fields (vals), you need to mark them with an attribute to say you know what you're doing. So, that adds a bit more code overhead. What I do here is to alias the DefaultValueAttribute to "DV". So the first bit of our classes looks like this:

#light

namespace FHib.Entities

open System.Collections.Generic

type DV = DefaultValueAttribute

type Employee() as this =
    [<DV>] val mutable Id : int
    [<DV>] val mutable FirstName : string
    [<DV>] val mutable LastName : string

Now, NHibernate relies on having virtual properties so that it can dynamically create code to do nifty things like lazy loading. In F#, creating a virtual member means defining an abstract member and providing an implementation, in the same class. Since we don't have automatic properties, this means we'll have to define the backing field ourselves. In all, the full code for the virtual property "Store" (virtual so NHibernate can lazy-load it) is:

    abstract Store : Store with get,set
    [<DV(false)>] val mutable _store : Store
    override x.Store with get() = x._store and set(v) = x._store <- v

Not the pinnacle of short code, but not horrible all things considered. The rest of the entity mappings are rather straightforward so I'll skip them here. I stuck with vals for anything that didn't have to be virtual, to keep it more concise.
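To see the pieces together outside the project, here is a small self-contained sketch of the same pattern. The Store type is cut down to a single field purely for illustration, and the attribute is written out as DefaultValue instead of the DV alias.

```fsharp
// Minimal stand-in for the real Store entity (illustration only).
type Store() =
    [<DefaultValue>] val mutable Name : string

type Employee() =
    [<DefaultValue>] val mutable Id : int
    // F# class types are not nullable by default, so the
    // zero-initialization check has to be switched off with (false).
    [<DefaultValue(false)>] val mutable _store : Store
    // abstract + default in the same class yields a virtual property.
    abstract Store : Store with get, set
    default x.Store with get() = x._store and set v = x._store <- v

let employee = Employee()
let shop = Store()
employee.Store <- shop

// Reflection confirms the getter is virtual, which is what a proxy needs.
let isVirtual = typeof<Employee>.GetProperty("Store").GetGetMethod().IsVirtual
printfn "virtual getter: %b" isVirtual
```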

Mappings: Easy Start

OK, so now to the "real" work. Fluent NHibernate looks for classes that subclass ClassMap<T>. It then creates an instance of them, which allows you to call the mapping methods in the object's constructor. From the First Project:

public class EmployeeMap : ClassMap<Employee>   {   
  public EmployeeMap()   {   
    Id(x => x.Id);   
  }   
}  

OK, so how do we convert this to F#? It's easy... except for that lambda. The lambda in this case compiles to an Expression<Func<Employee, object>>. F#'s compiler does not support Expression<T>. So, we turn to experimental support. Referencing the FSharp.PowerPack.Linq assembly gives us the Microsoft.FSharp.Linq.QuotationEvaluation module. This extends the F# quotation type, Expr, with "ToLinqExpression", which returns an untyped LINQ Expression object.

To get this untyped Expression (LINQ) out of an Expr (F#) and into a typed Expression<T> suitable for Fluent NHibernate's consumption, I started off writing this tiny helper module:

module LinqHelper =
    open System
    open Microsoft.FSharp.Quotations
    open Microsoft.FSharp.Linq.QuotationEvaluation
    open System.Linq.Expressions

    // Convert a quoted F# lambda into a typed LINQ Expression<Func<'a, 'b>>.
    let ToLinq (exp : Expr<'a -> 'b>) =
        let linq = exp.ToLinqExpression()
        // The root node is a method call that wraps the real lambda.
        let call = linq :?> MethodCallExpression
        let lambda = call.Arguments.[0] :?> LambdaExpression
        Expression.Lambda<Func<'a, 'b>>(lambda.Body, lambda.Parameters)

When an F# quotation of a lambda is turned into a LINQ expression, the root node is a useless MethodCallExpression. So, we unwrap that by taking its argument and using it to generate a typed lambda Expression<T>.

This is a good start. But many of the Expressions that Fluent NHibernate looks for have a return type of object. Instead of having to write "box" or "upcast" all over, I added another function called "ToLinqObj". This takes an Expr<'a -> 'b>, but returns an Expression<Func<'a, obj>>.
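The boxing step can be sketched without the PowerPack at all. The helper below is hypothetical (it is not the ToLinqObj from the download): it takes an already-built LambdaExpression, wraps the body in a Convert-to-object node, and re-types the lambda as Expression<Func<'a, obj>>. The hand-built expression stands in for a converted quotation.

```fsharp
open System
open System.Linq.Expressions

// Hypothetical helper: re-type a lambda so that its body is boxed to obj.
let toObjLambda<'a> (lambda : LambdaExpression) : Expression<Func<'a, obj>> =
    let boxed = Expression.Convert(lambda.Body, typeof<obj>)
    Expression.Lambda<Func<'a, obj>>(boxed, lambda.Parameters)

// Build "x => x.Length" by hand as a stand-in for a converted quotation.
let param = Expression.Parameter(typeof<string>, "x")
let body = Expression.Property(param, "Length")
let original = Expression.Lambda(body, param)

let typed = toObjLambda<string> original
let compiled = typed.Compile()
let boxedLength = compiled.Invoke("abc")
printfn "%A" boxedLength
```

The resulting body is a Convert node, the same node type C# emits when a value-typed member is used in a Func<T, object> expression.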

Finally, writing "ToLinq" and "ToLinqObj" seemed too verbose, so I added some operators to the LinqHelper module:

    let (~@) expr = ToLinq expr
    let (~@@) expr = ToLinqObj expr

Now we can start mapping:

open LinqHelper

type EmployeeMap() = inherit ClassMap<Employee>() do
    base.Not.LazyLoad()
    base.Id ~@@ <@ fun x -> x.Id @> |> ignore
    base.Map ~@@ <@ fun x -> x.FirstName @> |> ignore
    base.Map ~@@ <@ fun x -> x.LastName @> |> ignore
    (base.References ~@ <@ fun x -> x.Store @>).LazyLoad() |> ignore

We start off by disabling LazyLoad because most of the properties are not virtual, and NHibernate will fail to validate the mapping. Instead, we explicitly LazyLoad things, like the Store reference.

Mappings: Modifying the Expressions

Unfortunately, things didn't stay so simple. The Fluent NHibernate methods "HasMany" and "HasManyToMany", for instance, don't work with F#'s type inference. This is because they have several overloads, a couple of which take expressions. If we try this in the StoreMap:

    base.HasMany ~@@ <@ fun x -> x.Staff @> |> ignore

We get an error because F# doesn't know what x is. (error FS0055: This lookup uses a deprecated feature, where a class type is inferred from the use of a class field label. Consider using a type annotation to make it clear which class the field comes from.) Using ~@ to keep it strongly typed fails as well; it can't figure out the overload. This is nothing surprising -- overloading is the enemy of type inference.

So, what do we do? Type annotations are not what I consider fun. So instead, we'll add non-overloaded ClassMap<'t> type extensions into the LinqHelper module:

    type ClassMap<'t> with
        member x.HasManyX expr = (x.HasMany : Expression<Func<'t, seq<_>>> -> _) (ToLinq expr)
        member x.HasManyToManyX expr = (x.HasManyToMany : Expression<Func<'t, seq<_>>> -> _) (ToLinq expr)

By providing these extensions that are explicit once, we enable type inference for the rest of the time. We can now finish the StoreMap:

type StoreMap() = inherit ClassMap<Store>() do
    base.Not.LazyLoad()
    base.Id ~@@ <@ fun x -> x.Id @> |> ignore
    base.Map ~@@ <@ fun x -> x.Name @> |> ignore

    (base.HasManyX <@ fun x -> upcast x.Staff @>)
        .LazyLoad()
        .Inverse().Cascade.All() |> ignore
    (base.HasManyToManyX <@ fun x -> upcast x.Products @>)
        .Cascade.All()
        .WithTableName("StoreProduct") |> ignore

The only type "annotation" we need here is the upcast for the HasManyX. This is because F# forces you to be explicit. Accessing "Staff" in the first quotation means type IList<Employee>, not seq<Employee>. The upcast will sort this out for us.

This compiles fine. But remember how I said using expressions was a "somewhat strongly typed" way of doing things? The expression trees are interpreted at runtime, which allows failures a more complex type system might prevent. This is one of those times. If we try to execute it as-is, we get this exception:

 ---> FluentNHibernate.Cfg.FluentConfigurationException: An invalid or incomplete configuration was used while creating a SessionFactory. Check PotentialReasons collection, and InnerException for more detail.

 ---> System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.ArgumentException: Not a member access Parameter name: member

Ouch. Fluent NHibernate does not appreciate our expression trees. Why? Using FSI, we can inspect what we're actually converting the F# quotations to:

type SomeClass() = member x.Stuff = [|1;2;3|];;
let myExpr : Expr<SomeClass -> seq<int>> = <@ fun x -> upcast x.Stuff @>;;
let myLinq = LinqHelper.ToLinq myExpr;;

> myLinq;;
val it : Linq.Expressions.Expression<Func<SomeClass,seq<int>>>
= x => (x.Stuff As IEnumerable`1)
    {Body = (x.Stuff As IEnumerable`1);
     NodeType = Lambda;
     Parameters = seq [x];
     Type = System.Func`2[FSI_0022+SomeClass,System.Collections.Generic.IEnumerable`1[System.Int32]];}

> myLinq.Body;;
val it : Linq.Expressions.Expression
= (x.Stuff As IEnumerable`1)
    {IsLifted = false;
     IsLiftedToNull = false;
     Method = null;
     NodeType = TypeAs;
     Operand = x.Stuff;
     Type = System.Collections.Generic.IEnumerable`1[System.Int32];}

So, what's going on? The upcast is modifying the F# quotation (as it should), and the ToLinqExpression is sticking this in as a "TypeAs" node. Fluent NHibernate does not seem to like this. Not. One. Bit.

But, it IS ok if we have a Convert node in the Expression tree. I imagine this is because C# generates such nodes in its expression trees (say, accessing an int member in a Func<T, object> expression). So, our last hack in making F# work right here is adding a fixup function to our LinqHelper, and using it from ToLinq:

    let fixup (lexpr:LambdaExpression) =
        if lexpr.Body.NodeType <> ExpressionType.TypeAs then lexpr else
        let typeAs = lexpr.Body :?> UnaryExpression
        let newBody = Expression.Convert(typeAs.Operand, typeAs.Type)
        Expression.Lambda(lexpr.Type, newBody, lexpr.Parameters)

Now it's happy with our expressions.

Finally

The rest of the First Project is pretty straightforward in F#. For instance, creating the Session Factory:

let createSessionFactory() =
    Fluently.Configure()
        .Database(
            MsSqlConfiguration.MsSql2005.ConnectionString(fun csb ->
                csb.Is("server=(local);database=fhib;Integrated Security=true") |> ignore))
        .Mappings(fun m ->
            m.FluentMappings.AddFromAssemblyOf<Mappings.EmployeeMap>() |> ignore)
        .BuildSessionFactory()

Everything seems to execute as it should. Fluent interfaces in F# generate a bit more noise, because of all the ignores that have to be added. I'm thinking of creating a type extension to obj to add Ignore as a method, to make it look a bit more uniform.
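A minimal sketch of what that Ignore extension could look like, assuming an optional type extension on System.Object is acceptable. StringBuilder is used here only as a convenient fluent API that needs no extra references.

```fsharp
open System.Text

// Optional type extension: gives every object an Ignore() method,
// so a fluent chain can end with .Ignore() instead of "|> ignore".
type System.Object with
    member x.Ignore() = ()

let sb = StringBuilder()

// Each Append returns the builder; Ignore() discards the final result.
sb.Append("Hello, ").Append("world").Ignore()

printfn "%s" (sb.ToString())
```

Whether this reads better than the pipe is a matter of taste; it mainly keeps a long chain in one visual style.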

Future

It's my hope that F# has now addressed most of these issues -- 1.9.6 is 7 months old. Expression<T> is growing in importance, outside of LINQ querying, so I cannot see F# not being able to handle them easily and generating similar output to C#. It'd be really neat if F# would auto-convert quotations to Expression<T> when the expr is a syntactic argument like it does with delegates now. I'd also be surprised if abstract/virtual members still lack the ability to define accessibility.

Code: fhib.zip (4.76 KB)
I did not include Fluent NHibernate or any of its dependencies. The project expects them to be in the project's lib folder, so get them here if you don't have them and extract them to "lib".

I welcome comments on this and suggestions for making the code more concise.

Code | FSharp
Wednesday, April 01, 2009 8:01:29 AM UTC  #    Comments [0]  |  Trackback

# Sunday, March 22, 2009
Using symmetric encryption to pass messages

This entry was triggered by this question. Someone asked how to use AES, and we got 2 sample classes that do it wrong. The flaw in both was that they reused the same IV for every message, which means your ciphertext can leak information. One answerer didn't believe me at first, but then got it and deleted his code. The other person got offended and said IVs, performing authentication, etc. are all "corner cases" and any problem is "contrived". So, I'm going to provide a bit of code and show two problems that arise from not generating a unique IV for each message and not authenticating the decrypted data.

First, I wrote this a while ago: What is an IV?. That describes what an IV is, and why you need a unique one for each message. Wikipedia also has good information on this. Now for the demo. I used the code at the bottom, but removed the hashing and random IV from the code. So it's just encrypting with the same key and IV for each message -- very straightforward. Here are the messages and their ciphertext:

"Alice; Bob; Eve;: PerformAct1"
"Alice; Bob; Eve;: PerformAct2"

tZfHJSFTXYX8V38AqEfYVXU5Dl/meUVAond70yIKGHY=
tZfHJSFTXYX8V38AqEfYVcf9a3U8vIEk1LuqGEyRZXM=

Notice how the first block of ciphertext is the same? All messages starting with "Alice; Bob; Eve;" will have that same first block. That means an attacker, after getting this ciphertext once, now knows if any message is addressed the same way. Very, very straightforward, basic attack. Now, maybe for a specific implementation you have in mind, this is not an issue. But it's still improperly implemented cryptography.
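The effect is easy to reproduce. The sketch below is not the code used for the output above: the key and IVs are generated on the spot and the helper names are made up. It relies on Aes.Create() defaulting to CBC mode with a 16-byte block, and on "Alice; Bob; Eve;" being exactly 16 bytes long.

```fsharp
open System
open System.Security.Cryptography
open System.Text

// Hypothetical helper: n cryptographically random bytes.
let randomBytes (n : int) : byte[] =
    let buffer : byte[] = Array.zeroCreate n
    use rng = RandomNumberGenerator.Create()
    rng.GetBytes(buffer)
    buffer

// Encrypt UTF-8 text with AES (CBC by default) under the given key and IV.
let encrypt (key : byte[]) (iv : byte[]) (text : string) : byte[] =
    use aes = Aes.Create()
    aes.Key <- key
    aes.IV <- iv
    use enc = aes.CreateEncryptor()
    let bytes = Encoding.UTF8.GetBytes(text)
    enc.TransformFinalBlock(bytes, 0, bytes.Length)

let firstBlock (cipher : byte[]) = cipher.[0 .. 15]

let key = randomBytes 32
let fixedIv = randomBytes 16

// Same key, same IV: messages with the same first 16 bytes collide.
let c1 = encrypt key fixedIv "Alice; Bob; Eve;: PerformAct1"
let c2 = encrypt key fixedIv "Alice; Bob; Eve;: PerformAct2"
printfn "fixed IV, first blocks equal:  %b" (firstBlock c1 = firstBlock c2)

// Fresh IV per message: the shared prefix no longer shows.
let r1 = encrypt key (randomBytes 16) "Alice; Bob; Eve;: PerformAct1"
let r2 = encrypt key (randomBytes 16) "Alice; Bob; Eve;: PerformAct2"
printfn "random IV, first blocks equal: %b" (firstBlock r1 = firstBlock r2)
```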

For the next attack, we're going to show that even with a random IV, you need to authenticate your decrypted messages. This code generates a 64-bit integer and encrypts it with AES and a random key/IV. Then, it starts changing bytes until the decrypt succeeds. Presto: the attacker was able to present a completely different value, and the decryption was successful.

public static void Main() {
    var buff = new byte[8];
    new Random().NextBytes(buff);
    var v = BitConverter.ToUInt64(buff, 0);
    Console.WriteLine("Value: " + v.ToString());
    Console.WriteLine("Value (bytes): " + BitConverter.ToString(BitConverter.GetBytes(v)));
    var aes = Aes.Create();
    aes.GenerateIV();
    aes.GenerateKey();
    var encBytes = aes.CreateEncryptor().TransformFinalBlock(BitConverter.GetBytes(v), 0, 8);
    Console.WriteLine("Encrypted: " + BitConverter.ToString(encBytes));
    var dec = aes.CreateDecryptor();
    Console.WriteLine("Decrypted: " + BitConverter.ToUInt64(dec.TransformFinalBlock(encBytes, 0, encBytes.Length), 0));
    // Keep altering ciphertext bytes until decryption stops throwing.
    for (int i = 0; i < 8; i++) {
        for (int x = 0; x < 250; x++) {
            encBytes[i]++;
            try {
                Console.WriteLine("Attacked: " + BitConverter.ToUInt64(dec.TransformFinalBlock(encBytes, 0, encBytes.Length), 0));
                return;
            } catch { }
        }
    }
}

Here's an example run:
Value: 5686260040031435365
Value (bytes): 65-7A-92-1A-61-A7-E9-4E
Encrypted: F4-62-AC-02-2D-7D-43-6A-4D-97-68-4D-95-9F-8A-DF
Decrypted: 5686260040031435365
Attacked: 1603329786558177755

Since there's no authentication of the decrypted data, an attacker can just play with the ciphertext until it generates an acceptable value. Perhaps you have other mitigations in your implementation/application for this, but why rely on that?

Here's some demo code (I haven't tested it much, so it might have some major issues -- but not by design AFAIK). Note this just shows performing an encryption operation, including the IV in the message, and verifying the decrypted bytes. Other things like replay attacks are not considered. If you're trying to learn how to use crypto so you can drop it into an application, STOP, then go read enough to understand what you're doing and the implications for your particular application.

aesdemo.cs.txt (4.38 KB)
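For illustration, here is one common way to get both properties, sketched in F#. This is not the code in aesdemo.cs.txt: it uses encrypt-then-MAC with HMAC-SHA256 over the IV and ciphertext, with separate encryption and MAC keys, and the names seal and tryOpen are made up. Like the demo, it ignores replay attacks and key management.

```fsharp
open System
open System.Security.Cryptography

let ivLen = 16    // AES block size in bytes
let tagLen = 32   // HMAC-SHA256 output size in bytes

// Message layout: IV || ciphertext || HMAC(IV || ciphertext)
let seal (encKey : byte[]) (macKey : byte[]) (plaintext : byte[]) : byte[] =
    use aes = Aes.Create()
    aes.Key <- encKey
    aes.GenerateIV()                       // fresh IV for every message
    use enc = aes.CreateEncryptor()
    let cipher = enc.TransformFinalBlock(plaintext, 0, plaintext.Length)
    let body = Array.append aes.IV cipher
    use hmac = new HMACSHA256(macKey)
    Array.append body (hmac.ComputeHash(body))

// Returns None if the tag does not verify; decrypts only after it does.
let tryOpen (encKey : byte[]) (macKey : byte[]) (msg : byte[]) : byte[] option =
    if msg.Length < ivLen + 16 + tagLen then None
    else
        let body = msg.[0 .. msg.Length - tagLen - 1]
        let tag = msg.[msg.Length - tagLen ..]
        use hmac = new HMACSHA256(macKey)
        let expected = hmac.ComputeHash(body)
        // Compare every byte so timing does not depend on where they differ.
        let mutable diff = 0
        for i in 0 .. tagLen - 1 do
            diff <- diff ||| int (expected.[i] ^^^ tag.[i])
        if diff <> 0 then None
        else
            use aes = Aes.Create()
            aes.Key <- encKey
            aes.IV <- body.[0 .. ivLen - 1]
            use dec = aes.CreateDecryptor()
            Some (dec.TransformFinalBlock(body, ivLen, body.Length - ivLen))

let newKey () =
    let k : byte[] = Array.zeroCreate 32
    use rng = RandomNumberGenerator.Create()
    rng.GetBytes(k)
    k

let encKey, macKey = newKey (), newKey ()
let message = seal encKey macKey (BitConverter.GetBytes(42UL))
let opened = tryOpen encKey macKey message

// Flip one ciphertext bit, as the attack above does: the tag check rejects it.
let tampered = Array.copy message
tampered.[20] <- tampered.[20] ^^^ 1uy
let rejected = tryOpen encKey macKey tampered
printfn "opened: %b, tampered rejected: %b" opened.IsSome rejected.IsNone
```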

Code | Security
Sunday, March 22, 2009 11:46:56 PM UTC  #    Comments [0]  |  Trackback

# Thursday, December 11, 2008
Full Typed Faults in Silverlight 2 (FaultException)
Silverlight 2 doesn't support faults, so you can't catch them. That sucks, since WCF's FaultException<T> is pretty useful. In Beta 1 I did a vile hack to get some data. After reading online, I'd determined that Silverlight should be able to intercept the fault message and interpret it. Unfortunately, some of the handy classes to insert yourself into the stack aren't in Silverlight's WCF. Fortunately, there is a sample that does most of it:

http://code.msdn.microsoft.com/silverlightws/Release/ProjectReleases.aspx?ReleaseId=1660

In the "Message Inspectors" sample. This sample has code to let you inspect messages, and it also has a sample where they throw "raw" faults (i.e., "wrapped exceptions", aka "ExceptionDetail"). Why they stopped there is beyond me. With a bit of hacking (I must say, this SOAP stuff looks very complex), I got it to recognize any fault type, and throw a FaultException<T>. Well, not a FaultException<T>, 'cause there is a neutered FaultException class already in System.ServiceModel from Silverlight. So I named mine "SLFaultException" (SL for Simple Lame).

To get the fault thrown, you need to hook up the SilverlightFaultMessageInspector. From the demo code:
            EndpointAddress address = new EndpointAddress("http://127.0.0.1:52620/Service.svc");
            BasicHttpMessageInspectorBinding binding = new BasicHttpMessageInspectorBinding(new SilverlightFaultMessageInspector());
            ServiceClient proxy = new ServiceClient(binding, address);
            proxy.DoWorkCompleted += new System.EventHandler<System.ComponentModel.AsyncCompletedEventArgs>(proxy_DoWorkCompleted);
            proxy.DoWorkAsync();
You also need to register the assembly's types so that the fault throwing code can find the fault detail to deserialize. There's no way in Silverlight to get all the loaded assemblies (as far as I know). So we do this:
   System.ServiceModel.SilverlightFaultMessageInspector.RegisterCurrentAssembly();

This registers the calling assembly's types to be searched when a fault detail is received.

Now the only thing that needs to be done is to fix up the HTTP 500 status code to a 200. The MS sample has a behaviour to do this. But that's annoying, having another DLL to deploy and deal with. So I do this in the Global.asax:
        protected void Application_EndRequest(object sender, EventArgs e) {
            if (HttpContext.Current.Request.PhysicalPath.EndsWith(".svc", StringComparison.OrdinalIgnoreCase) &&
                HttpContext.Current.Response.StatusCode == 500 &&
                !HttpContext.Current.Request.Browser.Crawler &&
                HttpContext.Current.Request.Browser.EcmaScriptVersion.Major > 0) {
                // Set 200 if it's a faulted service request
                HttpContext.Current.Response.StatusCode = 200;
            }
        }
From there, go ahead and catch SLFaultException<T> just like you would a FaultException<T>.

Overall, this proved far easier than I expected. I'm at a complete loss why they didn't ship faults in SL2. I mean, if I hacked _something_ out in an hour or two, I'm sure someone who had some clue of what they're doing could have done so more easily. Heck, they even have the real WCF source -- I was stuck using Reflector for parts :P.

SilverlightFaultsBinary.zip (13.81 KB)

SilverlightFaultsSource.zip (61.55 KB)
Code
Thursday, December 11, 2008 2:04:45 AM UTC  #    Comments [4]  |  Trackback

# Tuesday, December 02, 2008
Command line WCF Proxy Generation for Silverlight 2 RTM

I have a scenario where my web service app exposes several interfaces, and I want to use all of them from the Silverlight UI. The code generated by SvcUtil isn't completely compatible with Silverlight's WCF, meaning you need to do a lot of fixup to make it work in SL. Looking in the forums, I see mentions of "slsvcutil" and "slwsdl", but I've been unable to actually find these programs. The only thing that Silverlight has is "Microsoft.Silverlight.ServiceReference.dll", in the "Program Files\Microsoft SDKs\Silverlight\v2.0\Tools\ServiceReference" folder.

I opened that DLL up in ildasm, and it appears to just be some classes that work on a ServiceContractGenerator. Digging in deeper, yep, they just go through the generated CodeDOM and fix up stuff that isn't compatible. For instance, IExtensibleDataObject isn't supported. SL WCF also doesn't handle having Sync and Async methods (i.e., SomeOp and Begin/EndSomeOp) since they both have the same action. (You'll get an "An item with the same key has already been added" error.)

My main problems with the cutesy "Add Service Reference" UI are:
 - I can't figure out how to tell it to use multiple WSDL files and share the data types.
 - It generates a ton of files and VS gets all obsessive over them for some reason.
 - It generates a lot of extra code (like the useless ClientBase stuff, and the idiotic/useless "event based async pattern" code). Extra code just clutters up Intellisense, so why bother?

[Add Service Reference is great for demos and perhaps simple one-offs... I can't imagine dealing with it for anything slightly complicated.]

So, I took the MSDN code sample for the ServiceContractGenerator, and hacked it up to do what I needed. Then I added the 2 lines to pass it through the Silverlight ServiceReference fixup thing, and presto - things look great. Usage is simple:

Usage: slsvchack <clr namespace> <outputfile> <wsdl1> .. <wsdlN>

The options are hard-coded to what I use and what makes the most sense for Silverlight apps (in my opinion). I want to do a full fledged "SLSvcUtil.exe", but I think the right approach there is to disassemble SvcUtil and patch in the Silverlight.ServiceReference stuff. But, I don't have the time right now.

Code: slsvchack.cs.txt (2.95 KB) [It's in C# 'cause the MSDN sample was, and my edits were minor. ]
Binary: slsvchack.exe (8.5 KB) [I'm too lazy to link in the Microsoft.Silverlight.ServiceReference.dll, so you'll need the Silverlight SDK installed.]
Code
Tuesday, December 02, 2008 7:56:09 PM UTC  #    Comments [1]  |  Trackback

# Saturday, November 29, 2008
Follow up on Obfuscation and VistaDB
Recently I mentioned how a lot of companies use obfuscation unnecessarily, and it ends up hurting legitimate customers while doing nothing to prevent "crackers". Specifically, I mentioned VistaDB, as the obfuscation tool was injecting invalid IL, causing Mono to reject the assembly.

Jason Short replied to me and I detailed the exact problems with the obfuscator (along with a few F# scripts to unobfuscate and remove the bad IL). They then released a new build with the obfuscation removed -- which Mono now happily loads.

I just wanted to give kudos to VistaDB for doing this. Not many companies are smart enough to realise that their "protection" tools are useless and do a 180 on such a stance.
Code | Security
Saturday, November 29, 2008 7:12:05 PM UTC  #    Comments [0]  |  Trackback

# Saturday, November 08, 2008
Software protection
I've been meaning to write about this for a while. It's a very simple topic, but developers get all emotional and stop being rational as soon as the magic "code protection" and "piracy" words get invoked. I'd like to say I'm not promoting copyright infringement nor saying developers don't deserve to be compensated for their work. Now that that's out of the way...

The two things most developers want to stop are unauthorized installing (license enforcement) and "code protection". Code protection is a very weak concept, mainly revolving around thinking people are gonna steal your precious algorithms. Protection is easy to deal with, so I'm going to cover that now.

Before VMs like .NET were popular, most of the code protection I've seen revolved around the code that implements the license enforcement. Developers would write all sorts of nasty-clever-clever code to make things hard for the crackers. You see this sometimes when you run an application and it complains about a debugger being installed or running. With Java and .NET, disassembly got easier. This made it extra easy to patch any license code, since the disassembled code was in a high level language like IL. The response, and our first enemy of the day, was obfuscation.

Obfuscation takes your assembly and screws up all the metadata. On top of that, it might go and rewrite sections of your code to obfuscate the flow of the program, or perhaps indirectly load strings. The downside of course is that debugging gets really hard cause all your method names are now unreadable, reflection is broken, etc. Depending on the techniques an obfuscator uses, you can run into some other troubles. For instance, whatever obfuscator VistaDB uses is really broken, as it generates bad IL that just happens to work on MS CLR, but crashes (rightly so) on Mono. Not to mention that certain IL tricks are not verifiable, hence you can't use the code in lower-trust scenarios.

But what does obfuscation accomplish? Crackers ALWAYS win. Even the "most difficult" license system with hardware dongles and activation get cracked. The response I usually hear is "well it raises the bar". So. What. "Raising the bar" is totally pointless. Bruce Schneier talked about this.

For physical security, raising the bar is good in general. For example, if you buy a safe, it'll prevent a lot of thieves from getting to the valuables. Sure, there are higher level thieves, but you've weeded out a lot of the population around you, and the benefit is very real. Now some punk kids can't just go in and vandalize and "casually steal" your valuables.

But for computerized tech, the "bar" is the highest level attacker. If your valuable is "cracking my serial verification code", as soon as the "high level thief" cracks it, he can go write a simple program anyone can download. So the REAL bar is "user googles for a crack". That's what needs to sink past all the emotional nonsense developers go through when protecting their code. No matter what kind of complex protection schemes you put in, then obfuscate it on top of that, if the product has value, _someone_ will crack it, and all your users can just download the crack.

This isn't a maybe, this isn't a "possibly", this isn't theoretic, this is the exact reality. There is *nothing* you as a developer can do to prevent this (apart from make your product suck so much no one cares). [If there is, I'd love to hear it.]

So, obfuscation has zero value in preventing cracks or serials from getting out. And it has downsides. Just read the VistaDB blogs/forums to see real world problems that exist only because they use an obfuscator.

What about "protecting special algorithms"? From who? If your competitors are good, they'll figure things out regardless. If they suck, they won't be able to do much with it anyways. I think the biggest threat is some overseas group disassembling your code, slapping their logos on it, and reselling it. That's a clear and obvious loss if they are making sales. But, obfuscation isn't really going to stop it, just raise the bar a tiny bit. In this case, since you're dealing with a limited number of "pirate companies" that exist for profit, perhaps obfuscating has a bit of value. But think: If someone can not know your source code, not be able to provide support, etc. etc., but can still outsell you and your marketing, perhaps you have business issues.

The one other place I hear people using obfuscation is to protect an app from "casual hacking". WTF does that mean? You mean you're afraid your sales clerk might decompile the PoS application, but give up quickly? You think it means you can safely store passwords in the binary? I'm not sure what such developers are thinking, but I'm guessing they did a poor security analysis of the situation.

As a side note, this is not particular to VM platforms like Java and .NET. Check out Hex Rays. They do a fine job *decompiling* optimized native code. I've seen it in action; it makes it easy to take any native app, decompile it, figure it out, then work with the assembly code. So these .NET devs thinking they are so leet cause Reflector messes up and hence no one can figure it out... sigh.

Finally, a nice real-life demo. Look at Spore and other games using heavy DRM and protection mechanisms. Obviously Electronic Arts has an unlimited budget for getting the "best" type of protection. Yet the protection proved utterly useless against piracy. Just go to ThePirateBay.org and search. Yet they certainly introduced more bugs and user hate. (Of course, the REAL motive behind such DRM is killing the used games market. For this, all they need is stuff that honest users won't break.)

P.S. The reason I finally wrote all this is because VistaDB just took the silliness to the next level. I got their 3.4 Trial, but it crashes on Mono because the obfuscator emits totally invalid IL code. Their official response was that Trials aren't tested on Mono. I bought the product and the "stable" builds still have the same busted IL code. Awesome protection; stopping paying users from using the software rocks!

I suppose I could understand IF they had some awesome trade secrets. BUT, they provide a source code license. So an evil VistaDB competitor just buys a source code license to get all the details. How is obfuscation helping ANYONE here? (Note the runtime has no licensing; only the developer install.)

Code | Security
Saturday, November 08, 2008 12:06:00 AM UTC  #    Comments [4]  |  Trackback

# Monday, November 03, 2008
Common Conceptual Issues with F#
Several times I've had to explain a few basic F# things to new users. I ran into some of these too when I first looked at F#. If you've been looking at F#, but some things still don't click, I hope this will help break the chains of C/imperative languages. I'm assuming you know a bit about F#, perhaps from the Quick Tour or the F# Tutorial file that comes with the Visual Studio integration.

Functions always take exactly one argument and return exactly one value.
Despite anything you see, in F#, there's only one argument, and only one return value. Let's look at how this plays out (I suggest firing up F# interactive and playing around):

> let inc x = x + 1;;
val inc : int -> int

> let add x y = x + y;;
val add : int -> int -> int

[I like to read -> as "to", because it's short and sweet.] So, for the case of "inc x", we see the signature is quite simple and expected "int to int". But for add? We see "int to int to int". What this actually means is "a function that takes an integer, and returns a function that takes an integer and returns an integer".

The same signature in C# would look like this: Func<int, Func<int,int>> or "Func<int,int> Add(int x)". [Some day, think how odd it is to have two ways of representing the function.] So when we call the F# function, we're really passing in the first argument, getting a new function, then applying the second argument. Something like this:

1. add x y
2. (add x) y
3. closure1 y
4. final-result

"closure1" is what we get from the "partial" application of add. I'll touch on this in a minute.

So how do we _really_ pass in two arguments at once? We put two arguments into one value, a tuple. We write it like this:

> let add (x, y) = x + y;;
val add : int * int -> int

Notice the new type signature: "int by int to int" or "int int tuple to int". In C#, this would be:

int add(Tuple<int,int> arguments) { return arguments.Item1 + arguments.Item2; }

But F# nicely provides support for tuples [via pattern matching], so we can write them much more naturally. This syntax also works on the way out:

> let pow23 x = x*x, x*x*x;;
val pow23 : int -> int * int
// "int to int int tuple"

> let square, cube = pow23 7;;
val square : int
val cube : int

> square, cube;;
val it : int * int = (49, 343)


There's never no value
Another common typo/misunderstanding is when creating a function with no arguments:

> let sayHi = printfn "Hi";;
val sayHi : unit

Hi
> sayHi;;
val it : unit = ()

Oh, what happened here? Remember how _everything_ takes and returns exactly one value? The same is true of functions that "don't return a value", such as printfn. Instead of "not returning a value" like C's void, such functions return a value of a special type called unit, which has exactly one value, ().

Armed with this, let's see what the sayHi definition actually says. It says "let a value called sayHi be equal to the result of printfn". Well, since the result of printfn is unit, sayHi becomes unit. The execution happens immediately, and subsequent uses of sayHi just get the value, ().

To actually do what we want, we need to take an argument. Then F# knows we're a function value:

> let sayHi() = printfn "Hi";;
val sayHi : unit -> unit

> sayHi;;
val it : (unit -> unit) = <fun:clo@0>

> sayHi();;
Hi
val it : unit = ()


By explicitly taking a unit parameter, sayHi becomes a function (type unit to unit). Notice that if we just write "sayHi", we're just going to get the _value_ of it (a function), not apply (execute) the function.

Partial application
OK, so a side effect of this system of "take one param and return a function that takes the next" is that we can compose with just parts of a function. For instance, to write an increment function, we can do this:

> let add x y = x + y;;
val add : int -> int -> int

> let inc x = add x 1;;
val inc : int -> int


So far, nothing interesting. However, we can also write this more effectively:

> let inc = add 1;;
val inc : (int -> int)


Aha! What's happening here is what happens "secretly" each time we call add with 2 parameters. We're just using the intermediate result, the closure of "add 1", and assigning that function value to inc. In C#, it'd be something like this:
public static Func<int, int> Add(int x) {
    return y => x + y;
}
public static Func<int, int> Inc = Add(1);
// and "full application" of add looks like this: Add(1)(2)
[BTW, this isn't really currying in F#. Currying is taking a method of type "'a * 'b -> 'c" and turning it into "'a -> 'b -> 'c". Since F# methods are "automatically curried", there's no need for a "curry" step (well, except perhaps when using .NET methods, which are always "tupled", but that's another story).]
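
To make the "curry" step concrete, here is a rough C# sketch in the same spirit as the Add example above. The names CurrySketch, Curry and Uncurry are made up for illustration; nothing by those names ships in the framework.

```csharp
using System;

public static class CurrySketch
{
    // "Tupled" form (A, B) -> C becomes the curried form A -> (B -> C).
    public static Func<A, Func<B, C>> Curry<A, B, C>(Func<A, B, C> f)
    {
        return a => b => f(a, b);
    }

    // And back again: A -> (B -> C) becomes (A, B) -> C.
    public static Func<A, B, C> Uncurry<A, B, C>(Func<A, Func<B, C>> f)
    {
        return (a, b) => f(a)(b);
    }
}
```

Given Func<int, int, int> add = (x, y) => x + y; the call CurrySketch.Curry(add)(1) hands back an increment function, much like "add 1" does in F# without any helper.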

[Side note: a function like "add" is superfluous, because in F#, operators are functions:
> (+);;
val it : (int -> int -> int) = <fun:clo@18>
> let inc = (+) 1;;
val inc : (int -> int)
]

The pipeline


F# appears to have all this complicated syntax, with |> <| >> << and so on. But, these operators are defined in F# code, and follow some basic rules. They aren't magic and don't have any special compiler support.

The most important function operator is |>. A quick search of the F# source shows:
C:\Program Files\FSharp-1.9.6.2\source\fsharp\FSharp.Core\prim-types.fs(2062):       
        let inline (|>) x f = f x


(Operators are surrounded in parentheses to define them and to use them as functions with prefix notation.)
The type signature is:
> (|>);;
val it : ('a -> ('a -> 'b) -> 'b) = <fun:clo@23>

This is "alpha to a function alpha to beta to beta". That probably didn't help. Perhaps looking at the type of function application will help:

> let apply f x = f x;;
val apply : ('a -> 'b) -> 'a -> 'b


This demonstrates that a function application is really just taking a function "alpha to beta", giving it an alpha, and getting a beta. [I'm open to suggestions on better ways to pronounce type arguments.]

So, glance back up at the pipeline operator. We can see it's really just function application _in reverse_. What is the use of such a construct? If you've used C# 3.0's LINQ extension methods or a Unix shell, you probably already know. By reversing the function application, we can write things in a much more natural order. To modify something from the F# Quick Tour:

> let filterTypes name =
-   System.AppDomain.CurrentDomain.GetAssemblies()
-   |> Seq.map (fun a -> a.GetTypes()) |> Seq.concat
-   |> Seq.map (fun a -> a.Name)
-   |> Seq.filter (fun s-> s.Contains name);;

val filterTypes : string -> seq<string>

> filterTypes "Coll";;
val it : seq<string>
= seq ["ICollection"; "EvidenceCollection"; "GCCollectionMode"; "CollectionBase"; ...]
>


Now, this is a bit embarrassing, but this took me a long time to get. I stared at this and re-read the F# manual for probably an hour. How could something so simple do such "complex" stuff?? Once I finally got used to the idea of functions being normal values, and the whole "one arg one value" bit, it snapped together. The other operators (<|, >>, <<) are pretty easy to follow once the basics are understood (going through prim-types.fs is a great experience).
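
If it helps to see the same idea from the C# side, here is a minimal sketch of what |> and >> boil down to, written as extension methods. Pipe, Then and PipeSketch are names invented for this example, not anything from F# or the .NET Framework.

```csharp
using System;

public static class PipeSketch
{
    // Rough analogue of F#'s (|>): x.Pipe(f) is just f(x).
    public static R Pipe<T, R>(this T x, Func<T, R> f)
    {
        return f(x);
    }

    // Rough analogue of F#'s (>>): run f, then feed its result to g.
    public static Func<A, C> Then<A, B, C>(this Func<A, B> f, Func<B, C> g)
    {
        return a => g(f(a));
    }
}
```

With this in scope, 3.Pipe(x => x + 1).Pipe(x => x * 2) reads left to right the way a pipeline does, and inc.Then(show) builds a new function without applying anything yet. Neither needed compiler support, which is the point being made about the F# operators.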

What else?
I hope this all helps fit some pieces together. Feel free to use the MSN thing on the side of my site to ask questions or give me suggestions. Thanks!

Edit: Also check out F# function types: fun with tuples and currying (From an F# team member's blog.)


Code | FSharp
Monday, November 03, 2008 9:07:15 PM UTC  #    Comments [3]  |  Trackback

# Monday, October 13, 2008
Mono or CLR in FreeSWITCH
mod_managed is now in the FreeSWITCH tree. This replaces mod_mono, and allows selection of either the Microsoft CLR or Mono 2.0+ as the runtime engine to use. All the interfaces, and most of the code, are identical for both versions. Modules written on Mono will work on the CLR version and vice versa.

Check it out.

Code | FreeSWITCH
Monday, October 13, 2008 10:43:10 PM UTC  #    Comments [0]  |  Trackback

# Wednesday, September 24, 2008
Empty check on IEnumerable without consuming
One common issue with IEnumerables is that you can't find out anything about them until you use them. A frequent scenario is wanting to know if the IEnumerable is empty before you go ahead and use it. For example, you may want to write a result set to a file, but only if there's actually data.

As far as I know, the .NET Framework has no built-in classes to facilitate this, so I hacked up my own. You use it by wrapping an IEnumerable inside an EmptyCheckEnumerable. Then, when you check the IsEmpty property, it gets the enumerator and calls MoveNext once. When you then go to consume it, it intercepts the MoveNext call and simply returns the previous value. From then on, it just passes through. The result is that you don't consume the IE twice, which can be necessary for performance or other reasons.

Example:
var res = new EmptyCheckEnumerable<object>(dc.Execute<object>("SELECT * FROM foo"));
if (!res.IsEmpty) {
  // allocate resources and consume the IE - will only execute the SELECT once
}
I suggest creating a "ToEmptyCheck" extension method to flow type information.
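
For readers who just want the shape of the idea, here is a minimal sketch of such a wrapper plus the suggested extension method. It is not the attached EmptyCheckEnumerable.cs: it is a simplified reconstruction that assumes a single enumeration is enough, and every member name other than IsEmpty and ToEmptyCheck is made up here.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Sketch only: not the attached EmptyCheckEnumerable.cs. Supports one enumeration.
public sealed class EmptyCheckEnumerable<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> source;
    private IEnumerator<T> enumerator; // obtained on the first peek
    private bool peeked;               // has MoveNext been called once already?
    private bool hasFirst;             // what that first MoveNext returned
    private bool handedOut;            // has GetEnumerator been called?

    public EmptyCheckEnumerable(IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException("source");
        this.source = source;
    }

    // Gets the enumerator and calls MoveNext exactly once, remembering the answer.
    public bool IsEmpty
    {
        get
        {
            if (!peeked)
            {
                enumerator = source.GetEnumerator();
                hasFirst = enumerator.MoveNext();
                peeked = true;
            }
            return !hasFirst;
        }
    }

    public IEnumerator<T> GetEnumerator()
    {
        if (handedOut) throw new InvalidOperationException("Single enumeration only.");
        handedOut = true;
        return Iterate();
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    private IEnumerator<T> Iterate()
    {
        try
        {
            if (IsEmpty) yield break;   // reuses the earlier peek if there was one
            do { yield return enumerator.Current; }
            while (enumerator.MoveNext());
        }
        finally
        {
            // Never runs if the caller only checks IsEmpty and walks away.
            if (enumerator != null) enumerator.Dispose();
        }
    }
}

public static class EmptyCheckExtensions
{
    // Lets the compiler infer T, so callers don't have to spell it out.
    public static EmptyCheckEnumerable<T> ToEmptyCheck<T>(this IEnumerable<T> source)
    {
        return new EmptyCheckEnumerable<T>(source);
    }
}
```

With the extension method, the example above becomes var res = dc.Execute<object>("SELECT * FROM foo").ToEmptyCheck(); and the element type flows through without being repeated.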
Code:

EmptyCheckEnumerable.cs.txt (2.14 KB)
Code
Wednesday, September 24, 2008 10:11:30 PM UTC  #    Comments [8]  |  Trackback

# Friday, September 19, 2008
Objects versus Closures - a koan
In a previous comment, someone mentioned the OO mindset ("mold" -- quite appropriate). I don't want to go into it much, but simply "quote for win" something from here http://people.csail.mit.edu/gregs/ll1-discuss-archive-html/msg03277.html. It's a nice take on things and I got a kick out of it:

"
  The venerable master Qc Na was walking with his student, Anton.  Hoping to
prompt the master into a discussion, Anton said "Master, I have heard that
objects are a very good thing - is this true?"  Qc Na looked pityingly at
his student and replied, "Foolish pupil - objects are merely a poor man's
closures."

  Chastised, Anton took his leave from his master and returned to his cell,
intent on studying closures.  He carefully read the entire "Lambda: The
Ultimate..." series of papers and its cousins, and implemented a small
Scheme interpreter with a closure-based object system.  He learned much, and
looked forward to informing his master of his progress.

  On his next walk with Qc Na, Anton attempted to impress his master by
saying "Master, I have diligently studied the matter, and now understand
that objects are truly a poor man's closures."  Qc Na responded by hitting
Anton with his stick, saying "When will you learn? Closures are a poor man's
object."  At that moment, Anton became enlightened.

"
Code | FSharp | Humour
Friday, September 19, 2008 6:00:50 PM UTC  #    Comments [0]  |  Trackback

# Thursday, September 18, 2008
SQL 2008 Change Tracking with LINQ-to-SQL
I hacked up a little class to enable us to use SQL 2008's Change Tracking feature with LINQ-to-SQL. Change Tracking allows you to see which keys (and optionally columns) have changed in the database from a specific version. The SQL docs have a great overview with lots of examples and information.

Basically, we get a special CHANGETABLE function to SELECT from, which gives us the change information and keys. Additionally, there is the issue of versioning. Changes are only kept so long, so we want to make sure the last version we sync'd is still compatible, otherwise we have to re-initialize.
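
As a rough illustration of the moving parts, the sketch below collects the three T-SQL pieces involved and the baseline test. The table and key names (Accounts, AccountName) come from the examples later in this post; the dbo schema, the class name and the {0} placeholder style are assumptions, and none of this is lifted from the linked ChangeTracking.cs.

```csharp
using System;

public static class ChangeTrackingSketch
{
    // Oldest version the server can still enumerate changes from, for one table.
    public const string MinValidVersionSql =
        "SELECT CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('dbo.Accounts'))";

    // Version to remember for the next sync.
    public const string CurrentVersionSql =
        "SELECT CHANGE_TRACKING_CURRENT_VERSION()";

    // Keys changed since the version supplied as parameter {0}.
    // SYS_CHANGE_OPERATION is 'I', 'U' or 'D'.
    public const string ChangedKeysSql =
        "SELECT CT.AccountName, CT.SYS_CHANGE_OPERATION " +
        "FROM CHANGETABLE(CHANGES dbo.Accounts, {0}) AS CT";

    // If our last sync is older than what the server still tracks, start over.
    public static bool NeedsBaseline(long lastVersion, long minValidVersion)
    {
        return lastVersion < minValidVersion;
    }
}
```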

Finally, in order for our change SELECTs to be coherent, we need to snapshot the database. The easiest way to get this is by turning on Snapshot Isolation. Snapshot isolation allows us to read a virtual snapshot of the database. Any changes made from when we begin our transactions are not visible to us and we do not lock anything we read.
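
A minimal sketch of how such a snapshot scope could be opened, assuming it is nothing more than a System.Transactions scope with the isolation level set. The class and method names are hypothetical, the real helper in ChangeTracking.cs may do more, and the database itself must have ALLOW_SNAPSHOT_ISOLATION turned on for the server to honor it.

```csharp
using System;
using System.Transactions;

public static class SnapshotScopeSketch
{
    // Opens a new ambient transaction that asks for snapshot isolation.
    // Server side prerequisite: ALTER DATABASE ... SET ALLOW_SNAPSHOT_ISOLATION ON
    public static TransactionScope Create()
    {
        var options = new TransactionOptions
        {
            IsolationLevel = IsolationLevel.Snapshot,
            Timeout = TransactionManager.DefaultTimeout
        };
        return new TransactionScope(TransactionScopeOption.RequiresNew, options);
    }
}
```

Connections opened while the scope is active enlist in it, so the version checks and the change SELECTs all see the same point-in-time view.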

Here's an excerpt from a class I have to provide change tracking for our database:
public DbDataChangeProvider(long lastVersion) {
    this.lastVersion = lastVersion;
    this.txScope = ChangeTracking.GetSnapshotScope();
    
    var validV = ChangeTracking.GetValidVersionForAll(dataContext);
    baseline = lastVersion < validV;
    currentVersion = ChangeTracking.GetCurrentVersion(dataContext);
}
We take in the last version, then initialize a SnapshotScope. We get the minimum valid version and see if we're going to have to generate a baseline (re-init) or not. Next, we grab the current version of the database, so consumers can save the version for when they sync up next.

To get changed keys, you can do this:
ChangeTracking.GetChangedKeys<string>(dataContext, "Accounts", "AccountName", lastVersion, System.Data.Linq.ChangeAction.Delete);

This will give you an enumeration of all the Deleted keys; use other ChangeActions to get Insert or Updated. There's also a filter (SQL string) to limit further.

To get changed _items_, you can pass in a Queryable, like this:
ChangeTracking.GetChangedItems<Account>(dataContext, dataContext.Accounts.Where(a=>a.Balance>10), "Accounts", "AccountName", lastVersion);

The code should (seems to work for me) figure out your query and inject the JOIN to the CHANGETABLE function. The code is linked at the end of this article. Some of the functions use a Tuple type; if you don't have it, I've posted it elsewhere on this site. Or, you can delete those methods; they are only for 2-key tables.

ChangeTracking.cs.txt (9 KB)
Code | Misc. Technology
Thursday, September 18, 2008 2:03:46 AM UTC  #    Comments [0]  |  Trackback

# Wednesday, September 17, 2008
The learning curve is irrelevant
Something I've heard often is that "F# is too complex/functional programming is too hard". This is something that sorta came up in the comments here: http://www.atrevido.net/blog/2008/09/16/Why+NOT+F.aspx.

Why is this irrelevant? You only learn a language once. You pay the learning curve cost one time; after that, you have the techniques and power at your disposal. However, you pay the cost of code every time you read or write it. Since I'm going to read and write a lot more code than number of languages I learn, I'd much, much, prefer to pay this overhead once, up front, rather than in my code each time.

Of course, it's not necessarily this simple. It's possible to design a powerful language that renders code even more difficult. Regular expressions are perhaps a nice example of this; they're often called "write only" code. Every time I do a non-trivial regexp, I'm always going back to the reference. Another example is C macros -- text-based, they quickly let you get into trouble. The design of C# runs away from this and tries to make it very difficult to write code that is "hard" to figure out -- if there's any edge case where a feature might be confusing or not work, C# tends to not allow it at all.

F# design is different. F# doesn't try to shelter users - it gives you tools and lets you decide how to use them. It makes the assumption that if you're writing a program, *you have some clue of what you're doing*. F#'s tools are still safe (compared to say, C), but sure, you can go create a mess if you'd like.

Poor code quality is *not* something that should be fixed solely via technical measures. I liken it to using web filters to make sure "employees aren't goofing off on the Internet" -- this is a policy/management issue and should be solved via administrative means. If one of my devs is spending all day on 4chan but gets work done and adds value, what do I care? Similarly, if the code quality coming out is acceptable and the solutions work correctly, I don't care if it used macros, custom operators, "difficult" code, etc. The process to make sure code quality is high (code reviews) will take care of anyone abusing language features in stupid ways.

But in truth, F# isn't actually much more complex to use. To the beginner, what seems to be "unnecessary terseness" and a lot of complicated syntax is actually a very basic system in action. Many of the "built-in" F# features such as the |> pipeline operator are defined right in the language itself. There's no magic going on -- you can create your own functionality in exactly the same way. Once you understand the basic rules, you'll see that most everything else follows them.

But at any rate, why is "easy to learn" a benefit? Sure, it's handy to promote a language if people can pick it up easily, but it's not indicative of long-term power. True, if you have a "web developer" who's going to add a few server-side scripts, it's nice that he doesn't have to learn much. But if you're developing an application of any substance, I fail to see how these help, given the negative effects of having a "simple" language.

P.S. On my site I'm not trying to imply that F# will take over the world or that C# will go away. I've met enough "professional developers" to realise that anything that requires thinking isn't going to achieve stellar adoption. I'm simply pointing out that the reasons come down to apathy and intelligence (with respect to the learning curve; there could be other reasons as well), regardless of how politically correct one phrases it.

Code | FSharp
Wednesday, September 17, 2008 8:49:06 PM UTC  #    Comments [6]  |  Trackback

# Tuesday, September 16, 2008
Why NOT F#?
This is actually an open request for comments. I'm honestly interested in hearing why F# is not always the better candidate versus C#. What can C# do well that F# cannot? In nearly everything, F# seems to come out on top, as far as I can see.

Let's get these out of the way:

   - Personal preference. Enough said.
   - In beta. Enough said.
   - Legacy code. Sure, if you have a project in C#, it may not make too much sense to switch mid-way.
   - Management. Enough said.
   - No benefit. This is simply lack of education and needs to be addressed separately.
   - F#'s too hard/it's hard to hire F# devs. This is a non-issue that is a separate topic. In summary, anyone worth hiring for C# work should be able to handle F#. [Exception being a very small deadline with an existing team...]

The only code reason I've seen is heavy native interop/pointer work. F# seems to be slightly more verbose than C# in this case. It's not much more, but I could see if you're doing just pointer code then it could get annoying. (Interestingly enough, F# COM interop is much nicer than C# because it supports named and optional parameters (http://blogs.msdn.com/dsyme/archive/2008/05/02/full-release-notes-for-f-1-9-4.aspx, search for "chart")...)

What other reasons are there for C# over F#?

Edit: Good point in the comments about C# being a standard with open source implementations. That could be a big issue for some. Another good point is the current lack of tool support (like ClickOnce, ASPX and WPF designers, etc.). I don't see any intrinsic reason F# wouldn't have those, except for limited resources.

Code | FSharp
Tuesday, September 16, 2008 5:06:28 AM UTC  #    Comments [17]  |  Trackback

MARS is case-sensitive for LINQ-to-SQL
When specifying a connection string to use with LINQ-to-SQL, make sure you "correctly" case MultipleActiveResultSets. If you have something like this: "Server=(local);Database=master;Integrated Security=true;App=dcdemo;MultipleActiveResultsets=true", LINQ-to-SQL will determine MARS is disabled even though it is not.

This is because inside the LINQ-to-SQL implementation, a normal String.Contains is performed on the connection string. Since this is an ordinal compare, it won't find MARS turned on if you spell it differently. Without MARS enabled, if you execute multiple queries on a single DataContext, it will force the outstanding queries to buffer. This means that instead of a nice lazy reading from the SQL server, you'll end up bringing it all into local memory.
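
The effect is easy to reproduce outside LINQ-to-SQL. The sketch below is a hypothetical stand-in for that kind of check; the exact literal the real implementation searches for is an assumption here, and only the ordinal versus case-insensitive behavior is the point.

```csharp
using System;

public static class MarsCheckSketch
{
    // Hypothetical stand-in for the check described above: ordinal, so case matters.
    public static bool OrdinalCheck(string connectionString)
    {
        return connectionString.Contains("MultipleActiveResultSets=true");
    }

    // What a case-insensitive version of the same check could look like.
    public static bool IgnoreCaseCheck(string connectionString)
    {
        return connectionString.IndexOf("MultipleActiveResultSets=true",
            StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

Fed the connection string from the first paragraph (with "Resultsets" in lower case), OrdinalCheck says no while IgnoreCaseCheck says yes, even though SqlClient itself accepts either spelling.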

More info here: https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=366444

Code
Tuesday, September 16, 2008 5:02:00 AM UTC  #    Comments [0]  |  Trackback

# Tuesday, September 02, 2008
Become a better F# programmer via Real World Haskell

There are other great books out there such as Expert F#. The F# Dev center has links to many other "learn F#" articles. All of these are great.

But, something I found helpful is going "purely functional", and Haskell is the perfect vehicle. When you're forced to think only functional, and don't have the other "escapes" F# has, you bend your mind into understanding how you can accomplish things without using mutation or object-orientation.

The downside of Haskell is that many resources seem to be very challenging to get into. There's no doubt that the learning curve for Haskell can be tough. On top of that, many materials tend to dive right into monads and it tends to end up too scary. I've even bought several other good books on functional programming, but none of them were easily approachable. (They have good content, but you can't start from zero by using them.)

Enter Real World Haskell. This is a *very* easy to follow book and really drives functional programming home. It doesn’t assume you know anything about functional programming at all, so the learning curve is a gentle slope.

Even better? It's completely available online, so you can start reading it right now! Plus, it has reader-submitted comments which are of tremendous use, as they ask and answer many common questions that might arise as you read along, without interfering with the flow. You can read the entire book here: http://book.realworldhaskell.org/read/ [But buy it to support the excellent work the authors have done!]

I've found that my F# skills have gone up tremendously by reading Real World Haskell. For instance, I "sorta" understood F#'s computation expressions and builders (say, enough to use them -- that's easy, like most things in F#), but understanding the concepts behind them? Starting to learn Haskell really brings the understanding around. This isn't to say that you'll eschew mutation and OO in F# -- such concepts can be very useful (and increase performance on the CLR). But at least you'll know when a more elegant solution is available.

(Plus, it's fun! As someone in #haskell on freenode put it to me: "Learning Haskell will f*ck with your brain, and you'll like it.")

Code | FSharp
Tuesday, September 02, 2008 4:00:06 AM UTC  #    Comments [4]  |  Trackback

# Friday, August 29, 2008
No, really, the Event-based Asynchronous Pattern IS bad
Came across this:
http://michaelcurbanski.com/log/2008/08/29/functional-programmers-hate-events/

His main point is that async totally sucks, but that "Events don’t necessarily make async code tragically unreadable!" As a demonstration, he shows some simple async HTTP code without full or exception continuations. His code just reinforces my point.

At any rate, yes, sure, if your continuation code isn't too related to your calling code, events can work. I mentioned this in my post: "events are a bad choice for code that is not loosely coupled" followed with "sometimes a simple delegate field would be a better choice". Indeed, looking at Mike's sample code, it seems as if using a simple delegate field would allow some nice refactoring.

But a point of the Event-based Asynchronous Pattern is, and I quote, to "Wait for resources to become available without stopping ("hanging") your application". So basically, you want to take an existing method, say, in response to a user action, but you can't let it block. This is exactly perfect for a continuation based approach and not so much for a loosely-coupled event-based approach. The act of "firing" an event should indicate that you're letting _other listeners_ (notice the plural) know when you did something.

More on when events are appropriate: Say you're listening for SNMP messages and when you receive one, you let "everyone" know that you did. You don't care what they do with the result, and they just go off on their own. Events can work. Or, take, for example, the BackgroundWorker. The BackgroundWorker doesn't help thread your blocking code, it just pushes the block to a background thread you don't care about. Your HTTP code will still be all sync, and you'll still burn a thread. BackgroundWorker's main help is that it coordinates back to your "UI" thread, since many UI frameworks have strong thread affinity and will crash otherwise.

But that was the whole point
The event async pattern, specifically in things like Silverlight, is completely aimed at *not letting the thread block*. That's the _only_ problem it is trying to solve. Silverlight only forces this because the main thread is the UI thread, and they don't want the browser hung by dumb apps that don't put the block on a background thread. If they didn't want to do continuations, they could have at least kept the sync APIs, and thrown an exception if called from the main thread (thus letting people who know what they're doing not have to deal with the ugly event syntax).

BTW, F# kicks the crap out of your language
I didn't mention it in my previous event post, but F#'s computation expressions let you deal with continuations in a totally sexy way. H.H. Don Syme writes about it here: Introducing F# Asynchronous Workflows. You should learn F# if you don't know it. But just to demonstrate, here's an example from that introduction:

    let AsyncHttp(url:string) =
        async {  // Create the web request object
                 let req = WebRequest.Create(url)

                 // Get the response, asynchronously
                 let! rsp = req.GetResponseAsync()

                 // Grab the response stream and a reader. Clean up when we're done
                 use stream = rsp.GetResponseStream()
                 use reader = new System.IO.StreamReader(stream)

                 // synchronous read-to-end
                 return reader.ReadToEnd() }

The let! binding handles things asynchronously; the rest of the body becomes a continuation. If that isn't superior to any C# approach and doesn't make async easy, I don't know what would.

Code | FSharp
Friday, August 29, 2008 9:03:55 PM UTC  #    Comments [0]  |  Trackback

# Thursday, August 28, 2008
Tsk, tsk, Silverlight - Events are not async's friend
OK, so Silverlight 2 is still in Beta 2 and hopefully will have time to change. But I would have thought this fix would have gotten into Beta 2 (as far as I can tell, it has not). At any rate, it applies to async design in general.

Scenario: You are writing an async method and need to call code when your async process finishes. How do you expose this to your caller? For some reason (I'd love to know what it is), it seems to be getting more popular to expose an event to accomplish this.

Events suck: Let's use a simple case. You want to get two URLs, using the return of one to get the other, then print the result to the console. You have a few local variables you need to use as well. Let's see what it'd look like without async:

// Inside some method:
var url1 = Console.ReadLine();
var url2 = Console.ReadLine();
var someData = Console.ReadLine();
// onEx = some exception handler
try {
    var res1 = webThingy.Download(url1);
    var res2 = webThingy.Download(url2 + "?data=" + res1.Data);
    Console.WriteLine(someData + res2.Data);
} catch (Exception ex) {
    onEx(ex);
}

Straightforward, eh? Now, with async + events? You _could_ go create a new object type and add all sorts of fields and methods and whatnot, but that's a lot of work and gets ugly quickly. Closures are a natural help here. So how does the event-based async code look?

// Inside some method:
var url1 = Console.ReadLine();
var url2 = Console.ReadLine();
var someData = Console.ReadLine();
// onEx = some exception handler
OnDownloadCompleteEventHandler first = null; // initialized so the lambda below may refer to it
webThingy.OnDownloadComplete += first = (o, res1) => {
    if (res1.Exception != null) {
        onEx(res1.Exception);
        return;
    }
    webThingy.OnDownloadComplete -= first;
    webThingy.OnDownloadComplete += (o2, res2) => {
        if (res2.Exception != null) {
            onEx(res2.Exception);
            return;
        }
        try {
            Console.WriteLine(someData + res2.Data);
        } catch (Exception ex) { onEx(ex); }
    };
    try {
        webThingy.DownloadAsync(url2 + "?data=" + res1.Data);
    } catch (Exception ex) { onEx(ex); }
};
try {
    webThingy.DownloadAsync(url1);
} catch (Exception ex) { onEx(ex); }


I think this judges itself.

Better
The way it SHOULD be is that any async method should take two arguments, one to call for result, one for exception. Let's see how that would look:

// Inside some method:
var url1 = Console.ReadLine();
var url2 = Console.ReadLine();
var someData = Console.ReadLine();
// onEx = some exception handler
webThingy.DownloadAsync(url1, onEx, res1 => {
    webThingy.DownloadAsync(url2 + "?data=" + res1, onEx, res2 => {
        Console.WriteLine(someData + res2);
    });
});

This code isn't perfect, but it's sure a ton better than the event-based system. With a bit extra work, you could build a simple async framework. Every Async method could return an Async object that would allow you to consolidate stuff like exception handling and cancellation. But even without that, this code straight away is much superior.
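
For completeness, here is one way a method with that shape could be put together. This is only a sketch of the calling convention: WebThingySketch and its Download stand-in are hypothetical, and it fakes the work on a thread-pool thread rather than doing real non-blocking I/O.

```csharp
using System;
using System.Threading;

public static class WebThingySketch
{
    // Hypothetical stand-in for a blocking download; throws on an empty URL.
    private static string Download(string url)
    {
        if (string.IsNullOrEmpty(url)) throw new ArgumentException("url is empty");
        return "response for " + url;
    }

    // Two continuations: onEx for a failure, cont for the result.
    // An exception thrown by cont itself is also routed to onEx.
    public static void DownloadAsync(string url, Action<Exception> onEx, Action<string> cont)
    {
        ThreadPool.QueueUserWorkItem(delegate
        {
            try { cont(Download(url)); }
            catch (Exception ex) { onEx(ex); }
        });
    }
}
```

The caller never touches an event or unsubscribes anything: the handlers live exactly as long as the one call they belong to.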

The extra downside
Interestingly enough, .NET 1 had the concept of BeginXXX/EndXXX, but because there were no closures, it was always a bit more of a pain to implement. BeginXXX/EndXXX, while not making exception handling as easy, are at least a good start.

The cool thing about BeginXXX/EndXXX was that you could refactor them generically into nice async syntax. Here's a quick and dirty example [F#'s async stuff works similarly.]:

static Action<A1, Action<Exception>, Action<R>> ToAsync<A1, R>(
    Func<A1, AsyncCallback, object, IAsyncResult> begin,
    Func<IAsyncResult, R> end) {
    return (arg, onEx, cont) => {
        begin(arg, iar => {
            try { cont(end(iar)); } catch (Exception ex) { onEx(ex); }
        }, null);
    };
}
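// Note: the int overload of Socket.BeginAccept takes a receive size in bytes, not a timeout.
static void BeginAcceptAsync(this Socket s, int receiveSize,
    Action<Exception> onEx, Action<Socket> cont) {
    ToAsync<int, Socket>(s.BeginAccept, s.EndAccept)(receiveSize, onEx, cont);
}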
static void Main() {
    Socket s = ...;
    s.BeginAcceptAsync(10,
        ex => Console.WriteLine(ex),
        sock => sock.Close());
}

(Yes, I think this is a legitimate use of type extensions, but again, only because the original library had a design flaw. And even the non-extension syntax wouldn't necessarily be bad.)

But, to my knowledge, since there's no way to reference an event, this is not possible with the async + event approach that is becoming all the rage (yes, SocketAsyncEventArgs, I'm talking about you too). If you know of a way to ease the pain of async+events, tell me.

In summary
It seems that events are a bad choice for code that is not loosely coupled (UIs being the usual example of code that is). Even when loosely coupled, sometimes a simple delegate field would be a better choice, since you can compose (like tacking on a filter or otherwise augmenting the callback). But in the case of async, I cannot see how it is good.

Code
Thursday, August 28, 2008 1:48:31 AM UTC  #    Comments [1]  |  Trackback

Why are there still delegates?
Can someone explain to me the point of having delegate types in C#/.NET? They made sense in the non-generic .NET 1 world, but with generics they should have no reason to exist (except maybe purely as a type alias). All the delegates in common use are representable with a generic definition (think Action/Func<...>). But for some reason, you cannot automatically use an equivalent delegate in C#.

This manifests a bit more when you consider a local such as:

var inc = (int x) => x + 1;

As Expression<T> syntax takes the same form, the compiler cannot tell if it's an expression or a delegate. Additionally, since Func<int,int> is "just as good" as any other generic type, even if the compiler wanted to create a delegate, it wouldn't know to use Func<int,int>. Thus, you must specify it, and inner functions are too clunky to be used in C#.
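To make the complaint concrete, here is a minimal sketch. IntTransform and DelegateDemo are names I made up for illustration, not anything from the BCL. Two delegate types with identical signatures are still unrelated types, so moving from one to the other takes an explicit re-wrap:

```csharp
using System;

// Hypothetical delegate type with exactly the same shape as Func<int, int>.
public delegate int IntTransform(int x);

public static class DelegateDemo
{
    public static int Run()
    {
        // The lambda is given an explicit delegate type as its target
        // (required by the C# 3 compiler this post is about).
        Func<int, int> inc = x => x + 1;

        // IntTransform t = inc;   // does not compile: no implicit conversion between delegate types
        IntTransform t = new IntTransform(inc);   // explicit re-wrap of the same function

        return t(41);
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

The commented-out line is the whole point: structurally the two types are the same thing, but the type system treats them as strangers.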

The answer I got from Microsoft (not an official answer, just a response to a Connect item) was that C# might allow auto-conversion in version 4, but also that delegates aren't totally useless because recursive delegate types can't be declared in that fashion. But for the relatively few uses (fixed-point combinators?) of such types, built-in, specific support would be a small sacrifice, wouldn't it?

This is just something I've been wondering about.

Update: Thanks to Jonathan Pryor for pointing out in the comments what my limited functional mind missed: ref/out parameters and, of course, high-arity delegates. While these should be easy to work around, they are accurate technical reasons. Thanks.

Code
Thursday, August 28, 2008 12:18:44 AM UTC  #    Comments [11]  |  Trackback

# Wednesday, August 27, 2008
ASP.NET MVC Begs for Tuples

[Yea, I’m not a web dev, and actively avoid it as much as possible, so I’m late to this party.]

The interesting thing about ASP.NET MVC is that it takes an opposite approach to ASP.NET in general. ASP.NET concepts try to “build up”, so as to shelter us from the evolved idiocy that is HTML, as well as clean up the inherently stateless nature of HTTP. ASP.NET MVC makes no attempt and forces you to deal with reality. Considering ASP.NET’s abstraction isn’t really perfect (example: databinding sucks), MVC’s approach is unfortunately refreshing.

Because of its “raw” nature, you’ll be writing a lot more HTML than you’d do with ASP.NET, and this HTML must line up with code on your server. To make this less of a pain, there are some HTML helper functions. Rob “ type inference” Conery has an overview here.

Here’s the signature for one of the functions:

    public static string CheckBox(this HtmlHelper helper, string htmlName, string text, string value, bool isChecked, object htmlAttributes);

The last parameter confused me. Why would it be an object? Am I supposed to pass in an IDictionary<string,string>? Just a long string? To make it more confusing, other helpers had two overloads:

    public static string TextArea(this HtmlHelper helper, string htmlName, object value, IDictionary<string, object> htmlAttributes);

    public static string TextArea(this HtmlHelper helper, string htmlName, object value, object htmlAttributes);

OK, so they explicitly called out the IDictionary there – THEN WHO WAS OBJECT HTMLATTRIBUTES?

Rob covers this in his overview. The idea is that you’re supposed to use anonymous types to hack around the lack of tuples.

<%=Html.Whatever(arg1, bla, …,
                             new { @class="cssx", style="x:f" …}) %>

What a great case of not-having-built-in-tuples-is-lame. It’s so lame, Microsoft’s own developer teams have to resort to weird (but quite creative!) hacks like this so that their syntax won’t completely suck*. Damn.

And now, a duck

For bonus points, there is another C# compiler feature that the MVC team could have [ab]used, and it would arguably have made more sense (although the syntax isn’t as tight). C# supports duck typing on collection initializers! So, they could create a class like this:

    public class HtmlAttributes : System.Collections.IEnumerable
    {
        public System.Collections.IEnumerator GetEnumerator() { ... }
        public void Add(string name, object val) { ... }
    }

And then they can write this:

new HtmlAttributes { { "A", 123} , {"B", "test"} }

No, it’s not as tight as the anonymous type syntax, but it does make a lot more sense.  And tuples still make much more sense than either approach, and have benefits for the rest of the language to boot (death to out parameters!).
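Here is a filled-in sketch of that idea. To be clear, this HtmlAttributes class is hypothetical (ASP.NET MVC ships nothing by this name), and the ToString rendering is my own assumption about what a helper might want from it; it does no HTML encoding.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Text;

// Hypothetical class; not part of ASP.NET MVC.
public class HtmlAttributes : IEnumerable
{
    private readonly List<KeyValuePair<string, object>> items =
        new List<KeyValuePair<string, object>>();

    // The compiler turns each { name, val } entry of a collection initializer into a call to Add.
    public void Add(string name, object val)
    {
        items.Add(new KeyValuePair<string, object>(name, val));
    }

    public IEnumerator GetEnumerator()
    {
        return items.GetEnumerator();
    }

    // Renders the pairs as name="value" attribute text (no encoding in this sketch).
    public override string ToString()
    {
        var sb = new StringBuilder();
        foreach (var kv in items)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append(kv.Key).Append("=\"").Append(kv.Value).Append('"');
        }
        return sb.ToString();
    }
}

public static class HtmlAttributesDemo
{
    public static void Main()
    {
        var attrs = new HtmlAttributes { { "class", "cssx" }, { "tabindex", 3 } };
        Console.WriteLine(attrs); // class="cssx" tabindex="3"
    }
}
```

All the compiler asks for is IEnumerable plus an accessible Add method with matching arguments; there is no interface that says "I support collection initializers".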

*The only benefit I see in anonymous types is that, at compile time, you'll know there are no key conflicts - but that is totally trivial in the way they use them, since all the keys are declared right there.

ASP.NET | Code
Wednesday, August 27, 2008 1:12:13 AM UTC  #    Comments [10]  |  Trackback

Extension Methods Suck
Been meaning to write this for a while, and I think I touched on it here, but I'd like to expand a bit. C# Extension Methods are just a hack to compensate for C#'s poor handling of functions in general. As far as I can tell, they were added solely so you can do this type of thing in LINQ:

  var squares = myInts.Select(i => i * i)

Instead of:

  var squares = Enumerable.Select(myInts, i => i * i)

In other words, they wanted to provide some simple infix syntax for functions. Well, that's a poor approach to the problem.

    - First off, it only works on functions specially declared to be "extensions", which means your composition options are limited to whatever the library has built in. I can't send arguments to arbitrary static methods, like, say, "someVar.Console.WriteLine".

    - Second, since extension methods are defined solely by their method name, ambiguity is quite easy to come across. There's no way to qualify Foo.Function versus Bar.Function.

Pipeline
Other languages approach this with two simple things (which actually simplify the entire language/type system overall). First off, we need to be able to define function operators. I'll demonstrate with F#'s pipeline operator:

    let (|>) x f = f x

[As a side note, any language that lets you toss around operators and functions will allow this kind of syntax – it isn’t that F# had to have compiler support for this particular operator.]

In fake C#, it’d be something like (for fun, notice the lack of type inference):

B operator|> <A, B>(A x, Func<A, B> f) { return f(x); }

This means that the |> operator will take x on the left and apply it to f on the right side. If this existed in C#, you'd be able to write something like:

    "Hello" |> Console.WriteLine
or
    if (myInts |> Enumerable.Any) { .... }

Now we can pass in a parameter to static methods. But, hey, whaddya know? With this, Extension Methods are solved for all single-argument static methods! That was easy. But what about Select – it takes two parameters, so this won’t work.

Enter the Lambda
What if ALL functions took one parameter and output one parameter? If that were the case, then we’d be set. But how do we allow more than one parameter? Well, what if, every time you declared a method with more than one parameter, it actually returned a method that took the next parameter? For example, we could write “Add” as:

        Func<int,int> Add(int a) { return b =>  a + b; }

This is known as the curried form of Add. We’d now call it as: Add(5)(6). We can do cute stuff like “var inc = Add(1)”. But, as the Add declaration shows, in C# this is too unwieldy (and this is a simple example!). The compiler should actually do all this for us, so we can just write our functions normally but use them as if they were written in curried form.
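For the curious, here's a compilable sketch of the curried Add, plus a generic Curry helper. The helper and the CurryDemo class are my own illustration, nothing built in. It shows both the idea and the amount of angle-bracket noise C# makes you pay for it:

```csharp
using System;

public static class CurryDemo
{
    // Hand-written curried Add: takes a, returns a function still waiting for b.
    public static Func<int, int> Add(int a)
    {
        return b => a + b;
    }

    // Hypothetical helper that curries any two-argument function.
    public static Func<A, Func<B, C>> Curry<A, B, C>(Func<A, B, C> f)
    {
        return a => b => f(a, b);
    }

    public static void Main()
    {
        var inc = Add(1);
        Console.WriteLine(Add(5)(6)); // 11
        Console.WriteLine(inc(41));   // 42

        Func<int, int, int> mul = (x, y) => x * y;
        var twice = Curry(mul)(2);    // partial application: 2 * _
        Console.WriteLine(twice(21)); // 42
    }
}
```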

Now, if we simply swap the order of arguments for Enumerable.Select, we have our extension method ready to roll: Enumerable.Select(Func, IEnumerable) can be used so:

var squares = myInts |> Enumerable.Select(i => i * i)

Now the call to Select takes the lambda (i * i) and returns a function that takes an Enumerable. It is then given the myInts, and everyone is happy. This is just a quick, crap, explanation. Google can lead you to many more interesting resources about partial application, currying and so on.
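Since real C# has no |> operator, the closest runnable approximation I can offer is (ironically) an extension method. In this sketch, Pipe and SelectCurried are hypothetical names of mine; SelectCurried is just Enumerable.Select with the arguments swapped and curried, as described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PipeDemo
{
    // Stand-in for |>: feed x on the left into f on the right.
    public static B Pipe<A, B>(this A x, Func<A, B> f)
    {
        return f(x);
    }

    // Select with the function first and the sequence last, in curried form.
    public static Func<IEnumerable<T>, IEnumerable<R>> SelectCurried<T, R>(Func<T, R> f)
    {
        return xs => Enumerable.Select(xs, f);
    }

    public static void Main()
    {
        IEnumerable<int> myInts = new[] { 1, 2, 3 };

        // Reads like: myInts |> Enumerable.Select(i => i * i)
        var squares = myInts.Pipe(SelectCurried((int i) => i * i));

        foreach (var n in squares) Console.WriteLine(n); // 1, 4, 9 on separate lines
    }
}
```

Note the explicit (int i) on the lambda: without left-to-right inference across the pipe, the compiler has to be told the element type up front.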

At any rate, I think if C# had taken this approach, we'd all be much better off. To top it off, more functions would take their arguments in a proper style. As it is, uncurried versions of Extension Methods are incompatible with normal function pipelining. Oh well.


Code | FSharp
Wednesday, August 27, 2008 12:17:53 AM UTC  #    Comments [15]  |  Trackback

# Tuesday, August 26, 2008
ASP.NET MVC - Abusing Using (At VB's request?)
Someone on our team started using ASP.NET MVC for a new web interface we're doing. I must say I'm impressed with the level that the MVC team [ab]uses the C# compiler, mostly in a good way. On the plus side, they end up with a bit more compile-time checking than would be possible otherwise (we can only hope WPF will follow suit some day).

But one thing that struck me as odd was how their helper method for generating an HTML form works. The goal is to generate an HTML form tag with the right action, and they use lambdas as symbolic references to figure out the action. The next problem is making sure that the <form> gets closed with a </form>. The straightforward answer to this is "create a function that takes a function". The C# signature would be: void Form<A>(Expression<Action<A>>, Action). Then your ASPX code would be:

<% Html.Form<FooController>(x => x.Edit(someVar.FieldX), () => { %>
    Some Html <% SomeCode%>
<% }); %>

The code is nicely bracketed and works fine. But ASP.NET MVC doesn't actually do that. Instead, the Form method returns an IDisposable! The code to use it is:

<% using (Html.Form<FooController>(x => x.Edit(someVar.FieldX))) { %>
    Some Html <% SomeCode%>
<% } %>

Why do they use an IDisposable? The rest of the MVC framework seems to assume people are somewhat familiar with lambdas, closures and whatnot. The only thing I can think of is that VB doesn't support anonymous methods. So in order to make it VB friendly, they come up with quite a strange use of IDisposable to abuse language support for it. Overall, I'm not sure if this is dumb or cute.
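To show the two shapes side by side outside of ASPX, here's a stripped-down sketch. None of this is the actual MVC implementation; TagScope, FormDemo and both Form overloads are hypothetical, and they write to a plain TextWriter instead of the response.

```csharp
using System;
using System.IO;

public static class FormDemo
{
    // Hypothetical helper: writes the opening tag now and the closing tag on Dispose.
    private sealed class TagScope : IDisposable
    {
        private readonly TextWriter writer;
        private readonly string tag;

        public TagScope(TextWriter writer, string tag)
        {
            this.writer = writer;
            this.tag = tag;
            writer.Write("<" + tag + ">");
        }

        public void Dispose()
        {
            writer.Write("</" + tag + ">");
        }
    }

    // The IDisposable flavour: the caller's using block brackets the body.
    public static IDisposable Form(TextWriter writer)
    {
        return new TagScope(writer, "form");
    }

    // The function-taking-a-function flavour: the helper brackets the body itself.
    public static void Form(TextWriter writer, Action body)
    {
        writer.Write("<form>");
        try { body(); }
        finally { writer.Write("</form>"); }
    }

    public static string Render()
    {
        var w = new StringWriter();
        using (Form(w)) { w.Write("A"); }
        Form(w, () => w.Write("B"));
        return w.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Render()); // <form>A</form><form>B</form>
    }
}
```

Both produce the same output. The using version leans on the compiler's try/finally expansion, which is why it works in a language without lambdas.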

Code | ASP.NET
Tuesday, August 26, 2008 9:19:59 PM UTC  #    Comments [4]  |  Trackback

# Saturday, August 16, 2008
Reference cells in F# sequences
I saw this stack implementation (http://blogs.msdn.com/jaredpar/archive/2008/08/15/immutablestack-in-f.aspx) and wanted to see how I'd approach it:
 
#light
open System

type 'a ImmutableStack =
   | Empty
   | Stack of 'a * 'a ImmutableStack
   member x.Peek() = match x with | Stack (v, _) -> Some v | _ -> None
   member x.Pop() = match x with | Stack (_, s) -> s | _ -> failwith "empty stack"
   member x.Push a = Stack(a, x)
   member x.IsEmpty = x = Empty
 
(Well, I'd probably just use a list in most cases.) But interestingly, he has an All function, which returns a sequence of the entire stack. Chris suggested using sequence expressions and then recursively calling the function:
 
member x.All =
    match data with
        | Empty -> Seq.empty
        | Value (v,n) ->
            seq {
                yield v
                yield! n.All }
 
This is pretty nice as far as syntax goes. For some reason, I wanted to see how it'd look without the recursion (which needs to generate a new enumerator, unless the compiler is doing some wicked awesome stuff, which wouldn't surprise me). Here are some of the things I came up with:
 
member x.All1 =
   let until f = Seq.generate (fun () -> ref x) f (fun _ -> ())
   until (fun cur -> match (!cur) with
                     | Stack(v, s) -> cur := s
                                      Some v
                     | _ -> None)
 
Seq.generate continues calling the function until None is returned, so this provides the loop termination. Another similar approach:
 
member x.All2 =
      let until f = Seq.generate (fun () -> ref x) f (fun _ -> ())
      until (fun cur -> try (!cur).Peek()
                        finally if not (!cur).IsEmpty then cur := (!cur).Pop())

Since Peek already returns 'a option, we can use it directly. The problem is that we need to update cur to cur.Pop and then return a value. The try/finally works, but the whole deal still doesn't seem very elegant. Sequence expressions allow us to yield:
 
member x.All3 =
      let cur = ref x
      { while not (!cur).IsEmpty do
          yield (!cur).Peek()
          do cur := (!cur).Pop() }
      |> Seq.choose (fun x -> x)

I dislike this one. Because we're not doing the pattern match ourselves, we end up yielding 'a option. But None is never valid because we guard on IsEmpty. This makes us stick a Seq.choose on with an identity function to strip off the Some. [Side note, is there no built-in identity function?] Bringing the match into the seq fixes the issue, but the code is pretty long:

member x.All4 =
      let cur = ref x
      { while not (!cur).IsEmpty do
          match (!cur).Peek() with
          | None -> do ()
          | Some v -> do cur := (!cur).Pop()
                      yield v }

I think what bothers me the most here is that the while and match are redundant. The None case will never be matched because we guard on IsEmpty. Moving that into a separate function gives:

member x.All5 =
      let v = function Stack(vl, _) -> vl | _ -> failwith "dont call on empty"
      let cur = ref x
      { while (!cur) <> Empty do
          yield v (!cur)
          do cur := (!cur).Pop() }

I prefer All5 to All4, but still think All1 or All2 are nicer. (Chris's original is best as far as I can tell.) How else can this be done?

Edit: I lost sight of one of the principles of functional programming: Composition. Here's a simple update to All5 that, IMHO, vastly improves it:

member x.All6 =
      let cur = ref x
      let toVal = function Stack(v, _) -> v | _ -> failwith "dont call on empty"
      let next () = try toVal (!cur) finally cur := (!cur).Pop()
      { while (!cur) <> Empty do yield next () }
 

Every line builds up and you don't have to keep its details "active" in your mind. This makes the sequence expression (which is the driver of the algorithm) easy to verify. And again, to clarify, in a real system I'd probably use what Chris suggested since it's the nicest syntax. The other versions are only there to see what it looks like if we toss recursion and use a reference cell.

Code | FSharp
Saturday, August 16, 2008 4:09:49 AM UTC  #    Comments [0]  |  Trackback

# Friday, August 15, 2008
Typing string IDs

I just read these two posts:
http://blogs.msdn.com/simonince/archive/2008/08/15/strongly-typed-primitives.aspx
http://www.thejoyofcode.com/Avoiding_Primitive_Obsession_to_tip_developers_into_the_pit_of_success.aspx

And that reminded me about something we recently did. One system we're working on uses a lot of string identifiers for many different types of objects. There are many, many of these stored and passed around, so keeping things efficient was of high concern. 

The downside of string IDs (really, using any common type as an ID) is that it's legal to pass any primitive of the same type. Strings and integers abound, both as IDs of other classes, as well as general use. So it's not unimaginable that someone could pass the wrong parameter somewhere. This could lead to runtime crashes or unexpected results (if the ID is actually a real record of another class of object). Finally, using common types for IDs reduces usability. The signature "public void Delete(int id)" leaves a lot to be desired.

We wanted to hit all these issues, in addition to keeping things simple. There are times when untyped data needs to be converted, and this should be easy and clear. We wanted to avoid having to define new types when we had new classes of objects to identify. It is also customer-visible code, so C# is used.

Using a reference type was unacceptable, because it'd add at least 12 bytes overhead (I think more on x64). Using a struct fixes this, in addition to dealing with silly nullability issues. [If a type can be null, it should always be explicit. C#'s "reference types can be null" makes this hard.]

The end result was quite simple. Wrap a string in a structure so equality and hashing pass through. But, take advantage and remove case/cultural sensitivity (since in many systems, data IDs are not case sensitive). Provide explicit conversions so you can easily convert to and from strings, but never by accident. (If the conversions were implicit, you're back in the starting point.) Finally, add a generic parameter that is never used. The generic parameter gives you distinct types without having to define them. Now the APIs can look like:

    public void Delete(Id<Product> id)...

    Dictionary<Id<Group>, List<Id<User>>> members...

When you do have hardcoded IDs, as the blog entries I mentioned do, you can convert easily: (Id<User>)"Admin". Nulls are treated as empty, all the time (empty may be a valid value anyways).

When a truly optional ID is needed, use nullable types: "Id<Whatever>?". This fully captures how values are handled. This is vastly better than "It's a reference type, so maybe null is allowed. Or maybe null will crash. Empty string might be considered null, or maybe empty string means optional." With explicit nullability, the type system says it all.

The best part is that there should be pretty much no overhead. I'd expect the equality functions to be inlined, and there's no memory overhead, since the struct is simply a string reference.


Here's the struct:
public struct Id<T> : IEquatable<Id<T>>
{
    public Id(string name) {
        this.name = name ?? "";
    }
    readonly string name;
    
    public static explicit operator string(Id<T> x) { return x.name ?? ""; }
    public static explicit operator Id<T>(string s) { return new Id<T>(s); }

    public override bool Equals(object obj) {
        return
            !(obj is Id<T>) ? false :
            ((Id<T>)obj) == this;
    }

    public bool Equals(Id<T> other) {
        return other == this;
    }

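    public override int GetHashCode() {
        // Hash with the same comparer that == uses, so equal IDs hash alike.
        return StringComparer.InvariantCultureIgnoreCase.GetHashCode(name ?? "");
    }
    public override string ToString() {
        return name ?? "";
    }
    public static bool operator ==(Id<T> a, Id<T> b) {
        // default(Id<T>) has a null name; treat it as empty, like everywhere else.
        return StringComparer.InvariantCultureIgnoreCase.Compare(a.name ?? "", b.name ?? "") == 0;
    }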
    public static bool operator !=(Id<T> a, Id<T> b) {
        return !(a == b);
    }
}
I'd be interested in seeing a more generic, yet very simple, solution: one that doesn't rely on the underlying type to be string, but still provides all the same functionality. I don't think it's possible, since there's no way to get a generic constraint that'd allow similar handling of "string" and "int?". Additionally, structs can't inherit, so you'd end up using "Id<string, Product>" everywhere, which is far from elegant.
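For a quick feel of how this plays out at a call site, here's a self-contained sketch. The struct in it is a condensed restatement of the one above so the sample compiles on its own; User, Product and IdDemo are hypothetical names that exist only to give the IDs something to point at.

```csharp
using System;
using System.Collections.Generic;

// Condensed restatement of the Id<T> struct so this sample stands alone.
public struct Id<T> : IEquatable<Id<T>>
{
    private readonly string name;
    public Id(string name) { this.name = name ?? ""; }

    public static explicit operator string(Id<T> x) { return x.name ?? ""; }
    public static explicit operator Id<T>(string s) { return new Id<T>(s); }

    public bool Equals(Id<T> other)
    {
        return StringComparer.InvariantCultureIgnoreCase.Equals(name ?? "", other.name ?? "");
    }
    public override bool Equals(object obj) { return obj is Id<T> && Equals((Id<T>)obj); }
    public override int GetHashCode()
    {
        return StringComparer.InvariantCultureIgnoreCase.GetHashCode(name ?? "");
    }
    public static bool operator ==(Id<T> a, Id<T> b) { return a.Equals(b); }
    public static bool operator !=(Id<T> a, Id<T> b) { return !a.Equals(b); }
}

// Hypothetical marker types; they only exist to tell the IDs apart.
public class User { }
public class Product { }

public static class IdDemo
{
    public static bool SameUser(string a, string b)
    {
        return (Id<User>)a == (Id<User>)b;
    }

    public static void Main()
    {
        var owners = new Dictionary<Id<Product>, Id<User>>();
        owners[(Id<Product>)"Widget"] = (Id<User>)"Admin";

        // Lookup ignores case because equality and hashing both do.
        Console.WriteLine(owners.ContainsKey((Id<Product>)"WIDGET")); // True

        // owners.ContainsKey((Id<User>)"Widget");   // does not compile: wrong kind of ID
        Console.WriteLine(SameUser("admin", "ADMIN")); // True
    }
}
```

The commented-out line is the payoff: passing a user ID where a product ID is expected becomes a compile error instead of a runtime surprise.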
Code
Friday, August 15, 2008 7:39:11 PM UTC  #    Comments [8]  |  Trackback

# Thursday, June 26, 2008
I underestimated the power of query comprehensions

In my last post, I said I'd write the sample in C# to compare to F#. Well, I grossly underestimated the power of query comprehensions. The C# version is almost the same length (formatting differences, mainly). I'm surprised and impressed. (Or maybe I'm writing F# like I'd write C#.) Edit: I think maybe sequence expressions could cut it down a bit...

...But... C# still can't do discriminated unions efficiently or effectively ;).


// crudcreatecompare.cs: Generates LINQ CRUD table fields using the horribly named DatabaseBase code
//
// Tables look like: [Table(Name="dbo.Accounts")]
// Columns look like this:
//  [Column(Storage="_AccountName", DbType="VarChar(128) NOT NULL", CanBeNull=false, IsPrimaryKey=true)]
//  [DataMember(Order=1)] // Exists if serialization is turned on; used to order key parameters
// Emits:
//  public static readonly TableHelper<Account, String> Accounts =
//      CreateTable(dc => dc.Accounts, a => a.AccountName);

using System;
using System.Collections.Generic;
using System.Data.Linq.Mapping;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Runtime.Serialization;

class Program
{
    static void Main()
    {
        Console.WriteLine(
            new Program().generate(
            "C:\\yourlinq.dll"));
    }

    string generate(string asmpath)
    {
        var asm = Assembly.LoadFrom(asmpath);
        var lines = from t in asm.GetExportedTypes()
                    let ta = getAttr<TableAttribute>(t)
                    where ta != null
                    let name = pluralize(ta.Name.Replace("dbo.", ""))
                    orderby name
                    select genTable(t, name);
        return joinStrings("", lines);
    }

    string genTable(Type t, string tableName)
    {
        var keyProps =
            from p in t.GetProperties()
            let c = getAttr<ColumnAttribute>(p)
            where c != null && c.IsPrimaryKey
            let dm = getAttr<DataMemberAttribute>(p)
            orderby dm == null ? 0 : dm.Order
            select p;
        var tw = new StringWriter();
        tw.WriteLine("public static readonly TableHelper<{0}, {1}> {2} =",
            t.Name,
            joinStrings(", ", keyProps.Select(p => p.PropertyType.Name)),
            tableName);
        tw.WriteLine("\tCreateTable(dc => dc.{0}, {1});",
            tableName,
            joinStrings(", ", keyProps.Select(p => "a => a." + p.Name)));
        tw.WriteLine();
        return tw.ToString();
    }

    string joinStrings(string sep, IEnumerable<string> items)
    {
        return string.Join(sep, items.ToArray());
    }
    string pluralize(string s)
    {
        return s.EndsWith("s") ? s : s + "s";
    }

    T getAttr<T>(ICustomAttributeProvider icap)
    {
        var a = icap.GetCustomAttributes(typeof(T), true);
        return a.Length == 0 ? default(T) : (T)a[0];
    }
}

Code | FSharp
Thursday, June 26, 2008 6:15:10 AM UTC  #    Comments [0]  |  Trackback

# Monday, March 31, 2008
The reason VB.NET is truly a second class citizen

No support for anonymous methods/lambda statements. I first realised this when someone commented on one of my functional C# postings, asking "how can I write this in VB"? Turns out VB has no support for anonymous methods. Ouch.

This lack of capability rules out so much functionality, I honestly cannot imagine writing anything significant today using VB.NET. But with LINQ and F# making such big splashes, I have high hopes for the future of functional programming on the .NET languages, and I'm sure VB.NET will get itself fixed up.

Really, dropping anonymous methods in favour of stuff like XML literals? Maybe that's why it's listed as first time or casual programming ;). (No, I jest. Scheme is far better for first time programming.)

Code
Monday, March 31, 2008 9:55:56 PM UTC  #    Comments [8]  |  Trackback

Empty WCF service references - no code generated

Usually I use WCF just by referencing the contracts directly and using my own generic helper methods. Unfortunately, I had to make use of the built-in async patterns that the codegen can provide. Hence, I ended up using the "Add Service Reference" dialog, which notices (unlike svcutil on the command line) that since I'm requesting async, it'll need to generate my interfaces even though they're already referenced. Maybe in .NET 4 we'll have a new async pattern that'll leverage Expression Trees to rid us of having to use the rather ugly FooAsync/BeginFoo patterns.

Things were working just fine until I added a reference to a new service. Adding the service reference didn't work; it failed silently. It added all the schema files to the project, but the Reference.cs file was empty. If I told it to NOT use referenced assemblies, then it generated everything just fine.

After switching to the command line and using svcutil, I got these errors:
Error: Cannot import wsdl:portType
Detail: An exception was thrown while running a WSDL import extension: System.ServiceModel.Description.DataContractSerializerMessageContractImporter
Error: Referenced type 'MyApp.Management.CoolFunction.CoolThingRule, MyApp.Management.Contracts, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null' with data contract name 'CoolThingRule' in namespace 'http://schemas.MyApp.com/management/v2/00' cannot be used since it does not match imported DataContract. Need to exclude this type from referenced types.
XPath to Error Source: //wsdl:definitions[@targetNamespace='http://schemas.MyApp.com/management/v2/00']/wsdl:portType[@name='ICoolThingManagement']

I regenerated without the referenced code and figured out that the problem was that one type referenced an enum which had some values without the [EnumMember] attribute (although it had a DataContractAttribute on the enum itself). Because of this, WCF wanted to represent the enum with a string and thus was incompatible. Adding the EnumMemberAttribute to all the values sorted out this issue.
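If you want to catch this before svcutil does, a few lines of reflection will list enum values that are missing the attribute. This is only a sketch: GoodKind, BrokenKind and EnumContractCheck are hypothetical names, and the check looks at nothing except whether [EnumMember] is present on each value.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Runtime.Serialization;

// Every value is marked, as in the fix described above.
[DataContract]
public enum GoodKind
{
    [EnumMember] Fast,
    [EnumMember] Slow
}

// One value left unmarked: the situation that triggered the problem.
[DataContract]
public enum BrokenKind
{
    [EnumMember] Fast,
    Slow
}

public static class EnumContractCheck
{
    // Returns the names of enum values that lack [EnumMember].
    public static string[] MissingEnumMembers(Type enumType)
    {
        return enumType
            .GetFields(BindingFlags.Public | BindingFlags.Static)
            .Where(f => !f.IsDefined(typeof(EnumMemberAttribute), false))
            .Select(f => f.Name)
            .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(MissingEnumMembers(typeof(GoodKind)).Length);               // 0
        Console.WriteLine(string.Join(", ", MissingEnumMembers(typeof(BrokenKind)))); // Slow
    }
}
```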

Code
Monday, March 31, 2008 9:06:51 AM UTC  #    Comments [5]  |  Trackback

# Thursday, March 27, 2008
Hacking SOAP Faults into Silverlight 2 Beta 1

Silverlight 2 is pretty nice. Compared to dealing with the nightmare of HTML, CSS, AJAX and whatever else, it's quite divine. Of particular interest is that Silverlight 2 has a mini Windows Communication Foundation stack that can do basic SOAP work (and some "Web 2.0" type things too I think). While the SL WCF stack works pretty well overall, it doesn't support SOAP faults.

First off, SOAP faults usually cause the server to return an HTTP 500. For whatever reasons, this isn't handled correctly between the browser and Silverlight, and results in an UnexpectedHttpResponseCode. OK, so go into the Global.asax and in Application_EndRequest change 500s to 200s. [Yes, this requires that you run WCF with AspNetCompatibility.]

Some progress. Silverlight now gives a more useful exception:

Error: System.Runtime.Serialization.SerializationException: OperationFormatter encountered an invalid Message body. Expected to find node type 'Element' with name 'MyFunctionResponse' and namespace 'http://schemas.contoso.com/coolservice/v2/00'. Found node type 'Element' with name 's:Fault' and namespace 'http://schemas.xmlsoap.org/soap/envelope/'

Hmm, it's getting the fault, but can't seem to handle it. I tried adding typed faults to my WCF contract, but the SL WCF stack doesn't have the FaultContractAttribute. It appears as if SL cannot read faults at all.

Enter HackFaults. That message provides two pieces of data from the SOAP message, the element name, and the namespace. We can hijack these to pass our fault information, then extract it from the exception message. H to the A to the C-K-Y.

First off, we need to get access to our WCF fault. We can do this with a System.ServiceModel.Dispatcher.IErrorHandler. Basically, you add an attribute to your WCF service to wire up the error handler. If you're doing WCF work, you probably want this anyways for logging purposes and so on. I've attached the entire error handler code to this article. The real meat of the error handler is this little function:

public void ProvideFault(Exception error, MessageVersion version, ref Message fault)
{
    // HackFaults store the exception in the httpcontext
    var context = System.Web.HttpContext.Current;
    if (context != null) {
        // Only works in AspNetCompat mode
        if (!context.Request.Browser.Crawler && context.Request.Browser.EcmaScriptVersion.Major > 0) {
            // This should rule out non-browsers
            context.Items.Add("HackFault", error);
        }
    }
}

I'm not sure if there's a better way to differentiate between browsers and real SOAP stacks. If there is let me know. Now, in our Global.asax, we want to pick this information up and replace the actual SOAP fault with our own hackfault:

protected void Application_EndRequest(object sender, EventArgs e)
{
    // In all fairness, I was drinking eiswein at the time
    if (Context.Items.Contains("HackFault")) {
        Context.Response.ContentType = "text/xml";
        Context.Response.StatusCode = 200;
        Context.Response.ClearContent();

        var hackEx = Context.Items["HackFault"] as Exception;
        var hackFaultXml = string.Format(
            "<s:Envelope xmlns:s=\"http://schemas.xmlsoap.org/soap/envelope/\"><s:Body><{0} xmlns=\"{1}\" /></s:Body></s:Envelope>",
            hackEx.GetType().ToString(), Context.Server.HtmlEncode(hackEx.Message));
        Context.Response.Output.Write(hackFaultXml);
    }
}

Now we have a message that has the bits of data we care about in the right places for the Silverlight exception to be ready for parsing. The SL-side parsing code is vile but easy:

public static string ExtractHackFaultMessage(this Exception ex)
{
    // HackFaults generate these Exceptions and messages:
    // Without debug:
    //   Could not connect to server: System.Runtime.Serialization.SerializationException: [SFxInvalidMessageBody]
    //   Arguments:FaultException,Something silly
    // With debug:
    //   Could not connect to server: System.Runtime.Serialization.SerializationException:
    //   OperationFormatter encountered an invalid Message body. Expected to find node type 'Element' with
    //   name 'SomeOperationResponse' and namespace 'http://schemas.cool.com/yea/v2/00'.
    //   Found node type 'Element' with name 'FaultException' and namespace 'Something silly'
    if (!(ex is System.Runtime.Serialization.SerializationException)) return null;
    var msg = ex.Message;
    string regexp;
    if (msg.Contains("[SFxInvalidMessageBody]")) {
        regexp = @"Arguments:(.*),(.*)";
    } else if (msg.Contains("OperationFormatter encountered an invalid Message body")) {
        regexp = @"Found node type 'Element' with name '(.*)' and namespace '(.*)'";
    } else {
        // Nope, it's some thing else
        return null;
    }

    var match = System.Text.RegularExpressions.Regex.Match(msg, regexp);
    if (match.Groups.Count != 3) return null; // Not expected
    var exType = match.Groups[1].Value.Trim();
    var exMsg = match.Groups[2].Value.Trim();
    if (exType == "System.ServiceModel.FaultException")
        return exMsg;
    else
        return exType + ": " + exMsg;
}

This little function will give me the fault information (and not mention FaultException if that's what it is) or null if it can't extract it. When dealing with errors, I can show an error like this: "Error: " + (ex.ExtractHackFaultMessage() ?? ex.ToString())

This gives me at least rudimentary error reporting from the server back to Silverlight. If this can be improved or fixed up, please tell me. (Typed fault exceptions would be nice, but then SL can't codegen the contracts.) At any rate, it works as a cheap hack until Silverlight Beta 2 (they gotta fix it by then, right? Right?).

HttpErrorHandler.cs.txt (2.49 KB)
Code
Thursday, March 27, 2008 7:19:36 AM UTC  #    Comments [1]  |  Trackback

# Thursday, February 28, 2008
Writing events with System.Diagnostics.Eventing

... or, how the hell to use Vista and 2008's new ETW stuff with managed code. And, introducing ecmanaged: A decent way to do all this stuff.

Quick ETW Overview

Actually, the real ETW overview is here: http://msdn.microsoft.com/msdnmag/issues/07/04/ETW/ <-- This is some of the best overview and documentation on it (the other good stuff is the ecmangen documentation in the Windows SDK bin folder). The MSDN stuff is terribly confusing for the most part. Or maybe I'm too spoiled by how easy it is to find stuff in the BCL. My overview is on what you gotta do to make things work in .NET.

ETW is a real pain to use with .NET. Even so, ETW starts off looking really promising. You define everything in a nice XML manifest file, and everything is based off that. But wait, everything? Shouldn't the manifest be the end-all? Yea, that'd make logical sense. No, you run some tools from the Windows SDK. First you run MC, which generates a .h header file. Managed devs are groaning now -- why the hell should something as general as event tracing be language specific? The .h file contains the processed event descriptors, ready for C consumption.

It worsens: MC also generates a resource script. You have to compile that with RC and it'll create a Win32 .res resource. Then you compile that into a binary (the C# compiler has the /win32res option). Then you go back and edit your XML manifest and make sure it points to the final binary. Wait, what? Yes. The resources that MC generates for RC contain all the messages that are in your XML manifest. Someone thought it was a really cute idea to go and make the Event Viewer not only read all the data from your manifest, but also have to go look it up from some binary resources.

Actually, this probably made sense to someone on the Windows team since I'm guessing they already have tools to go and localise Win32 resources or something. Unfortunately, it sucks and makes no sense for anyone NOT in their particular position. Now, I hope I'm wrong (I really, really want to be wrong), but I think there's no way to force the message strings to just stay in the XML file and be read from there. 

Finally, things get easy again. Just run "wevtutil install-manifest Some.man" (wevtutil is in system32). In fact, this utility is so user friendly, it even lets you type "im" instead of "install-manifest". At this point, assuming the other steps went well, your provider shows up in Event Viewer.

ECManGen

But wait, how do I actually make that manifest? This part is almost the easiest. In the Windows SDK, there's a lovely little tool called ECManGen. Just fire it up, and go to town adding Providers, Channels, Templates, and Events.

Providers are the main things that show up in your Event Viewer, such as MyApp-FooProduct-LameComp. Channels separate Admin/Operational/Debug and others. Templates are an argument list for Events. If you have, say, a bunch of events that take the same kinds of parameters, you can share templates among them (I find it helpful to create a "SingleStringTemplate".) It's very straightforward.

*Note: I can't actually get Admin channels to work. If I create an event and stick it in an Admin channel and set its level to Informational, MC complains (as does ECManGen) that the level has to be Critical, Error, or Informational. Uh, OK. Instead, just use Operational.

Except... ECManGen is a free utility. (Free? Perhaps not, seeing as the annual MS bill for a 4-person dev team is around $20,000 (counting just MSDN) -- but it's well worth it.) Part of the docs say: "NOTE: For the Manifest Generator Tool to function correctly, the file winmeta.xml (which contains pre-defined metadata values) must either be in the Include directory of the SDK or in the same directory as the Manifest Generator tool." OK, easy enough. Except... it doesn't work that easily. The only way I got it to work was to copy the xml file over to the same directory, *and* start ECManGen from that directory.

Oh yea, ECManGen won't open your manifest file if you pass it as an argument, so forget about cute VS integration. Just Google ecmangen and go rate up the bugs on Connect :).

Going Managed

OK, so you're not living in the last century and use decent tools -- how does this map to C#? First off, you create an EventProvider with the right Guid (the one from your manifest). Then you create an EventDescriptor for each event, matching up all the little parameters (the MSDN docs for EventDescriptor have more details). Finally, you can call WriteEvent, passing the EventDescriptor *by ref* for some reason (no, I can't figure out why).

Oh yea, and you have to hook up that Win32 resource to your C# project, so if you needed another resource (like another app manifest?), you'll have to go deal with merging them and all that hassle. And, don't forget to make sure the parameters you pass into the object[] array of WriteEvent line up with what your manifest has. And also, the .NET API won't even handle the Boolean->BOOL (4 byte) silliness for you.
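The Boolean part, at least, is easy to paper over. Here is a minimal sketch, assuming (as noted above) that Boolean fields in the manifest want a 4-byte value. EtwPayload and Normalize are made-up names for illustration, not part of System.Diagnostics.Eventing:

```csharp
using System;

// Hypothetical helper; not part of the .NET eventing API.
public static class EtwPayload
{
    // Returns a copy of the payload in which every bool is replaced by a
    // 4-byte int (1 or 0). All other values are passed through untouched.
    public static object[] Normalize(params object[] args)
    {
        var result = new object[args.Length];
        for (int i = 0; i < args.Length; i++)
        {
            if (args[i] is bool)
                result[i] = (bool)args[i] ? 1 : 0;   // boxed as System.Int32
            else
                result[i] = args[i];
        }
        return result;
    }
}
```

The normalized array would then be handed to WriteEvent in place of the raw arguments. It does nothing about argument order or count; those still have to match the manifest.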

In summary, it's a lot of boring, error-prone work, and you'll have to repeat it every time you edit your manifest. Yuck. Maybe it's just easier to use the old event log stuff and forget about all this fancy ETW stuff.

However, the ETW features are pretty cool. As is ecmangen (well, for the most part). I couldn't tear myself away from the promises the new ETW stuff offered, so I sat down and wrote a hack tool to do everything for me. The goal is to have an XML file, then convert it right into a DLL I can use.

ecmanaged does exactly this. Sorta. It doesn't deal with complicated scenarios or performance counters, but handles straightforward event logging with templates in an easy and type-safe fashion.

Using ecmanaged

Usage: ecmanaged ecgen <manifest> /out:<target assembly> /namespace:<target namespace> /class:<target class>
Usage: ecmanaged msglocate <manifest> <messageFileName>
Usage: ecmanaged install <manifest> <targetManifest> <messageFileName>
Usage: ecmanaged uninstall <targetManifest>

Call ecgen with a manifest, and it'll go build an assembly with the wrappers for calling your provider events. It'll stick the Win32 resource on the assembly too, so it's ready to be used.

Call msglocate with the manifest and the message binary file, and it'll go fix the manifest to point to the right place.

Call install, and it'll copy the manifest to another location, fix the msg location, and call wevtutil on it. It copies because I'm assuming your original manifest is under source control, and hence shouldn't be fixed up.

Uninstall just calls wevtutil uninstall-manifest.

Yey! Now you can just make a batch file to call this on your manifest and run it every time you change your manifest. What I do is stick this in a folder in my project called "Eventing". I check the resulting binary into source control (just like a 3rd party DLL). The downside is making sure you always update the DLL if you modify the manifest. I'm sure somehow msbuild can come to the rescue (and autoupdate when the manifest changes, so Intellisense keeps working), but this works for me.

I welcome all comments/questions/insults, but please, no death threats. Download: ecmanaged.exe (45 KB)

P.S. No, I don't hate Windows or the devs on it, although I hate unmanaged code (it IS 2008, innit?). Perhaps I'm just annoyed I have to make a Saving Throw versus Hang every time I click Send in Outlook 2007 (yes, patched up, even with Vista SP1... which made it worse). But I still think this whole ETW toolchain is terribly unpolished.

Code
Thursday, February 28, 2008 9:12:25 PM UTC  #    Comments [1]  |  Trackback

# Thursday, September 06, 2007
Complicated functions in LINQ to SQL

Rob Conery talks about geocoding with LINQ here. In his post, he provides some code using the Haversine formula to compute the distance between two points on earth. His function is declared as:

Func<double, double, double, double, double> CalcDistance = (lat1, lon1, lat2, lon2) => …

Further, this delegate is wrapped in a normal C# method. Now, read my previous post about calling functions in LINQ to SQL. Can you see where things are going to go wrong? That's right, a normal delegate or method can't be converted into SQL by LINQ, as the engine has nothing to work with. Only when our code is available as data can the LINQ to SQL engine do its magic. Since there's some insinuation that LINQ to SQL just can't handle things like Haversine, I'll demonstrate how to do it.

If you want to use your own "complicated" functions with LINQ to SQL, you'll need to manually construct the predicate expression. It's not pretty, but, it does let you convert somewhat detailed functions, such as Haversine, inside of LINQ-to-SQL queries. This is not always the right approach: in some cases it'll be better to use a UDF or stored procedure. (If someone knows a native, better way, please let me know! TomasP's Expandable stuff looks cool, but I ran into some bugs on Beta 2. This is really something the compiler should help out with!).

To start off, we need to declare our function as an Expression Tree (I cannot vouch for the accuracy of this code; I'm merely demonstrating LINQ to SQL technique. For the geocoding details, read Rob's post.):

const double R = 6367;
const double RAD = Math.PI / 180;
static Expression<Func<double, double, double, double, double>> dist =
(lat1, lon1, lat2, lon2) =>
    R * 2 *
    (
        Math.Asin(Math.Min(1,Math.Sqrt(
            (
            Math.Pow(Math.Sin(((lat1 * RAD - lat2 * RAD)) / 2.0), 2.0) +
            Math.Cos(lat1 * RAD) * Math.Cos(lat2 * RAD) *
            Math.Pow(Math.Sin(((lon1 * RAD - lon2 * RAD)) / 2.0), 2.0) 
           
        ))) 
    );

OK, that was the easy part. We just had to wrap Expression<> around the type declaration and remove the method calls. But how do we pass this to our query? The Where method has this signature:

public static IQueryable<TSource> Where<TSource>(this IQueryable<TSource> source, Expression<Func<TSource, bool>> predicate);

Hence, we need to provide a predicate of Expression<Func<TSource, bool>> for it to do its magic. To work with expressions, we need to start using the System.Linq.Expressions namespace. For more clarity, I aliased E to System.Linq.Expressions.Expression. Building expressions isn't particularly hard, it's just annoying and time consuming. C#'s lack of "symbolics", or a way to use the compiler to create expressions to reference properties, etc. means we have to use strings to do so. I never did claim it'd be pretty.

We start off with some data from elsewhere in our application. In my example, I'm just going to declare some locals:

double coolLat = 43.641852;
double coolLong = -79.387298;
double maxDist = 100;

The first Expression we need is a ParameterExpression to refer to the source item from the table (i.e., the parameter the Where method is going to give to us). In my code, my table is called Accounts, hence my object type is Account. To create the parameter expression:

var acctParam = E.Parameter(typeof(Account), "a");

Next, we need to be able to reference the fields in the table. With LINQ to SQL, these are properties on our object. We create them like this:

var acctLon = E.Property(acctParam, "Longitude");
var acctLat = E.Property(acctParam, "Latitude");

The secret sauce is creating the invoke to our dist expression. This is where all the work comes into play and includes all our complicated code in the LINQ to SQL query. Fortunately, after building up our arguments separately, it's not that hard:

var distCalc = E.Invoke(dist, E.Constant(coolLat), E.Constant(coolLong), acctLat, acctLon);

We have to use ConstantExpressions for our local variables. Using constant allows us to capture the value of those variables. Now we're ready to finish off, by adding a less than comparison and turning it all into a <TSource, bool> LambdaExpression:

var maxComp = E.LessThan(distCalc, E.Constant(maxDist));
var pred = E.Lambda<Func<Account, bool>>(maxComp, acctParam);
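For reference, the steps so far can be folded into one reusable helper. This is a sketch under a few assumptions: the Account class here is a plain stand-in for the generated LINQ to SQL entity, and GeoPredicates and Within are names invented for the example.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using E = System.Linq.Expressions.Expression;

// Hypothetical stand-in for the LINQ to SQL entity class.
public class Account
{
    public int AccountId { get; set; }
    public double Latitude { get; set; }
    public double Longitude { get; set; }
}

public static class GeoPredicates
{
    const double R = 6367;               // Earth radius in km, as above
    const double RAD = Math.PI / 180;

    // The same Haversine expression tree as above.
    static readonly Expression<Func<double, double, double, double, double>> dist =
        (lat1, lon1, lat2, lon2) =>
            R * 2 * Math.Asin(Math.Min(1, Math.Sqrt(
                Math.Pow(Math.Sin((lat1 * RAD - lat2 * RAD) / 2.0), 2.0) +
                Math.Cos(lat1 * RAD) * Math.Cos(lat2 * RAD) *
                Math.Pow(Math.Sin((lon1 * RAD - lon2 * RAD) / 2.0), 2.0))));

    // Builds: a => dist(lat, lon, a.Latitude, a.Longitude) < maxDist
    public static Expression<Func<Account, bool>> Within(double lat, double lon, double maxDist)
    {
        var a = E.Parameter(typeof(Account), "a");
        var call = E.Invoke(dist,
            E.Constant(lat), E.Constant(lon),
            E.Property(a, "Latitude"), E.Property(a, "Longitude"));
        return E.Lambda<Func<Account, bool>>(E.LessThan(call, E.Constant(maxDist)), a);
    }
}
```

A call such as Where(GeoPredicates.Within(coolLat, coolLong, maxDist)) then reads almost like the query we wanted to write. The same predicate also runs against an in-memory list via AsQueryable(), which is handy for checking the math without a database.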

Our query can now look like this:

MembersDataContext dc = new MembersDataContext();
var q = dc.Accounts.Where(pred);
Console.WriteLine(q.ToString());

When we run it, we see that LINQ to SQL is quite capable of handling our little bit of math:

exec sp_executesql N'SELECT [t0].[AccountId], [t0].[Latitude], [t0].[Longitude]
FROM [dbo].[Accounts] AS [t0]
WHERE (@p0 * ASIN(
(CASE
WHEN @p1 < SQRT(POWER(SIN(((@p2 * @p3) - ([t0].[Latitude] * @p4)) / @p5), @p6) + (COS(@p7 * @p8) * COS([t0].[Latitude] * @p9) *
   
POWER(SIN(((@p10 * @p11) - ([t0].[Longitude] * @p12)) / @p13), @p14))) THEN @p1
ELSE SQRT(POWER(SIN(((@p2 * @p3) - ([t0].[Latitude] * @p4)) / @p5), @p6) + (COS(@p7 * @p8) * COS([t0].[Latitude] * @p9) *
    POWER(SIN(((@p10 * @p11) - ([t0].[Longitude] * @p12)) / @p13), @p14)))

END))) < @p15'
,N'@p0 float,@p1 float,@p2 float,@p3 float,@p4 float,@p5 float,@p6 float,@p7 float,@p8 float,@p9 float,
    @p10 float,@p11 float,@p12 float,@p13 float,@p14 float,@p15 float'
,@p0=12742,@p1=1,@p2=95.412000000000006,@p3=0.017453292519943295,
@p4=0.017453292519943295,@p5=2,@p6=2,@p7=95.412000000000006,@p8=0.017453292519943295,@p9=0.017453292519943295,@p10=102.63200000000001,
@p11=0.017453292519943295,@p12=0.017453292519943295,@p13=2,@p14=2,@p15=100

I still think there should be some kind of syntax so we could write it like we want to: (Where(a=>dist(coolLat, coolLong, a.Latitude, a.Longitude) < maxDist)). If anyone knows a built-in way to do it, please let me know. I'm sure it's just something simple I'm overlooking...

Code
Thursday, September 06, 2007 1:56:54 PM UTC  #    Comments [4]  |  Trackback

# Wednesday, September 05, 2007
Calling custom methods in LINQ-to-SQL

This was sparked by the issues raised by Rob Conery here. Basically, if you have some semi-complicated function that you need to apply to a LINQ-to-SQL query, how can you do it? This is somewhat covered by TomasP.NET: Building LINQ Queries at Runtime in C# and Joseph Albahari: Dynamically building LINQ expression predicates. I recommend those two articles; they're very good. I'm going to write a bit about it too. Some of it might be redundant, some of the ideas I took from those articles. The code is all mine.

So, let's say we want to get all accounts where the square root of the account ID is even. This will serve as our placeholder for a totally contrived example. Just calling our method in our LINQ query won't work because the LINQ-to-SQL code isn't going to know what to do with it. A method is just an opaque block of code with a name. Here's an example:

static void Main(string[] args) {
    MembersDataContext dc = new MembersDataContext();
    var q = dc.Accounts.Where(a => IsRightAccount(a));
    Console.WriteLine(q.ToString());
}
static bool IsRightAccount(Account a) {
    return Math.Sqrt(a.AccountId) % 2 == 0;
}

This code crashes with: Unhandled Exception: System.NotSupportedException: Method 'Boolean IsRightAccount(ConsoleApplication1.Account)' has no supported translation to SQL. Which should be expected, as LINQ to SQL cannot know what goes on inside that method and thus can't translate it.

Let's change things to make IsRightAccount be a Func delegate (from a lambda expression):

static Func<Account, bool> IsRightAccount = a => Math.Sqrt(a.AccountId) % 2 == 0;

Now we get: Unhandled Exception: System.NotImplementedException: The method or operation is not implemented. At System.Data.Linq.SqlClient.QueryConverter.VisitInvocation(InvocationExpression invoke). That's a somewhat strange exception, as I'd expect it to be a bit more helpful. At any rate, I'd expect it to crash, because that is just a delegate to a method. It's still an opaque block of code.

Enter the magic Tree of Expressions: Expression<T>. As I mentioned in my last post, Expression Trees provide "introspection" (reflection against code). In the examples above, the lambda that calls IsRightAccount for the Where clause actually turned into an InvocationExpression that represents a call to the delegate provided. Hence me saying that it is "opaque". What we need is to make sure that our code (our IsRightAccount calculation) is visible as data. When it's visible as data, then LINQ-to-SQL can go and say "Oh, you want to take the square root of the account ID, mod it by 2, and see if that's zero… now THAT I can do in SQL".

Declaring an Expression Tree is really simple. First, make sure you import System.Linq.Expressions if you don't want to fully qualify the name. Then, declare your tree just like any lambda Func, except this time make the type Expression<MyFunc>:

static Expression<Func<Account, bool>> IsRightAccount = a => Math.Sqrt(a.AccountId) % 2 == 0;

We will also change our Where clause to accommodate the fact that we are not calling a method:

var q = dc.Accounts.Where(IsRightAccount);

And presto! Our program now shows a SQL conversion:

SELECT [t0].[AccountId], [t0].[Email]
FROM [dbo].[Accounts] AS [t0]
WHERE (SQRT(CONVERT(Float,[t0].[AccountId])) % @p0) = @p1
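The "code as data" idea can be seen without a database at all. A small sketch, using a plain int instead of an Account (CodeAsData and its members are illustrative names):

```csharp
using System;
using System.Linq.Expressions;

public static class CodeAsData
{
    // Same contrived rule as above, over a plain int so no database is needed.
    public static readonly Expression<Func<int, bool>> RootIsEven =
        id => Math.Sqrt(id) % 2 == 0;

    // The tree can be walked like any other object graph...
    public static ExpressionType TopNode()
    {
        return RootIsEven.Body.NodeType;                            // the == comparison
    }

    public static ExpressionType LeftNode()
    {
        return ((BinaryExpression)RootIsEven.Body).Left.NodeType;   // the % operation
    }

    // ...or compiled into an ordinary delegate and executed.
    public static bool Run(int id)
    {
        return RootIsEven.Compile()(id);
    }
}
```

A query provider does the first kind of thing: it walks the nodes and emits SQL for each one it recognizes. A plain delegate offers nothing to walk.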

Things get a bit more complicated when we try to stack expressions together. As far as I know, we must create the Expression manually (using the Expression static methods); the compiler won't help out. If anyone knows a built-in way around this, please let me know. Otherwise, see the links at the top of this article for more information and other workarounds.

This also applies if you have complex logic that doesn't directly map to an item -> bool predicate expression. In those cases, you can still encapsulate most of your code by using the compiler to generate the bulk of the Expression, and then just wrap it with a bit of hand-created expression. In my LINQ to the CRUD article, the code attached uses this approach to generate queries for the select/delete commands. Again, I will note that TomasP has written expansions so you can just write myCoolExpression.Expand(arg) rather than building everything by hand.
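As a concrete case of stacking, here is a minimal hand-rolled combinator that joins two compiler-generated predicates with a logical AND. PredicateCombinators and And are names made up for this sketch; it is not code from either linked article.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class PredicateCombinators
{
    // Builds "x => left(x) && right(x)" by hand. The left body is reused as-is;
    // the right lambda is invoked with the left lambda's parameter.
    public static Expression<Func<T, bool>> And<T>(
        this Expression<Func<T, bool>> left,
        Expression<Func<T, bool>> right)
    {
        ParameterExpression p = left.Parameters[0];
        Expression body = Expression.AndAlso(left.Body, Expression.Invoke(right, p));
        return Expression.Lambda<Func<T, bool>>(body, p);
    }
}
```

With that in place, something like IsRightAccount.And(someOtherPredicate) can be passed to Where in one piece. The sketch is only exercised against in-memory sequences (AsQueryable()); how a particular provider translates the Invoke node is worth verifying before relying on it.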

If you know of any other links or work done in this area, I'm very interested in seeing other approaches. Thanks!

Code
Wednesday, September 05, 2007 11:58:49 PM UTC  #    Comments [2]  |  Trackback

C# 3.0 and LINQ Misunderstandings

Apparently, there is some considerable confusion over all the new C# language features. People who I would hope are reasonably intelligent are completely misunderstanding some C# fundamentals. Agreed, a lot of the new concepts introduced to C# 3.0 might seem relatively foreign to C# users. Microsoft's marketing related to LINQ doesn't help much either. I'm going to try to clarify the top few things I've seen. I'll reference the C# spec (http://msdn2.microsoft.com/en-us/library/ms364047(vs.80).aspx) so you can know I'm being accurate.

Myth: LINQ is just a data access technology

While data access is obviously a large part of LINQ (even the name stands for Language Integrated Query), you can do a lot more than just access data. Re-using a previous example:

    args
        .Select(s => new Thread(() => SomeLongProcess(s))) 
        .Process(t => t.Start())
        .ToList()
        .ForEach(t => t.Join());

There's no sign of data access there! The practical functional C# articles on my site go into more detail on this. Suffice to say, while a lot of new features were added to "make LINQ possible", much, much, more is possible than just creating queries. (Alternatively, you might choose to limit the meaning of LINQ, as I'd like to do. In this case, I wouldn't consider using lambdas, extensions, etc. as LINQ. Others may disagree. Marketing?)

Implicitly typed local variables (var keyword)

C# is still statically and strongly typed. But, there's a new feature that lets you declare local variables without specifying the type, if the type can be inferred from the initializing expression. From the spec:

var i = 5;
var s = "Hello";
var d = 1.0;
var numbers = new int[] {1, 2, 3};
var orders = new Dictionary<int,Order>();

The implicitly typed local variable declarations above are precisely equivalent to the following explicitly typed declarations:

int i = 5;
string s = "Hello";
double d = 1.0;
int[] numbers = new int[] {1, 2, 3};
Dictionary<int,Order> orders = new Dictionary<int,Order>();

Oddly enough, the C# spec doesn't even mention LINQ or anonymous types when it talks about "var" locals. Why is there confusion about this simple feature? Let's examine anonymous types:

C# anonymous types allow you to declare a type just by specifying its fields. From the spec: "C# 3.0 permits the new operator to be used with an anonymous object initializer to create an object of an anonymous type." For instance, the following code is a valid expression:

new { Name = "Michael" }

This produces an object of a new, anonymous, type, containing a single string property called Name. Hence, this code works:

Console.WriteLine(new { Name = "Michael" }.Name);

However, how can you assign such an object to a variable? Yes, that is the only "need" for the var keyword. There's no way to name the type, since it's anonymous. Regardless of whether you agree with anonymous types (versus a Tuple class), this is the place where you *need* to use var: assigning an anonymous type to a local.

As you may have noticed, I still haven't mentioned LINQ. Anonymous types are not LINQ specific. They are, however, particularly helpful for certain LINQ queries:

var topCustomers = MyDatabase.Customers.Where(c => c.GoldStar == true).Select(c => new {c.CustomerId, c.Name});

Because of this, people start to associate anonymous types only with LINQ queries and hence, var with LINQ only. The truth is that these features can be used anywhere.

What's the takeaway here? The var keyword simply allows the compiler to infer the type of the variable so you don't have to specify it. Nothing more, nothing less. Some people still want to explicitly annotate every single variable – hey, that's their choice. But don't be locked into this just because there was no option before. Me? I'll take more concise code any day!

As a side note (if this wasn't clear), the var keyword is NOT dynamic typing, just implicit typing (type inference).
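One way to convince yourself: ask the compiler what it inferred. In this sketch, StaticTypeOf is a hypothetical helper whose type argument is fixed at compile time from the variable's static type, not from whatever object it holds at runtime.

```csharp
using System;

public static class VarDemo
{
    // T is chosen by the compiler from the argument's static (compile-time) type.
    static Type StaticTypeOf<T>(T value)
    {
        return typeof(T);
    }

    public static Type InferredInt()
    {
        var i = 5;               // compiled exactly as: int i = 5;
        // i = "five";           // would not compile: a string is not an int
        return StaticTypeOf(i);
    }

    public static Type InferredArray()
    {
        var numbers = new int[] { 1, 2, 3 };
        return StaticTypeOf(numbers);
    }

    public static Type DeclaredObject()
    {
        object o = 5;            // runtime type is int, static type is object
        return StaticTypeOf(o);
    }
}
```

The var locals report int and int[], while the explicitly declared object local reports object even though it holds an int. Nothing about var is decided at runtime.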

Dynamic C#

Seems there's a lot of confusion about C# being dynamic. C# is not dynamically typed, as some seem to imply. I think perhaps some of the confusion comes from all the nice type inference that C# provides. Using the var keyword as shown above might make some people feel "oh, I'm saying var just like Javascript!". Adding to the confusion is the fact that C# is now a "semi"-functional language. For instance, from "why's (poignant) guide to ruby", we see this Ruby code:

5.times { print "Odelay!" }

C# allows us to write in a similar style:

5.Times(() => Console.WriteLine("Odelay!"));

The next Ruby example in that guide goes like this:

exit unless "restaurant".include? "aura"

In C#, we can write:

exit.Unless(() => "restaurant".Contains("aura"));
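Neither Times nor Unless ships with the framework; both are ordinary extension methods. A minimal sketch of how they might be defined (the class name and exact signatures are assumptions, and exit is taken to be an Action declared elsewhere):

```csharp
using System;

public static class RubyStyleExtensions
{
    // Runs body the given number of times.
    public static void Times(this int count, Action body)
    {
        for (int i = 0; i < count; i++) body();
    }

    // Runs action only when the condition evaluates to false.
    public static void Unless(this Action action, Func<bool> condition)
    {
        if (!condition()) action();
    }
}
```

Everything here is still checked at compile time: 5.Times("oops") is a compiler error, not a runtime surprise.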

The Wikipedia article on dynamic programming languages lists a few features that dynamic languages usually have:

Eval

Sorta… you can create Expression<T> and execute them. On the other hand, you can't do anything like eval(myString) (which is just asking for runtime failure).

Higher-order functions

Definitely in there.

Runtime alteration of object or type system

No, not really. (I.e., maybe you can hack around with certain APIs to try to do some magic, but it's not a language feature.)

Functional Programming

Yes. But still, functions aren't really first class citizens…yet. Once we can start using method groups as Actions and Funcs, implicitly, then it'll get even better. This is an interesting presentation from Andrew Kennedy, Microsoft Research: C# is a functional programming language.

Closures

Yep, since anonymous methods were introduced in C# 2.0.

Continuations

Extremely limited, in the form of yield return.

Introspection

Not just reflection, but actually inspecting the actual code. C# 3.0 has this in the form of Expression<T> (see below).

Macros

No, not crappy C-style macros. Here, I'm thinking more like macros that'd let you create things like C# query comprehensions, *in source code*. (Which is what I was actually hoping when I saw the new query comprehension syntax… no such luck).

In summary, C# lets you gain a lot of benefits usually associated with dynamic programming, but without the nasty parts of dynamic typing.
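Two of the items above, closures and the limited continuations of yield return, fit in a few lines. A minimal sketch (ClosureDemo and its members are illustrative names):

```csharp
using System;
using System.Collections.Generic;

public static class ClosureDemo
{
    // Closure: each returned delegate keeps its own captured 'count' alive.
    public static Func<int> MakeCounter()
    {
        int count = 0;
        return () => ++count;
    }

    // Iterator: execution pauses at each yield return and resumes
    // when the caller asks for the next element.
    public static IEnumerable<int> Countdown(int from)
    {
        for (int i = from; i > 0; i--)
            yield return i;
    }
}
```

Both are fully statically typed: the counter is a Func<int> and the iterator is an IEnumerable<int>, known to the compiler before anything runs.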

A great paper on this subject is Static Typing Where Possible, Dynamic Typing When Needed: The End of the Cold War Between Programming Languages, by Erik Meijer and Peter Drayton of Microsoft.

Myth: Extension methods add methods to a class

This is a tricky one, since extension methods appear to be exactly that. This myth is also somewhat perpetuated by the C# spec: "In effect, extension methods make it possible to extend existing types and constructed types with additional methods." But, right before that, the real explanation is given: "Extension methods are static methods that can be invoked using instance method syntax."

Essentially, Extension methods allow us to use infix notation with certain methods. This explains the line from the spec "in effect". A more helpful way to think of this is by thinking of the "." operator as an overloaded operator that also allows passing the first operand given to it as the first argument to specially marked methods.

An alternative* would be to define an operator like the F# pipeline operator (|>). In C#, this would let us write stuff like:

customers |> Seq.Where(c => c.Name == "Michael")

That doesn't look like an improvement. BUT, we no longer need to mark methods in a special way. We can just use them:

myArray |> Array.BinarySearch("s")

Why do we need infix notation anyways? Well, the normal prefix notation can be difficult to read:

Select(Where(customers, c => c.Cool == true), c => c.Name)
Array.BinarySearch(items, "S")

Extension methods just make those functions easier to pipeline. That's all folks. Think of them like this, and save yourself a headache about what "extending a type" means.
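To make the point concrete, here is a small sketch showing one method called both ways (Shout and the class names are invented for the example):

```csharp
using System;

public static class StringExtensions
{
    // An ordinary static method; the 'this' modifier is the only special marking.
    public static string Shout(this string s)
    {
        return s.ToUpperInvariant() + "!";
    }
}

public static class ExtensionDemo
{
    public static bool SameCall()
    {
        string viaInstanceSyntax = "hello".Shout();
        string viaStaticSyntax = StringExtensions.Shout("hello");
        // Both lines bind to the same static method; System.String itself is unchanged.
        return viaInstanceSyntax == viaStaticSyntax;
    }
}
```

The string class gains no new members here. Remove the using directive for the namespace that holds StringExtensions and the "method" disappears from every string.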

*My guess as to why extension methods are done the way they are is because it could confuse people if you have something like "item |> Stuff.SomeMethod("X")" where SomeMethod returns a function. Or, where you have "item |> Stuff.SomeMethod("X").SomethingElse("y")". I'm still annoyed that I can't use infix semantics where *I* want, but oh well.

Lambda expressions and Expression<T>

Spec: "Lambda expressions provide a more concise, functional syntax for writing anonymous methods.". Spec: "Expression trees permit lambda expressions to be represented as data structures instead of executable code. A lambda expression that is convertible to a delegate type D is also convertible to an expression tree of type System.Query.Expression<D>."

So, "lambda expressions" can either be just code (i.e., directly executable IL) OR they can get turned into a data structure, Expression<T>.

Adding to the confusion, a lambda expression can contain just an expression (i=> i + 1), or it can be a block of statements ( i => {Write(i); i++; Write(i); return i+1;} ). However, a lambda expression with a statement block body cannot become an Expression<T>. (As far as I know, VB's lambdas only allow for expression bodies, not blocks.)

An example:

Expression<Func<int, int>> inc = i => i + 1;

Is equivalent to:

ParameterExpression param_i = Expression.Parameter(typeof(int), "i");
var inc2 = Expression.Lambda<Func<int, int>>(
    Expression.Add(
        param_i,
        Expression.Constant(1, typeof(int))),
    param_i);

At runtime, you can then go inspect the actual code and decide what to do with it. This is exactly the premise for LINQ to SQL. When you create a LINQ-to-SQL query, it is turned into an expression like shown above. Then the LINQ-to-SQL APIs inspect and convert that expression tree into SQL statements.
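The different conversion rules are easier to see side by side. A sketch, with LambdaForms and its members as made-up names:

```csharp
using System;
using System.Linq.Expressions;

public static class LambdaForms
{
    // Compiler-generated tree.
    public static readonly Expression<Func<int, int>> Inc = i => i + 1;

    // Hand-built tree with the same shape.
    public static Expression<Func<int, int>> BuildInc()
    {
        ParameterExpression i = Expression.Parameter(typeof(int), "i");
        return Expression.Lambda<Func<int, int>>(
            Expression.Add(i, Expression.Constant(1, typeof(int))), i);
    }

    // A statement-bodied lambda is fine as a delegate...
    public static readonly Func<int, int> IncVerbose = i => { int r = i + 1; return r; };

    // ...but the compiler rejects the same body as an expression tree:
    // Expression<Func<int, int>> bad = i => { int r = i + 1; return r; };
}
```

Both trees print the same text from ToString() and compile to delegates that behave identically, which is a quick way to check a hand-built tree against what the compiler would have produced.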

Here, I can understand the confusion. The word "expression" is used in three distinct manners. Rightfully, Expression Trees should be referred to as Expression or Expression<T>, which could help clear up some of the confusion. Additionally, it doesn't help that lambdas have these different conversion rules (although working around it could be ugly, possibly).

Are there any other features that you've seen misused or you've had questions about? Let me know! I love comments, insults, and suggestions.

Want to correct me on something? Go right ahead! But, if you're going to say something like "C# doesn't have type inference", please make sure to either be an expert on the matter or be able to quote an authoritative source or show a proof. Thanks!

Code
Wednesday, September 05, 2007 3:01:18 AM UTC  #    Comments [10]  |  Trackback

# Wednesday, August 29, 2007
Practical Functional C# - Part IV – Think in [Result]Sets

Don't skip parts I to III; they'll get you up to speed. I have $300 up for grabs if these articles are incorrect in stating that this approach will greatly improve most C# apps:

    Practical Functional C# - Part I 
    Practical Functional C# - Part II 
    Practical Functional C# - Part III - Loops are Evil

Tell the compiler what you want, rather than how to do it. That's a key concept that can take you very far. However, as imperative programmers, this can sometimes be a hard concept. We are so used to instructing the processor, step-by-step, how to do things, and then only after we're all done, making sure the end effect is to our liking. Functional programming helps us change this.

Consider the following task: Write a function that checks a username against a few disallowed characters "!@#$%^&*". The normal C# implementation looks like this (null checks removed):

static bool IsBadUserName(string userName)
{
    var badChars = "!@#$%^&*";
    foreach (char c in badChars) {
        if (userName.Contains(c)) return true;
    }
    return false;
}

It's not horrible; there's only one branch, but it does require a bit of thought to sort out. Now let's take an approach where we deal with string as a set of characters to manipulate at once (requires C# 3.0 compiler):

static bool IsBadUserName2(string userName){ 
    var badChars = "!@#$%^&*";
    return userName
        .Intersect(badChars) 
        .Any();
}

We've reduced the function from a step-by-step loop into something that has only one path. The bigger benefit is that there's no need to evaluate end conditions and so on for correctness: The code says exactly what it does. "Are there any bad characters in the user name?" We don't have to worry how it does what we asked, we just need to think in terms of what we want our result to be.

A common function in functional languages is called "map". Map takes a list of something and turns it into a list of something else. For instance, if we had a list of integers ("ourInts"), we could turn them into squares by saying "map ourInts by multiplying each value with itself". In C# (LINQ), they called map "Select". Here's a quick example:

var ourInts = new[] { 2, 5, 13 };
var squares = ourInts.Select(i => i * i);

Squares will contain the list { 4, 25, 169 }. What use is this? Well, it is an extremely common pattern to take some set of data, filter it, modify it a bit, and return a new set of data. Here's an example: You have a variable containing semicolon-delimited email addresses. You want to turn these into an array of .NET's MailAddress objects to use with some other code. The loop isn't very pretty:

var tempAddresses = new List<MailAddress>();
foreach (string s in semicolonEmails.Split(';')) {
    var ts = s.Trim();
    if (ts == "") continue;
    tempAddresses.Add(new MailAddress(ts)); 
}
var myAddresses = tempAddresses.ToArray();

But consider the functional approach:

var myAddresses = semicolonEmails
    .Split(';')
    .Select(s => s.Trim())
    .Where(s => s != "")
    .Select(s => new MailAddress(s))
    .ToArray();

In one statement, we transform the data three times, as well as add a filter to remove empty items.

One place where LINQ falls flat on its face is when it comes to processing data. For some reason, there are no methods defined to do "ForEach" or "Process". (Even more interesting: List<T> does define ForEach.) Process is a great pattern: on each item, it performs some action, then returns the original item. The code to define it is very simple and looks like this:

static IEnumerable<T> Process<T>(this IEnumerable<T> source, Action<T> f)
{
    foreach (var item in source) {
        f(item); 
        yield return item; 
    }
}

How is this of use? Well, let's chain together some functional and imperative processing. For instance, write a program that does some long process on all files passed in as arguments – on separate threads. If we take the purely imperative approach, our code looks like this:

static void Main(string[] args)
{
    var threads = new List<Thread>();
    foreach (var s in args) {
        Thread t = new Thread(startLongProcess); 
        t.Start(s); 
        threads.Add(t); 
    }
    foreach (var t in threads) {
        t.Join();
    }
}
static void startLongProcess(object data)
{
    SomeLongProcess((string)data);
}

Yes, we actually must declare a separate function just to invoke SomeLongProcess. Let's combine and use the functional approach now:

static void Main(string[] args)
{
    args
        .Select(s => new Thread(() => SomeLongProcess(s))) 
        .Process(t => t.Start())
        .ToList()
        .ForEach(t => t.Join());
}

Which way is going to be easier to edit and change around? I don't know about you, but for me, going from ~12 to ~5 lines, removing extra variables, useless functions and flow control structures: that's a hands-down win in my book.

As a side note, threading, in fact, is a space that is extremely ripe for functional styles. I'm willing to bet that ".NET 4" will include threading extensions that rely heavily on functional concepts. For instance, it's easy to create a method that allows us to replace the previous program with this:

args.Parallel(SomeLongProcess);

But I'll talk about that another day.
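
Until then, here is a rough sketch of what such a method could look like. The name Parallel comes from the call above; the body is only my guess at a minimal version (one thread per item, no exception handling), not a framework method:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

static class ParallelSketch
{
    // Hypothetical helper: runs action on every item, one thread per item,
    // and returns only after all of the threads have finished.
    public static void Parallel<T>(this IEnumerable<T> source, Action<T> action)
    {
        // ToList starts every thread before we begin joining any of them.
        var threads = source
            .Select(item => {
                var t = new Thread(() => action(item));
                t.Start();
                return t;
            })
            .ToList();
        threads.ForEach(t => t.Join());
    }

    static void SomeLongProcess(string file)
    {
        Thread.Sleep(100); // stand-in for real work
        Console.WriteLine("Processed " + file);
    }

    static void Main(string[] args)
    {
        args.Parallel(SomeLongProcess);
    }
}
```

It is the same Select/Start/Join chain as before, just written once and given a name.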

In the next article, I'm going to cover C#'s new inner functions capability and how that can be used to help build up more complicated function chains. I'd also like some feedback on which kinds of areas of C# programming you've run into that seem to require more code than necessary.

Code
Wednesday, August 29, 2007 10:45:59 AM UTC  #    Comments [4]  |  Trackback

# Sunday, August 26, 2007
A LINQ to the CRUD

A question many people have run into is: How does CRUD fit into LINQ-to-SQL? While LINQ-to-SQL (I'll abbreviate as DLINQ) provides a very fast and easy way for us to start querying our data, it doesn't handle updates as beautifully. It is particularly noticeable when you are doing multi-tier and hence cannot share a DataContext.

Let's consider an example database called "MembersDatabase" which has a table called "Accounts". Accounts has an int primary key, and a varchar Email field. We use the DLINQ designer and create the dbml that generates a class called MembersDataContext. How does our app-tier code look? I'll give you a hint, it starts with "ug" and rhymes with "nasty":

// Select
int someId = 123; // Passed from another tier
Account someAccount; // Can't use implicit typing -- no anonymous types
using (var dc = new MembersDataContext()) {
    someAccount = dc.Accounts.SingleOrDefault(a => a.AccountId == someId);
}

// Insert
var myAccount = new Account();
myAccount.Email = "me@contoso.com";
using (var dc = new MembersDataContext()) {
    dc.Accounts.Add(myAccount); // renamed InsertOnSubmit in the .NET 3.5 RTM release
    dc.SubmitChanges();
}

// Update
int myId = 1; // Id and email passed from another tier
string newEmail = "cool@cool.com";
var changedAccount = new Account();
changedAccount.AccountId = myId;
changedAccount.Email = newEmail;
using (var dc = new MembersDataContext()) {
    dc.Accounts.Attach(changedAccount, true); 
    dc.SubmitChanges();
}

// Delete
int idToKill = 2; // Passed from another tier
using (var dc = new MembersDataContext()) {
    using (var txScope = new System.Transactions.TransactionScope()) {
        var acc = dc.Accounts.SingleOrDefault(a => a.AccountId == idToKill); 
        if (acc == null) throw new ChangeConflictException("Row not found."); 
        dc.Accounts.Remove(acc); // renamed DeleteOnSubmit in the .NET 3.5 RTM release
        dc.SubmitChanges();
        txScope.Complete();
    }
}

It's not horrible; it's certainly better than anything before it. But, we can do better. I created some helper classes to do so (see attached file).

First, DatabaseBase<TContext>. This holds our tables, and provides DataContext helper functions Use and Query. Second, TableBase<TItem, TKey>. This actually provides our CRUD methods. I don't think anyone is overly interested in the implementation details (comment if I'm wrong), so here's how you declare your CRUD types:

class MembersDatabase : DatabaseBase<MembersDataContext>
{
    public static readonly TableBase<Account, int> Accounts 
        = CreateTable(dc => dc.Accounts, a => a.AccountId);
}

That's it.

You create a new class to serve as your "database" class. All that's required here is to inherit from DatabaseBase.

Next, for each table, simply create a new field via CreateTable. The first parameter is a lambda function that selects the right table off the DataContext. The second parameter is a lambda expression that selects the primary key. Not much to it.

So, how does our previous chunk of code look with this small helper library?

// Select
int someId = 123; // Passed from another tier
var someAccount = MembersDatabase.Accounts.SelectByKey(someId);

// Insert
var myAccount = new Account();
myAccount.Email = "me@contoso.com";
MembersDatabase.Accounts.Insert(myAccount);

// Update
int myId = 1; // Id and email passed from another tier
string newEmail = "me@me.com";
var changedAccount = new Account();
changedAccount.AccountId = myId;
changedAccount.Email = newEmail;
MembersDatabase.Accounts.Update(changedAccount);

// Delete
int idToKill = 2; // Passed from another tier
MembersDatabase.Accounts.Delete(idToKill);

That's about 40% less code. It’s far more straightforward, being a single block.

To start using this code, just drop DatabaseBase.cs into your project. It adds DatabaseBase to System.Data.Linq. Then subclass as shown above, and you're on your way to LINQ updating bliss. What do you think?

DatabaseBase.cs (5.65 KB)

P.S. At any rate, I should get points for the Zelda pun, right?

Update: I forgot to mention, you'll want to turn UpdateCheck to Never on your columns in the LINQ-to-SQL designer.
Update: The code (including the Tuple class) for the .NET 3.5 RTM release is here: http://www.atrevido.net/blog/2008/06/26/LINQ+To+The+CRUD+RTM.aspx

Code
Sunday, August 26, 2007 8:30:32 PM UTC  #    Comments [10]  |  Trackback

# Wednesday, August 22, 2007
Reason #52 against Visual Basic (Nothing in Visual Basic)

What people do in their own time in the privacy of their homes is none of my business. However, when they mess with reading documentation, then it crosses the line and becomes annoying. How many times do VB developers need to be told that a null is "Nothing"? Consider this snippet from MSDN:

-------
The CreateUser method will return a null reference (Nothing in Visual Basic) if password is an empty string or a null reference (Nothing in Visual Basic), username is an empty string or a null reference (Nothing in Visual Basic) or contains a comma (,), passwordQuestion is not a null reference (Nothing in Visual Basic) and contains an empty string, or passwordAnswer is not a null reference (Nothing in Visual Basic) and contains an empty string.
-------

Five times in one paragraph! I know null type systems are annoying and lead to errors, but that seems a bit excessive. Seriously though, it'd make more sense to make VB developers learn a few words once, rather than having to mess up documentation just in case they get confused.

Code | Humour
Wednesday, August 22, 2007 2:49:34 PM UTC  #    Comments [5]  |  Trackback

# Thursday, August 16, 2007
Practical Functional C# - Part III – Loops are Evil

Be sure to read these first two articles:

    Practical Functional C# - Part I

    Practical Functional C# - Part II

OK let's start with a quick challenge. Write an accurate description of the following program, in English:

static void Main(string[] args)
{
    string output = "";
    if (args.Length > 0) output = args[0]; 
    if (args.Length > 1) {
        for (int i = 1; i < args.Length; i++) {
            output += ", " + args[i]; 
        }
    }
    Console.WriteLine(output);
}

Well? The specification might be something like "a program that writes its arguments separated by a comma and space". But how quickly could you determine that from the code? How much additional time does it take to determine there aren't any bugs? Every time a maintenance developer comes across this code, they have to analyze this code, determine the boundary conditions correctly, verify the indexing, and so on. Every time someone reads this code, she must pay a high tax. The saddest part is that this is an extremely common pattern.

Edit: As Chad Hower pointed out in the comments, you can remove the two if statements by performing a check inside the loop. That shortens it considerably and reduces some of the "tax" that has to be paid when you read it.
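
Something like the following, which is my reading of the suggestion rather than a quote from the comment (the JoinArgs class name is made up):

```csharp
using System;

static class JoinArgs
{
    public static string Join(string[] args)
    {
        string output = "";
        for (int i = 0; i < args.Length; i++) {
            if (i > 0) output += ", "; // separator before every item except the first
            output += args[i];
        }
        return output;
    }

    static void Main(string[] args)
    {
        Console.WriteLine(Join(args));
    }
}
```

Shorter, but the reader still has to check the index and the separator condition by hand.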

"Take this set of values and aggregate them into a single value". How many pieces of code do exactly this, but obscure it behind a for loop? Functional languages realize this and provide functions called "Fold" or "Reduce". Such functions take an accumulator function, apply it to every element in the list, then return the accumulator value when finished. C# 3.0, courtesy of LINQ, provides an equivalent function, called "Aggregate":

static void Main(string[] args)
{
    string output = args.DefaultIfEmpty("").Aggregate(
        (accum, item) => accum + ", " + item);
    Console.WriteLine(output);
}

Here we are saying that we are going to aggregate args into a single string value. The lambda on the second line takes two arguments. The first is the accumulator and Aggregate passes it to each element ("threads" it through). The second parameter is the current item we are working with. The return value of our lambda is simply the concatenation of the current accumulated value, comma and space, and the current item. This return value becomes the accumulator for the next item. On the first execution, since we did not give it an explicit seed value, it just uses the first item. The final return value becomes the value that Aggregate returns to output. (Edited: Added DefaultIfEmpty -- this overload of Aggregate doesn't work on empty sequences.) Using the lambda provides nicer syntax than this equivalent code:

static void Main(string[] args)
{
    string output = args.DefaultIfEmpty("").Aggregate(joinCommaSpace);
    Console.WriteLine(output);
}
static string joinCommaSpace(string a, string b)
{
    return a + ", " + b;
}
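
If Aggregate still feels like magic, it helps to see that it is only the loop from the first example, written once. Here is a sketch of a seedless fold. Fold is my own name for it, and this is an approximation of the behaviour, not the framework's source:

```csharp
using System;
using System.Collections.Generic;

static class FoldSketch
{
    // Roughly what the seedless Aggregate does: the first item seeds the
    // accumulator, then f combines the accumulator with each remaining item.
    public static T Fold<T>(this IEnumerable<T> source, Func<T, T, T> f)
    {
        using (var e = source.GetEnumerator()) {
            if (!e.MoveNext())
                throw new InvalidOperationException("Sequence contains no elements");
            T accum = e.Current;
            while (e.MoveNext()) {
                accum = f(accum, e.Current);
            }
            return accum;
        }
    }

    static void Main()
    {
        var words = new[] { "a", "b", "c" };
        Console.WriteLine(words.Fold((accum, item) => accum + ", " + item)); // a, b, c
    }
}
```

The empty-sequence check is the reason the examples above need DefaultIfEmpty.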

So why is this a good thing? Well, it goes back to the questions about the first set of code: how much effort is required to determine intent and correctness of a particular piece of code? In the imperative way, we need eight lines of code, with three distinct paths. Using a functional approach, we have three lines of code and only one code path. The only serious objection that I've heard is that this code is "unfamiliar". Well, sure, anything new might be unfamiliar, but that does not make it bad.

You wouldn't write your SQL code to make someone only familiar with C "comfortable" or "familiar", would you? You wouldn't write in C# the same way you'd write in C or BASIC. So why stick with outdated programming practices just because some "new" developer might get confused? Functional code is more concise, less error prone, and much more readable. Learning this style might take a couple of days, but it's an invaluable skill. At any rate, LINQ is built upon these concepts, so it will do people good to learn anyways.

Now, let's examine how to create our own functions to hide loops. This time, we're going to look at data access. A very common pattern in data access is creating a SqlDataReader, going through it, adding elements to a list. Like other patterns, overhead obscures the intent:

public static List<Person> GetAllPeople()
{
    // Setup command
    var comm = new SqlCommand("GetAllPeople");
    comm.CommandType = CommandType.StoredProcedure;

    // Setup connection
    using (var conn = new SqlConnection(Settings.ConnectionString)) {
        comm.Connection = conn; 
        conn.Open();

        // Loop and add people to our list
        using (var reader = comm.ExecuteReader()) {
            var people = new List<Person>();
            while (reader.Read()) {
                var p = new Person();
                p.Name = reader.GetString(0); 
                p.Age = reader.GetInt32(1); 
                people.Add(p); 
            }
            // Done
            return people; 
        }
    }
}

Yes, something as simple as reading a list of a two-field type can take 14 lines of code. Let's do something about that. The only unique part is where we create a Person from the SqlDataReader. Outside of that, it's a very straightforward, but large, pattern. Refactoring the pattern, we get this:

public static List<T> ListFromReader<T>(string connectionString,
    SqlCommand command, Func<SqlDataReader, T> code)
{
    // Setup connection
    using (var conn = new SqlConnection(connectionString)) {
        command.Connection = conn; 
        conn.Open();

        // Loop into list
        using (var reader = command.ExecuteReader()) {
            var list = new List<T>();
            while (reader.Read()) {
                // Here we call the supplied code to add the right item
                T item = code(reader); 
                list.Add(item); 
            }
            return list; 
        }
    }
}

Now our code to GetAllPeople is very simple:

public static List<Person> GetAllPeople2()
{
    var comm = new SqlCommand("GetAllPeople");
    comm.CommandType = CommandType.StoredProcedure;

    return ListFromReader(Settings.ConnectionString, comm, 
        reader => new Person {
            Name = reader.GetString(0), 
            Age = reader.GetInt32(1) 
        });
}

Just by looking at this code, we know all the data-API stuff is handled correctly. This approach is vastly superior to other common SQL "helpers", for example, an ExecuteReader method that gives us a SqlDataReader. First, we have no locals that need disposing or other cleanup: this function scopes the variables to our lambda. Second, we can focus on our actual logic (creating a Person) rather than dealing with loop conditions.
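
The same trick applies to any read-until-done loop, not just SqlDataReader. To make that concrete without needing a database, here is a sketch over a TextReader. ListFromLines, ReaderHelpers and this cut-down Person class are stand-ins I made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

static class ReaderHelpers
{
    // Hypothetical helper: owns the reader and the loop; the caller supplies
    // only the code that turns one line into one item.
    public static List<T> ListFromLines<T>(TextReader reader, Func<string, T> code)
    {
        using (reader) {
            var list = new List<T>();
            string line;
            while ((line = reader.ReadLine()) != null) {
                list.Add(code(line));
            }
            return list;
        }
    }

    static void Main()
    {
        var people = ListFromLines(new StringReader("Ann,31\nBob,45"), line => {
            var parts = line.Split(',');
            return new Person { Name = parts[0], Age = int.Parse(parts[1]) };
        });
        foreach (var p in people) {
            Console.WriteLine(p.Name + " is " + p.Age);
        }
    }
}
```

The caller never sees the loop or the Dispose call, only the line-to-Person conversion.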

Edit: This is not specific to C# 3.0! You can definitely achieve a lot of the same benefits in 2.0, except you need to replace the simple lambda syntax ( => ) with the much more verbose delegate (ArgType arg) { } (anonymous method) syntax. C# 3.0 just makes it much easier to write.

What other common loops do you encounter? I have a few more we'll cover in the next article. As always, comments, insults and suggestions are welcome.

Code
Thursday, August 16, 2007 4:36:01 PM UTC  #    Comments [21]  |  Trackback

# Monday, August 13, 2007
Practical Functional C# - Part II

Previous article: Practical Functional C# - Part I

Last time I demonstrated how to replicate the using keyword as a function that takes a function. In this article, I'm going to show some real-world cases where you'll see a major improvement by taking a more functional approach. I'm going to use WCF as an example space.

WCF allows us to define services as normal C# interfaces. We then use a factory to create a proxy for an interface, specifying the URI, binding types (HTTP, binary, message queue), and other options. However, our interest is how this actually looks from the client side; how we actually make calls.

Since a WCF call is actually invoking code on another machine, any number of bad things can happen to our client channel, resulting in exceptions. When this happens, we have to Abort the channel. But if things complete successfully, we just need to Close the channel. Of course, that's not easy enough: if the Close fails, we then need to Abort the channel anyways. Confused? Well, check out this code snippet that uses an interface called ICalculator to add two numbers:

void WcfExample()
{
    int a = 1; 
    int b = 2; 
    int sum; 
    var chanFactory = GetCachedFactory<ICalculator>();
    ICalculator calc = chanFactory.CreateChannel();
    bool error = true;
    try {
        sum = calc.Add(a, b);
        ((IClientChannel)calc).Close();
        error = false;
    }
    finally {
        if (error) {
            ((IClientChannel)calc).Abort();
        }
    }
    Console.WriteLine(sum);
}

Ouch! Out of 17 lines of code, only five relate to our problem. The rest (70%) is pure, ugly overhead. This is a much more complicated pattern than the using pattern, and from experience, I can tell you it is error-prone. Fortunately, this pattern decomposes nicely. The only two unique things are the name of the interface and the statement that acts on the interface. Here is one way you might go about writing this generically:

TReturn UseService<TChannel, TReturn>(Func<TChannel, TReturn> code)
{
    var chanFactory = GetCachedFactory<TChannel>();
    TChannel channel = chanFactory.CreateChannel();
    bool error = true;
    try {
        TReturn result = code(channel);
        ((IClientChannel)channel).Close();
        error = false;
        return result; 
    }
    finally {
        if (error) {
            ((IClientChannel)channel).Abort();
        }
    }
}

This is exactly like the previous code, but we've substituted TChannel and a function parameter named code instead of our actual types. The type of function we want is Func<TChannel, TReturn>. You can think of this as saying: "transforms a TChannel into a TReturn". Now, look at the beauty this allows on the client side:

void WcfExample2()
{
    int a = 1; 
    int b = 2; 
    int sum = UseService((ICalculator calc) => 
        calc.Add(a, b)); 
    Console.WriteLine(sum);
}

Presto! Our overhead went from 12 lines to zero. If that didn't sell you on the power of functions, I suggest finding another career. Now, there is a lambda there, but all it is saying is "Here is a function that takes one parameter of type ICalculator and returns the value of Add(a, b)." The C# compiler automatically infers the return type (int).
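
To see the Close/Abort handling in isolation, here is a sketch that runs without WCF. FakeChannel stands in for a real client channel, and the channel is passed in rather than created from a factory. Both are simplifications of mine, not how the real code obtains its proxy:

```csharp
using System;

// Stand-in for a WCF client channel; records which cleanup call was made.
class FakeChannel
{
    public string State = "Open";
    public void Close() { State = "Closed"; }
    public void Abort() { State = "Aborted"; }
    public int Add(int a, int b) { return a + b; }
    public int Fail() { throw new InvalidOperationException("remote call failed"); }
}

static class ServiceSketch
{
    // Same shape as UseService: Close on success, Abort on any failure.
    public static TReturn Use<TReturn>(FakeChannel channel, Func<FakeChannel, TReturn> code)
    {
        bool error = true;
        try {
            TReturn result = code(channel);
            channel.Close();
            error = false;
            return result;
        }
        finally {
            if (error) channel.Abort();
        }
    }

    static void Main()
    {
        var ok = new FakeChannel();
        Console.WriteLine(Use(ok, c => c.Add(1, 2))); // 3
        Console.WriteLine(ok.State);                  // Closed

        var bad = new FakeChannel();
        try { Use(bad, c => c.Fail()); }
        catch (InvalidOperationException) { }
        Console.WriteLine(bad.State);                 // Aborted
    }
}
```

The caller's lambda contains only the call it cares about; the cleanup rules live in one place.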

Some of you mentioned still using C# 2.0. Two things: First, you can use the 3.0 compiler and run on 2.0 because these examples don't reference any new assemblies. Second, you can achieve the same thing using anonymous methods, but they just look a bit uglier:

void WcfExample3()
{
    // C# 2.0 example with anonymous methods
    int a = 1; 
    int b = 2; 
    int sum = UseService<ICalculator, int>(delegate(ICalculator calc) {
        return calc.Add(a, b); 
    });
    Console.WriteLine(sum);
}

By this time, I hope you are thinking about places in your code where you can extract a larger pattern and use functions instead. The key concept to understand here is that your source code should reflect your solution. You should not have extra code sitting around just for the sake of appeasing the platform or meeting some "design pattern". Let the platform and libraries take care of the details, and let your code focus on actually solving problems. As always, I welcome your comments, suggestions, and insults, so please let me know what you think!

Next up, we'll take a look at loops, see why they are harmful, and then go about fixing them.

Code
Monday, August 13, 2007 11:07:26 PM UTC  #    Comments [8]  |  Trackback

# Sunday, August 12, 2007
Practical Functional C# - Part I

Edit: This first article might be a bit dense at times. However, I promise, if you stick through and read at least until Parts II and III (Loops are Evil) you will see major benefits that will totally transform your code (even C# 2.0 code!). In fact, to the first 3 people that can show me that these practices will NOT result in better code in many enterprise apps, I'll give $100 each. Post a blog comment or email me (mgg AT telefinity dot com).

Redundancy in source code is a common degeneration. But for the C# programmer, her weapons to eliminate it have been unwieldy at best. She has been able to eliminate simple, blocky patterns, but truly writing reusable code at a fine level has not been easy.

This series will demonstrate how you can take advantage of functional programming (FP) in your work, today. The first task is to dispense with the notion that functional programming is difficult and strange. Most FP articles start with a recursive definition of the Fibonacci sequence and then talk about currying functions. Here, we're going to start off with something that every C# programmer has used many times. I can guarantee that after you absorb this series of articles, you'll be writing more concise, less buggy, more powerful software. I've seen many C# modules get cut down drastically in size. I've seen new C# code written that does multithreading and it was written correctly on the first go. I want you to see these things too.

Refactoring out common blocks of code is familiar. Probably every developer has written a helper method to initialize a database command or prepare a commonly used object. Unfortunately, refactoring only tends to happen to contiguous blocks. We do not see refactoring of complex patterns that do not fit into neat blocks. Consider the following visualizations of a program. The red blocks are the unique parts of the program, and the grey ones represent common statements.


[Figure: a block diagram of the program, and what it becomes after refactoring]

It is easy to refactor the first sequence: move the common parts into another function and call it. However, the second pattern poses quite a problem. The individual "common" blocks are too simple to move into another method. The call to the refactor method would be the same as the method itself! How can we refactor it correctly?

Well, C# itself contains keywords that refactor some of these patterns. Consider the using keyword. The following two methods are nearly equivalent:

void usingPatternExample()
{
    string result;
    StreamReader sr = new StreamReader(filename);
    try {
        result = sr.ReadToEnd();
    }
    finally {
        if (sr != null) sr.Dispose();
    }
}

void usingKeywordExample()
{
    string result; 
    using (StreamReader sr = new StreamReader(filename)) {
        result = sr.ReadToEnd();
    }
}

I think it is obvious why everyone uses the using keyword over writing out this long init-try-something-finally-dispose pattern repeatedly. When we read code that uses the using keyword, the intent is instantly clear. We don't need to go validate the pattern to make sure it does what we think it does, we can just see "using" and know that it is correct. As an extra benefit, we can declare our disposable variable inside the using scope, so that we don't accidentally use it after it is disposed.

This is all good and dandy, until we realize that we are stuck with the handful of patterns defined as C# keywords! If I want to declare the variable result as an output of the using statement, I'm out of luck. If I want to initialize multiple objects in one using block, then that is just too bad. Part of the reason for this is that it was extremely ugly to do this in C# before version 3.0. Consider this C# 1.0 example to recreate the using keyword as a user-defined function:

delegate void UsingAction(IDisposable obj);
void Using(IDisposable obj, UsingAction action)
{
    try { action(obj); }
    finally { if (obj != null) obj.Dispose(); }
}

So far, so good – nothing ugly yet. The usage of this user-defined Using function on the other hand:

string result;
void UsingExample1()
{
    Using(new StreamReader(filename), new UsingAction(readIntoResult));
}
void readIntoResult(IDisposable obj)
{
    StreamReader sr = (StreamReader)obj; 
    result = sr.ReadToEnd();
}

Atrocious! The pattern of the code is shattered and complicated to follow. The main part of the code (getting the result) is forced to sit in a separate method many lines away. We're required to use fields to pass data in or out. And just to top it off, lack of generics forces a cast – we had to write StreamReader three times. The sheer volume of syntax hides our intention. In short, it is completely unusable.

C# 2.0 makes some progress. First, we can change UsingAction to Action<T> and allow the caller to specify the type. All that is needed is a constraint on the type to IDisposable. The new declaration of Using looks like this:

void Using<T>(T obj, Action<T> action) where T : IDisposable

This says that we need something of type "T", and the only restriction is that T must be a type that implements IDisposable. Then, we want an Action that works on that same type T, whatever it is. By using T instead of a specific type, we are essentially saying "we don't care about what type your object is; our function works with all sorts of types". This is the source of the name "generics". Our code is more generic; it's not tied to a specific type. Generics are an extremely important tool for refactoring patterns because patterns occur across different types.
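
Filled in, the whole method might look like this. The body is simply the C# 1.0 version with T substituted; DisposableProbe and Show are throwaway names of mine, there only to show that Dispose really is called:

```csharp
using System;

class DisposableProbe : IDisposable
{
    public bool Disposed;
    public void Dispose() { Disposed = true; }
}

static class UsingSketch
{
    public static void Using<T>(T obj, Action<T> action) where T : IDisposable
    {
        try { action(obj); }
        finally { if (obj != null) obj.Dispose(); }
    }

    // No cast needed: the action receives a DisposableProbe, not an IDisposable.
    static void Show(DisposableProbe p)
    {
        Console.WriteLine(p.Disposed); // False: still alive inside the action
    }

    static void Main()
    {
        var probe = new DisposableProbe();
        Using(probe, Show);
        Console.WriteLine(probe.Disposed); // True: disposed on the way out
    }
}
```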

To clean up the calling code, C# 2.0 has a feature called anonymous methods. They are exactly as they sound: methods without names. They can be declared inline, right along with the rest of the code. Additionally, they "capture" local variables, making it unnecessary to declare fields to pass data in and out.

void UsingExample2()
{
    string result; 
    Using(new StreamReader(filename), delegate(StreamReader sr) {
        result = sr.ReadToEnd();
    });
}

What a major improvement! C# 2 can infer the type of T for Using, avoiding having to write Using<StreamReader>. Apart from having to type StreamReader twice, this code is starting to approach the level of clarity that the using keyword provides.

C# 3.0 introduces lambdas. Now, that's usually a scary word from functional programming, but it's actually very simple. For C#, it's just an easier way of writing anonymous methods. Instead of having to write "delegate(ParamType paramName) { }", we can just do "paramName =>". That's all there is to it! Nothing scary at all. For example, if I want to write a lambda to add two numbers, I’d write it like this:

Func<int, int, int> add = (int a, int b) => a + b;

The part Func<int, int, int> says we have a function that takes two ints and returns an int. Next, we declare our two integer parameters: (int a, int b). Then, we define the lambda body by using the => operator. Our body consists only of “a + b”. With lambdas, if the body is a single expression, we don’t need to explicitly use braces or the return keyword. Alternatively, we could have written it as:

Func<int, int, int> add = (int a, int b) => { return a + b; };
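
The parameter types can usually be dropped as well, because the compiler infers them from the delegate type on the left. A quick sketch of the three equivalent spellings side by side (LambdaForms is just a name for the demo):

```csharp
using System;

static class LambdaForms
{
    static void Main()
    {
        // C# 2.0 anonymous method
        Func<int, int, int> add1 = delegate(int a, int b) { return a + b; };
        // C# 3.0 lambda with explicit parameter types
        Func<int, int, int> add2 = (int a, int b) => a + b;
        // C# 3.0 lambda with inferred parameter types
        Func<int, int, int> add3 = (a, b) => a + b;

        Console.WriteLine(add1(1, 2)); // 3
        Console.WriteLine(add2(1, 2)); // 3
        Console.WriteLine(add3(1, 2)); // 3
    }
}
```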

Behind the scenes, the compiler goes and creates a method and creates the Func delegate on top of it. So, armed with this new syntax and power, let’s take another look at our Using method:

void UsingExample3()
{
    string result;
    Using(new StreamReader(filename), sr =>
        result = sr.ReadToEnd()
    );
}

void usingKeywordExample()
{
    string result;
    using (var sr = new StreamReader(filename)) {
        result = sr.ReadToEnd();
    }
}

This shows there is nothing that special about the built-in keywords. It is also the first step in demonstrating that some parts of functional programming are not very foreign to the C# programmer. The next articles in this series will provide some real-world application of these ideas and show how you can reduce the amount of code you have to write (and reduce the number of bugs at the same time!).
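
It also means we can fix the limitations mentioned earlier. For example, here is a variant that returns a value, so result no longer has to be declared ahead of time. This overload is my own sketch, not something from the framework, and it uses a StringReader so it runs without a file:

```csharp
using System;
using System.IO;

static class UsingFunc
{
    // Hypothetical overload: like Using, but hands back whatever the function returns.
    public static TResult Using<T, TResult>(T obj, Func<T, TResult> f) where T : IDisposable
    {
        try { return f(obj); }
        finally { if (obj != null) obj.Dispose(); }
    }

    static void Main()
    {
        string result = Using(new StringReader("hello"), sr => sr.ReadToEnd());
        Console.WriteLine(result); // hello
    }
}
```

The built-in using statement cannot produce a value this way; a plain function can.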

Code
Sunday, August 12, 2007 11:21:32 PM UTC  #    Comments [5]  |  Trackback

# Thursday, August 02, 2007
C# Frustration

C# with lambda syntax and extension methods (in lieu of compositional operators) gets us so far, but the syntax and compiler could use a bit of polish. I'll show some exact examples later, but meanwhile this picture sums up my feelings at the moment:

Notype1.png

Edited to add: Inspired by http://xkcd.com/297/

Edited to add: Just to be clear, this isn't a compiler or IDE issue: it's a spec issue (AFAIK). The C# spec simply doesn't allow certain things (like type inference from a method group).

Code | Humour
Thursday, August 02, 2007 7:39:10 PM UTC  #    Comments [7]  |  Trackback

# Tuesday, March 13, 2007
Outsourced Evolutionary Programming

My friend just dealt with an outsourced project. Yes, outsourced as in sending it to a place that charges a lot less money than, say, developers who actually know what they're doing.

One of their deliverables was a program that compressed an XML string into a gzip file. Should be a minor thing, right? The C#/.NET 2 code is less than 10 lines. Well, their first delivery produced files that contained the text "System.Byte[]". This was not accepted and they vowed to look into it more as they were sure the code was correct.

Their next files were a bit larger and supposedly were really correct this time. But still, no zip program could read the data. Well, a quick look at the beginning of the file shows the bytes EF BB BF -- the UTF-8 BOM. The rest of the file was ASCII digits. Yes, they wrote the bytestream out as a UTF-8 interpretation.
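
For reference, here is a sketch of the sort of code the task called for. The class and method names are mine, and the sample XML is made up:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class GzipSketch
{
    // Encode the string as UTF-8 bytes and write those bytes through GZipStream.
    public static byte[] Compress(string xml)
    {
        byte[] raw = Encoding.UTF8.GetBytes(xml);
        using (var output = new MemoryStream()) {
            using (var gzip = new GZipStream(output, CompressionMode.Compress)) {
                gzip.Write(raw, 0, raw.Length);
            }
            // The GZipStream must be closed before the result is read.
            return output.ToArray();
        }
    }

    static void Main()
    {
        byte[] compressed = Compress("<root><item>hello</item></root>");
        // A gzip stream always starts with the magic bytes 1F 8B.
        Console.WriteLine(compressed[0].ToString("X2") + " " + compressed[1].ToString("X2")); // 1F 8B
    }
}
```

Passing the resulting array to File.WriteAllBytes gives a file that any gzip tool can open. The key point is that the bytes are written as bytes, never converted back to text.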

If we define evolution as "the non-random survival of randomly mutating replicators", we can define their approach as "the non-random acceptance of randomly mutating programs."

Code | Humour
Tuesday, March 13, 2007 5:51:08 PM UTC  #    Comments [2]  |  Trackback

# Friday, May 05, 2006
SQL Server 2005 Reporting Services Configuration Madness

Well, after almost exactly 6 hours, I've succeeded at installing SQL Server 2005 Reporting Services on a server with more than one website.

We're running Reporting Services on separate web servers. So, after the install of reporting services, you run their little configuration tool. This of course, accomplishes very little :). See, apparently Reporting Services wasn't designed to work on a server running, *gasp*, more than one application.

If you have a decent IIS install, the default website isn't there and thus requests to http://localhost/ aren't gonna work. Reporting Services doesn't take this into consideration, and happily tries to request http://localhost/ReportServer/ even after you've specified this in the config tool. If this is your issue, you'll get a “HTTP Error 400: Bad Request“ when trying to access the ReportManager (/Reports/) website.

You'll need to edit the config files in Program Files\.....\ReportManager and ReportServer. rsreportserver.config needs to point to http://the.reporting.host.name/ReportServer in the UrlRoot element. In RSWebApplication.config, ReportServerUrl will need to have the same value. The ReportServerVirtualDirectory element must be deleted. Otherwise, you will get a “The configuration file contains an element that is not valid. The ReportServerUrl element is not a configuration file element. “ message. This is because the config reading code apparently doesn't fail gracefully. What it's trying to say is “the ReportServerUrl and ReportServerVirtualDirectory elements are mutually exclusive”. I'm still unsure why there should be anything besides a URL...

Around here, you might notice a bunch of DCOM errors in your Event Log (or before this point). To fix these, you'll need to go into dcomcnfg and edit the COM security for My Computer. Give the account you're using (like Network Service or “MyReportingServicesAccount“) permissions for local activation and local launch. You need to reboot for these changes to take effect (I think). But don't reboot just yet...

Finally, you end up with a 401 Unauthorized when accessing the Reports site. You might have also noticed you are unable to authenticate when browsing the Reports or ReportServer sites from the local server. Why?
“Windows XP SP2 and Windows Server 2003 SP1 include a loopback check security feature that is designed to help prevent reflection attacks on your computer. Therefore, authentication fails if the FQDN or the custom host header that you use does not match the local computer name.” So I'm guessing NTLM is susceptible to this type of attack, and Microsoft is saving us from it. Well, it also hoses us in this case because from what I can tell, ReportManager (the thing in the Reports vdir -- why it wasn't called ReportManager by default...) needs to connect to ReportServer. It sends a request, which is denied because of the loopback protection above. A quick registry edit fixes this: http://support.microsoft.com/default.aspx?scid=kb;en-us;896861

After that... you might have a working SQL Reporting Services 2005 install! (Next up: Getting it to work with SSL...)

Really, apart from the horrible setup/configuration, it's a very very fine product. I'm actually pretty impressed. The report I wanted to set up (and the subscription so it's mailed out) only took about 10 minutes (first time I've ever used RS)! I'm just at a loss why Microsoft makes it so hard to set up. This configuration can't be that unusual. And, even stranger, most (if not all) of these configuration issues could be taken care of by the configuration tool itself. In other words, their little configuration app should automatically fix this stuff (or at least give explicit instructions on how to do so). Or maybe I just didn't RTFM that well... but this is a Microsoft product... you're supposed to just shove the DVD in the drive and click next, right? <g>

P.S., if you're getting a “Object Reference not set to an Instance of an Object“ when you add a new subscription, ensuring everything else is 100% working should make it go away...

Code | Misc. Technology | Security
Friday, May 05, 2006 6:02:44 AM UTC  #    Comments [8]  |  Trackback

# Tuesday, December 27, 2005
Microsoft finally realises VS2005 web site apps suck

http://webproject.scottgu.com/Default.aspx

YEY!!! Finally, after 2 years, we get VS2003 functionality back in VS2005. The biggest pain point I have, every single day, is dealing with the vile VS2005 web sites. Microsoft has finally realised that this monstrosity spawned to soothe the demented minds of webmonkies who think HTML is a programming language is actually bad for real developers. Really, how does catering to the people who think “build“  refers to writing code help them? (Oh wait, I know. It allows them to gain access to “web developers“ while knowing that people who know what they are doing still won't defect.)

This is exactly like I predicted -- Microsoft will have to back down. I guess it took the final RTM launch for everyone to try to upgrade their apps and then find out that the new model is unusable. I wonder how many PSS cases/pieces of feedback they got.

Well, I can't wait to change our projects to this format. I was planning on restructuring our solution (22 projects in VS is unwieldy) anyways, and this will go great with it. I know a few developers who are gonna be real happy when they come back after the holidays! Maybe this means that doing a public refactoring in our solution won't take 45 seconds anymore? Maybe these apps will build in less than a minute (like every other C# app)?

Oh, BTW, I'm far from ungrateful, even though I might sound like that. I'm actually very happy that the ASP.NET team is doing this, despite the fact that I might roll my eyes and say “yea, about time!” :). But after going through all the pain I have... perhaps it's understandable.

Code
Tuesday, December 27, 2005 2:25:54 AM UTC  #    Comments [1]  |  Trackback

# Tuesday, August 02, 2005
Best way to traverse all controls on an ASP.NET page?

I was working on an application today, and I needed to add some data to every HyperLink on the ASP.NET page (for a custom authorization string). I thought it might be a common thing: needing to go through all the controls on a page, but apparently not. I didn't find any framework functionality, and the only code samples (just to see if I have the “best” way of doing things) led to some not-so-nice code (arraylists and recursion!). So, here's the best I've come up with (criticism, please):

        // Iterative depth-first walk over the control tree, starting at the page itself.
        Stack<Control> remainingControls = new Stack<Control>();
        remainingControls.Push(this);
        while (remainingControls.Count > 0)
        {
            Control currentControl = remainingControls.Pop();
            foreach (Control item in currentControl.Controls)
            {
                HyperLink hl = item as HyperLink;
                if (hl != null)
                {
                    // Append the authorization string to every link we find.
                    hl.NavigateUrl = AddAuthToUrl(hl.NavigateUrl);
                }
                else if (item.Controls.Count > 0)
                {
                    // Only containers need another visit; childless controls are skipped.
                    remainingControls.Push(item);
                }
            }
        }
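
If the same walk is needed in more than one place, one option is to pull the stack logic into a reusable iterator. This is only a sketch: Node is a hypothetical stand-in for System.Web.UI.Control (so the sample compiles outside ASP.NET), and TreeWalker.Descendants is a name I made up, not framework functionality.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for System.Web.UI.Control: a name plus child nodes.
public sealed class Node
{
    public string Name { get; }
    public List<Node> Children { get; } = new List<Node>();

    public Node(string name, params Node[] children)
    {
        Name = name;
        Children.AddRange(children);
    }
}

public static class TreeWalker
{
    // Lazily yields every descendant of root (root itself excluded),
    // using an explicit stack instead of recursion.
    public static IEnumerable<Node> Descendants(Node root)
    {
        var pending = new Stack<Node>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            foreach (Node child in current.Children)
            {
                yield return child;
                if (child.Children.Count > 0)
                {
                    // Only nodes that have children need a later visit.
                    pending.Push(child);
                }
            }
        }
    }
}
```

For a page holding a panel (with link1 and a label inside) plus link2, this yields panel, link2, link1, label. The caller can then filter for whatever control type it cares about, which keeps the traversal and the HyperLink-specific work separate.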

Code
Tuesday, August 02, 2005 9:52:15 PM UTC  #    Comments [3]  |  Trackback

# Wednesday, July 27, 2005
Secure TCP Remoting in Whidbey

I've spent a few hours trying to get the secure TCP (based on NegotiateStream) integrated security in .NET 2.0 working. While there is a page on this (Authentication with the TCP Channel), it fails to mention that you need one more property in addition to encrypt, impersonationLevel and authenticationMode. It's called “secure”, and it must be “true”. I didn't see it mentioned anywhere, except when I happened to browse the MSDN Forums: http://forums.microsoft.com/msdn/ShowPost.aspx?PostID=55225

I looked at his config, and realised I didn't have this “secure” property. Problem solved. Also, I recommend checking out http://pluralsight.com/wiki/default.aspx/Keith.GuideBook/HowToAddCIAToDotNETRemoting.html, which has a lot of information about Windows security in general, apart from some specifics of remoting and Kerberos. And, finally, yes, there's one more page where the secure attribute is listed (with some other docs) http://blogs.msdn.com/manishg/archive/2005/04/22/410879.aspx

OK, so perhaps there was some error between the user and the keyboard... but I'm very very excited to see this feature running.

Code | Security
Wednesday, July 27, 2005 2:25:15 AM UTC  #    Comments [1]  |  Trackback

# Friday, July 08, 2005
Using Asterisk from C#: MONO-TONE

I found a cool C# library for use with Asterisk (AGI) and .NET: MONO-TONE. It looks promising as an easy way to deal with AGI from C#. I think that I'll be extending it to support FastAGI as well, and contributing the changes back. Nice work Gabriel!

Asterisk | Code
Friday, July 08, 2005 4:06:36 PM UTC  #    Comments [2]  |  Trackback

# Tuesday, April 05, 2005
Visual Studio 2005 Call Browser

I just used the coolest feature in a while: VS2005's Call Browser.

I'm currently working on some firmware for IP phones and adapters. The chip is the Intel 8051, as used by Centrality Communications in the PA168. It's actually quite fun, programming for an 8-bit system. Apart from writing in C (instead of such high-level things as C#), writing for embedded systems like this adds its own interesting things, like having to decide where a variable will be stored (I think there are 3 different storage locations a variable can have with the 8051). Oh yea, and having to keep things really small, and really fast.

At any rate, I am not that familiar with the entire design of the system, and I just want to focus on adding features to the IAX2 implementation (cause SIP sucks!). A large part of my work is to figure out how things work. Having the Goto Definition (F12) is great for finding specific symbols, but doesn't help with the flow. Up until now, I've been Finding in Files for a specific method, then chasing things down by recursively Finding in Files until I figured out how things are called. This happens a lot, since these devices support 5 protocols and use #ifdefs and generic calls to interface with the different protocols. Add to that 8MB of source, and it's no small task.

This morning, I remembered I had seen a “Call Browser” window in Visual Studio 2005 a while ago. Edit: Apparently this isn't a new feature and has been around since VC6 (at least). Well, it's new to ME, and it's still very cool.

Here's an example. Let's say I want to add attended transfer (where you have a call, press transfer, dial a number, talk to the new call, then hang up to connect the two). I'm looking in the source I'm familiar with (the IAX protocol area), and see iax2_hangup(). That's a packet-level call, so when someone physically hangs up, that, somehow, gets called. Where? Well... right click the function, Call Browser -> Show Calls To:


Click... click... click... bingo. Now I've got a complete grasp on the call flow that would send an IAX_COMMAND_HANGUP. My old way (which makes me feel stupid now) of browsing source doesn't even come close. A lot of programming these days is managing complexity. It's all about making the best use of our limited brainspace (some more limited than others). 17 lines/1 small diagram. That's all it takes for me to see this complete flow. How much mental capacity does that require versus browsing multiple locations in 4 different source code files?

Code
Tuesday, April 05, 2005 5:08:28 PM UTC  #    Comments [5]  |  Trackback

# Thursday, March 31, 2005
Cracking code 5.1: Increasing your configuration
Yet another super-easy tutorial... (Revision 2 for legal reasons)

When attacking code, always look for the smallest, least intrusive change. The more you change, the more you have to worry about A: screwing something up and B: not being able to move your changes forward when the emitted code changes. Sometimes copy protection authors use encryption and the like. Sometimes they even do it correctly. But many times, the critical path of code comes down to a single bit or a couple of bytes.

I've talked about flipping branches (jumps) before. Some programs all boil down to an "if(boolean)...", in which case flipping a bit of a jmp will reverse the condition (jump if equal to jump if not equal). This results in the code always working when you enter invalid input, and not accepting valid input. But more complex code might actually depend on a bit more code, say, a variable being set to a certain number. For instance, maybe it has an "activation level", and the higher it is, the more features are enabled. In such cases, it's not feasible to go around flipping a bunch of branches.

Today's tutorial will use IDA Pro (www.datarescue.com). You can get a free demo to try out. However, if you're gonna do a lot of work with IDA Pro, it's only $439 for the full version. It even now supports cross-platform debugging (i.e., debug your Linux app on Windows), and supports .NET executables. I have to try it, but it sounds like this could be my solution to developing (debugging) on Windows for Linux. Very, very cool.

No sample program this time, since it is really easy to grasp. Let's take a theoretical program: MagicLineConverter. MagicLineConverter converts input data to output data and does some magical transformation on it. The program is configured for a set number of lines. So you can buy a 1000 line program, a 2000 line program, etc. They have some genius crypto people on staff, so trying to generate fake config files for it just isn't possible. You need to try it with a million lines, just to make sure it works, so you can get a purchase order to buy the program. So, you download the demo program, but it expires before you get a chance to examine it. Now, you have zero lines configured for use.

Thus, we load the program in IDA Pro. After loading the program, you'll get a large disassembly view. Poke around, and you'll see names like “sub_8048400” and “dword_804967C”. As with any attack, you've got to start off by finding the real method and variable names. IDA Pro makes this not too hard, and offers a renaming function that allows you to rename functions as you go along. Thus, if you think a variable holds a value representing if there is network access, you can rename it to say "IsNetworkAvail" instead of remembering a memory location. If you work around for a while, you can probably reconstruct a lot of the program logic. The more you understand, the better your patch can be.

Well, when you run the program, output like this is probably sent:
Configured for 1000 lines.

Back in IDA Pro, go to the strings window. Search for that string. Double click it and you'll see something like this in the disassembler: ".data:001234 'aConfigured_for', 0Ah, 0". On the next lines, you'll have information like "; DATA XREF: sub_001400+E". IDA is telling us where this string is referenced. If we go there, we'll probably see something like this:

push ds:dword_0A240200
push offset aConfigured_for ; "Configured for %d lines.\n"
call printf

By now, we're probably almost done. We've found where some code is that reports the total lines the program is configured for. Somehow, this routine knows where to get the data, or the data is passed in. Since there's a dword being pushed and printed, it's safe to assume the count is stored there. Click that dword, and press 'n' to rename it. Enter a good name, such as 'possibleCount' or 'printedCount'. When the copy protection is good, there could be multiple levels of indirection leading up to printing something critical like that. Thus, using tentative names that reflect what you are certain of helps if things get more complex down the road. You can also rename the routine to something useful like "printCount".

Now, we want to see where else this variable is used. IDA Pro has a feature that lets us see all references to an item. In the disassembly, right click our renamed variable, and select “Jump to xref to operand” (or just press x). A dialog is shown that lists the different instructions using this memory. Look for ones that look like initialization. Here are two common examples:

mov ds:printedCount, 0
mov ds:printedCount, ecx


The first one first. Highlight that entire line (mov ds:printedCount, 0). Then switch to Hex view. You'll see something like this highlighted: C7 05 34 12 00 00 00 00 00 00. Since it's a dword, the last 4 bytes represent the zero (the 4 bytes before them are the variable's address, and both are stored little-endian). Modify any of the zero bytes to a value of your choosing (for example, changing mov ds:printedCount, 0 to mov ds:printedCount, 1000000). This patch can be as small as a single bit if you choose!
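
To see where those ten bytes come from, here's a rough sketch that builds the encoding by hand. It assumes the 32-bit absolute-address form of the instruction (opcode C7, ModRM 05) and the made-up address 0x1234 from the hex dump above; MovPatch and its method names are mine, purely for illustration.

```csharp
using System;

public static class MovPatch
{
    // Encodes "mov dword ptr [address], value" for 32-bit x86:
    // opcode C7, ModRM 05 (absolute address), then the address and the
    // immediate, each written as 4 little-endian bytes.
    public static byte[] EncodeMovImm32(uint address, uint value)
    {
        var code = new byte[10];
        code[0] = 0xC7;
        code[1] = 0x05;
        WriteLittleEndian(code, 2, address);
        WriteLittleEndian(code, 6, value);
        return code;
    }

    // Stores value at buffer[offset..offset+3], least significant byte first.
    private static void WriteLittleEndian(byte[] buffer, int offset, uint value)
    {
        for (int i = 0; i < 4; i++)
        {
            buffer[offset + i] = (byte)((value >> (8 * i)) & 0xFF);
        }
    }

    // Formats bytes the way a hex view shows them: "C7 05 34 ...".
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace('-', ' ');
    }
}
```

MovPatch.ToHex(MovPatch.EncodeMovImm32(0x1234, 0)) gives C7 05 34 12 00 00 00 00 00 00, matching the hex view, and passing 1000000 instead gives C7 05 34 12 00 00 40 42 0F 00, since 1000000 is 0x000F4240.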

But wait... sometimes GCC won't generate a “mov something,0” to initialize it. In some cases, for some processors, and certain optimizations, it'll use a register for initialization. In such case, the disassembly might look something like the second case:

; Somewhere deep in the program
mov ds:printedCount, ecx ; After critical processing


Now we have to find out where ecx is initialized. It probably won't be too far away. If we're lucky, there will be a mov ecx, 0. However, optimized code probably won't emit that. Instead, it might have:

xor ecx, ecx

xor'ing a value against itself will always produce zero, and “xor ecx, ecx” takes up 3 fewer bytes than “mov ecx, 0”. The xor is only two bytes (0x31c9). Two ideas: First, fill it with nops. Depending on the value already in ecx, this might work and give us some number of lines. However, that might not work: ecx could be zero already. Second, and fortunately, we can address a single byte of ecx with this: mov ch,0xff. This moves ff into the high part of CX, which is the low part of ECX. That instruction generates only two bytes (0xb5ff), so it's a great replacement for the xor opcode on the same register. Assuming ECX is zero, that one byte will now make it have the value 65,280.
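
The register arithmetic is easy to sanity check without a debugger. This sketch only models the two instructions in plain C#; RegisterPatch and its methods are hypothetical names, and nothing here touches real registers.

```csharp
public static class RegisterPatch
{
    // Models "xor ecx, ecx": any value xor'ed with itself is zero.
    public static uint XorSelf(uint ecx)
    {
        return ecx ^ ecx;
    }

    // Models "mov ch, imm8": CH is bits 8..15 of ECX, so only that
    // byte is replaced and the other 24 bits are left alone.
    public static uint MovCh(uint ecx, byte imm8)
    {
        return (ecx & 0xFFFF00FFu) | ((uint)imm8 << 8);
    }
}
```

RegisterPatch.MovCh(0, 0xFF) returns 0x0000FF00, which is the 65,280 mentioned above. If ECX held something else beforehand, the other three bytes survive the patch, so the resulting count depends on whatever was left in the register.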

In both cases, it's only a two byte patch. You can distribute the patch with a simple offset:value -- 9 bytes of ASCII text. Sorta hard to stop that, and anyone could patch just from their own memory.

Moral of the story: Write obfuscated code or use a post-compile processor that will mix up your code for you. If your code is cracked by changing a single bit... that means it's just protecting the honest :). While 100% protection is never possible, it should be a lot harder than allowing a stray gamma ray to crack your code!
Code | Security
Thursday, March 31, 2005 4:26:10 AM UTC  #    Comments [1]  |  Trackback

# Wednesday, March 16, 2005
SOAP Performance (gSOAP / ASP.NET)

I'm doing my own realtime support for Asterisk, in an attempt to make it scale. Asterisk is nice software, but straight out-of-CVS, the performance for high volume (say, over 20,000 clients) sucks. There are also other inconveniences with using a file-based store to determine how to route calls. Mainly, it's inflexible and hard to achieve high-perf when everything is based on a large .conf file. Not to mention that Asterisk uses linked lists for everything so looking up any user is an O(N) op (and parsing the users file is O(N*N) by default!). So, I'm going to put my own logic as a replacement for some of the critical parts.
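
To make the O(N) point concrete, here's a small sketch contrasting a linked-list scan with a hashed lookup. It is not Asterisk's code or my replacement for it; PeerLookup and the number-to-codec pairs are hypothetical, just enough to show the difference in shape.

```csharp
using System.Collections.Generic;

public static class PeerLookup
{
    // Linear scan, as with a linked list: the cost grows with the number of peers.
    public static string FindCodecLinear(LinkedList<KeyValuePair<string, string>> peers, string number)
    {
        foreach (KeyValuePair<string, string> peer in peers)
        {
            if (peer.Key == number)
            {
                return peer.Value;
            }
        }
        return null;
    }

    // Hashed lookup: roughly constant cost no matter how many peers exist.
    public static string FindCodecHashed(Dictionary<string, string> peers, string number)
    {
        string codec;
        return peers.TryGetValue(number, out codec) ? codec : null;
    }
}
```

Both return the same answer; the difference only shows up once the list holds tens of thousands of entries and every incoming call has to walk it.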

One of my concerns was performance. Since I'll have multiple Asterisk clusters banging on my .NET code (via SOAP), I wanted to ensure the whole end-to-end process was fast enough.

I used gSOAP to create the C code on Linux. gSOAP is seriously nice. At least an order of magnitude easier to use than I expected any SOAP library that works on Linux would be.

I created a simple test. I made a database with phone numbers and codecs. The idea is that when an incoming call comes in, Asterisk will use my code to SOAP over to my Windows machines, get the data, and then go on its merry way.

My Asterisk machine is a P4 2.4GHz, 512MB RAM (but, I have a Gnome session running on it). My Windows XP machine (I tested against my desktop) is a P4 3GHz HT, 1.5GB RAM. I'm running ASP.NET 2 Beta 1 and SQL Server 2000.

The test program consists of a loop (count 5000) that generates a random number, then uses gSOAP to ask for the codec for that number. Simple, tight.

The results on Linux are particularly impressive. Each instance of the test app only used a max of 4% processor, and under 1MB of RAM. The bottleneck was definitely inside ASP.NET. To simulate more load from other machines in a cluster, I ran 1, 2, and 4 instances of the test program. Also note that background tasks on the XP machine used up about 10% of the CPU.

Results:
 Single process (5000 total requests):
  Total time:            18 seconds (0.0036s/request)
  Requests per second:   277
  ASP.NET/IIS CPU:       30%
  SQL Server CPU:        4%

 Dual process (10,000 total requests):
  Total time:            23 seconds (0.0048s/request)
  Requests per second:   384
  ASP.NET/IIS CPU:       60%
  SQL Server CPU:        7%

 Quad process (20,000 total requests):
  Total time:            42 seconds (0.0052s/request)
  Requests per second:   476
  ASP.NET/IIS CPU:       80%
  SQL Server CPU:        10%

These results are encouraging enough that I'm not worried about the performance impact of using SOAP with Asterisk. My target was to have a response in less than 0.1 seconds, although anything under 0.5s would be quite unnoticeable to a client. Even in tests with more threads, my single request response time was always way under 0.1 seconds.
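
For anyone wanting to redo the arithmetic, this is roughly how the throughput and per-request figures fall out of a run's totals. It's a sketch with names of my own choosing (LoadTestMath), and it only takes the total request count and wall-clock seconds as inputs.

```csharp
using System;

public static class LoadTestMath
{
    // Aggregate throughput: every request from every client process,
    // divided by the wall-clock duration of the run.
    public static double RequestsPerSecond(int totalRequests, double totalSeconds)
    {
        return totalRequests / totalSeconds;
    }

    // Average wall-clock time per request over the whole run.
    public static double SecondsPerRequest(int totalRequests, double totalSeconds)
    {
        return totalSeconds / totalRequests;
    }
}
```

Plugging in the single process run (5000 requests, 18 seconds) gives about 277.8 requests per second and 0.0036 seconds per request; the quad run (20,000 requests, 42 seconds) gives about 476 requests per second. Against a 0.1 second target, 0.0036 seconds leaves roughly 27 times the headroom.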

Also, as far as I know, Whidbey Beta 2 (the version I'll go live with) makes some performance improvements. And also, IIS6 on Windows 2003 is much faster than IIS5.1 on XP. At any rate, a single proc desktop machine serving 476 RPS? That's pretty damn good perf if you ask me!

ast_mono | Asterisk | Code
Wednesday, March 16, 2005 5:06:55 PM UTC  #    Comments [1]  |  Trackback

# Saturday, February 26, 2005
Smart change to TransactionScope

In .NET 2, there's a new System.Transactions.TransactionScope class. It basically allows you to do implicit transactions just by creating a new TransactionScope. It's stored in TLS and things like SqlConnection check it and auto-enlist. A sample:

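using (TransactionScope txScope = new TransactionScope())
{
    insertSomethingIntoDB();   // connections opened in here auto-enlist in the ambient transaction
    processCreditCard();
    txScope.Complete();        // mark the work as successful; the commit happens on Dispose
}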

This is different from Beta 1. In Beta 1, you had to set txScope.Consistent = true (it was implicitly false). I feel this is a great change, as using a method for completing a transaction is a lot more intuitive than using a property. I bet a lot of people would have run into errors with the old behaviour. Now, it's quite clear. When you're ready to commit, just call Complete. If you don't want to commit, skip the call to Complete and the transaction is rolled back on Dispose (implicit with the using block).
Code
Saturday, February 26, 2005 4:24:30 PM UTC  #    Comments [1]  |  Trackback

Uninstalling Visual Studio 2005 December CTP

Control Panel -> Add/Remove Programs
Remove VS.NET 2005, MSDN, J#, Device Emulator, etc., and .NET Framework 2.0.

Then delete VS8, 2.0 Framework folders and registry keys.

That seems to do it. Beta 1 installed, without rebooting, without complaining, right after that. Of course, maybe it'll blow up when I start working, so no guarantees. But it's sure a huge improvement over Beta 1's uninstall.

Code
Saturday, February 26, 2005 8:41:54 AM UTC  #    Comments [1]  |  Trackback

VS 2005 December CTP Feedback: Rollback to Beta 1

So, after about 6 hours of trying to install, I've gotten the VS 2005 December CTP installed. I can say that the December CTP has made a lot of progress. Some things are a lot faster (say, ASP.NET building). A lot of stuff feels unpolished (icons). Some things are silly: F7 (“View Code“) is broken... had to manually set it. At any rate, I'm gonna come out and look like a dumbass, since I'm now gonna spend n hours re-installing Beta 1 :P.

One kick ass thing is that the dialogs are FAST now. Before, it seemed like old Windows Forms: you could see things painting (the refactoring dialogs are a good example). Now, it seems like real Windows. There are other code editor enhancements too (I noticed some new error colours).

Some stuff is just unusable. Like, I don't know... say, building and viewing errors. For some reason, I had to build about 20 times to get through all the errors. And no, they were not errors that stopped a file from compiling that need to have a rebuild, nope. Just simple things dealing with ASP.NET.

I get obsolete warnings, saying I should move to other classes (ConfigurationManager)... but these classes don't exist. So there goes compiling with warnings as errors :). No big deal.

Typed Data Adapters got some changes. Typed data adapters now have Connection[String]? as a protected field (as far as I can tell), breaking my code, forcing me to do changes (subclass the adapter) for no reason, other than to annoy me. Yea, it's all one gigantic cosmic plan to screw up my project ;).

What the hell is the obsession with naming a freaking connection when “designing”? Data adapters, web services, etc.: I wanna link all that up at runtime. But no, it insists on having me select a “connection”. Then it dumps it into an app.config (even for library projects). What ever happened to “the developer has some clue of what he's doing, so let him handle it”? I understand that script kiddies are customers too, and sometimes you just drag and drop and presto: a full data app. Hey, I write one-off code sometimes too.

ASP.NET is still in transition here. First, it bitches about having a bin and Application_Assemblies directory, forcing you to rename (since you can't delete the Application_Assemblies dir). Of course, they have now realised this was a mistake and fixed it (called it bin) in future builds (Beta 2). They also went through another fit with the directives (CompilesWith, CodeFile, CodeBehind, Inherits, ClassName... wtf?). Fortunately, it looks like they're going a step in the right direction. Of course, since I had so much trouble even getting my project to build, I could be wrong. Even so, it tossed out my old project settings (since Web Projects aren't projects, they're just folders).

There's been a lot of work invested in making it more “Community” accessible. That's all fine and dandy, but I can't envision myself ever, ever, using any of those features. Perhaps for VS Express/Academic/I-learned-VBA-and-thus-am-an-Enterprise-dev versions it makes sense. Just not sure what place it has in “Enterprise Architect” version.

Of course, I was forewarned that the CTPs weren't good, and that Betas are real quality, etc. etc. But, hey, I like being hopeful. And it's a good glimpse of the future. Too bad I couldn't use it and file more reports against stuff. I'll have to wait until Beta 2. :(

Code
Saturday, February 26, 2005 6:59:02 AM UTC  #    Comments [0]  |  Trackback

# Friday, February 25, 2005
Help me reduce this program

This is probably gonna be a post where I end up looking like an idiot, but here goes...

I was playing around and wrote a small program to dump the video font table. I started at around 33 bytes, but want to get it as small as possible (to um, learn! :)). Here's what I have so far:

[BITS 16]
[ORG 100h]
[SECTION .text] 
start:
mov al,64   ; Init (but don't clear) video     
INT 10h      ; Need to call int10h to start NT's DOS video emulation I think... 
mov ax,VideoBuffer 
mov es,ax    ; Put video buffer segment into ES
mov ah,007h    ; White text (Attribute 7). AL is already zero from loading the video buffer
mov cl,255    ; All oem chars
charloop:
 stosw 
 inc ax ; Increase char, don't worry about the attribute; it's high
 loop charloop
ret
[SECTION .data] 
VideoBuffer EQU 0B800h

This assembles into:
00000000  B040              mov al,0x40
00000002  CD10              int 0x10
00000004  B800B8            mov ax,0xb800
00000007  8EC0              mov es,ax
00000009  B407              mov ah,0x7
0000000B  B1FF              mov cl,0xff
0000000D  AB                stosw
0000000E  40                inc ax
0000000F  E2FC              loop 0xd
00000011  C3                ret

For a total of 18 bytes. We can save 2 bytes by killing the mov ah,7h, but that's the video attribute, and the value that's in AH is B8, which is light grey on cyan. This looks ugly. We can also remove the first mov al and int10 call, but that means something else has to initialize the video, and that's cheating. (With those two optimizations, we're down to 12 bytes though.)
Anyone experienced want to teach me a lesson? Please? :)
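
For reference, here's what the loop writes, modelled in C#. This is a sketch only: it assumes CH is zero on entry (the listing sets only CL), so the loop runs 255 times, and it models the screen as an array of 16-bit cells with the character in the low byte and the attribute in the high byte. FontDump.BuildCells is a name I made up.

```csharp
public static class FontDump
{
    // Simulates the charloop: AX starts as 0x0700 (attribute 7, character 0).
    // Each STOSW stores AX as one text-mode cell, and INC AX moves on to
    // the next character without ever carrying into the attribute byte.
    public static ushort[] BuildCells(int count)
    {
        var cells = new ushort[count];
        ushort ax = 0x0700;
        for (int i = 0; i < count; i++)
        {
            cells[i] = ax;   // stosw
            ax++;            // inc ax
        }
        return cells;
    }
}
```

With a count of 255 the cells run from 0x0700 to 0x07FE, so characters 0 through 254 get written and character 255 is left out under that assumption. Covering it would take one more iteration.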
Code
Friday, February 25, 2005 6:05:13 AM UTC  #    Comments [0]  |  Trackback

# Tuesday, February 22, 2005
Inline IL!

Mike Stall just released a cool tool for doing inline IL in C# (oh, and in VB). It's not full integration with the compiler (so it's not a _msil() block or anything), but it's still very cool.

Code | IL
Tuesday, February 22, 2005 2:45:39 PM UTC  #    Comments [0]  |  Trackback

# Saturday, February 12, 2005
Saved! ASP.NET team decides not to screw us up!

Doing some work on a new site using Whidbey, and I came across this:
http://msdn.microsoft.com/asp.net/whidbey/beta2update.aspx#ASP.NET_2.0_Compilation_Model_Changes

YES!!! Whoever says MS doesn't listen is definitely wrong. Quick recap: ASP.NET changed its project system/compilation model to better suit people who think HTML is a programming language. Good ASP.NET developers pushed back... hard. The “Web Platform and Tools” team nicely listened. Yey!

The blessed article is a bit terse, so I've provided common-language translations (sarcasm and jest ahead... it's just because I'm so relieved, no offense intended):

“In response to significant customer feedback...”.
Translation: “We spoke with professional developers, instead of just going after the “PHP is teh r0x0rs” group, and the “I know HTML and thus am a 'Web Developer'” group.“

”The goal is to improve the code-behind and code-separation experience and enable the partial class paradigm to be used to improve the code-behind experience while continuing to maintain a syntax and functionality that is very similar to ASP.NET 1.x.“
Translation: “We fixed the compilation model.” 
Note: Wow, that's a really long sentence. And they even used the word “paradigm“. Wow.

“As a result, it makes upgrading of v1.x projects even easier and further reduces new ASP.NET 2.0 specific concepts.“
Translation: “Customers told us backwards compatibility and migration was actually important.”
Minor correction: “upgrading of v1.x projects *possible* and further...“ (Yea, I had zero luck upgrading projects. Yet I could open VC++ 6 projects with VS2005 and compile and deploy to client systems with zero problems.)

“In short, this change enables developers to continue to pre-compile ASP.NET pages for significant performance gains while still being able to maintain the .aspx markup content separate from the binary.“
Translation: “Now things work like 1.1 again.“

Wow, this is really, really great news. I'm thrilled. Can't wait to get Beta 2, even if it means having to redo a nice amount of code.

Oh, BTW, they shortened the special directory (the Vile Code directory) prefix from “/Application_“ to “/app_“, and rightly canned “/Application_Assemblies“ (which was the “/bin“ replacement). Why “/Application_Assemblies“ was ever a good idea apart from consistency is beyond me...

Even so... something gives me the feeling that the compatibility part of their team is somehow very much different from say, the Windows Shell team :).

Code | Humour
Saturday, February 12, 2005 9:21:52 AM UTC  #    Comments [0]  |  Trackback

# Wednesday, February 09, 2005
Mono is so nice

Out of all my experiences (heh) with Linux, Mono has to be the best one. It just works. I'm writing a set of web services to manage Asterisk. Things like adding users to the dialplan, configuring incoming numbers, voicemail, etc. I wrote a library to deal with the config files in VS2005, and tested on VS2005. Drag and drop, and bingo: it works just fine on Linux.

I just got a nice queued reload finished for Asterisk via my webservices. A little bit of threading code, and 40 lines later, it's done. Build, drag and drop (Samba), and presto. Works, smoothly. I just can't stress how cool it is to be able to work with MS tools, use shared code libraries with Windows, and then just drag and drop over the network and have it running right alongside with Asterisk.

We're doing a pretty ambitious project, the entire front-end on Windows, and Linux for everything to do with the voice. So far, everything's been a breeze, thanks to .NET. Not having to write in C, or ... PHP <shudder> ... is so nice. At any rate, we're hoping to launch by the end of the month. So if we do, I'll go into more detail on what things .NET let us just speed right through.

And already, I've made a new convert. We hired a guy who has worked with Java and PHP, never with .NET. You should have seen his face and heard his comments when I took him on a whirlwind tour of ASP.NET 2 and web services, with xcopy deployment to Linux to boot. Wow :).

Asterisk | Code
Wednesday, February 09, 2005 6:57:52 AM UTC  #    Comments [1]  |  Trackback

# Friday, January 28, 2005
Inline x86 ASM in C#

One thing I had done before and decided to try again was inline (embedded? inline isn't the right term exactly) ASM with C#. Remember, the CLR JITs your IL code down to native code when it runs. There's no interpreter or the like going on -- your C# code is x86 when it runs (on an x86 platform). However, when writing in C#, it's rather hard to get out to x86 directly. Probably the easiest way would be to use Managed C++ and an inline asm section there. But, if you want to keep it all in C# (say, you want something extra hard to decompile), you can achieve that.

[I must note, the more I learn of internals, the more I learn I need to learn more. Thus hopefully, some true expert will read this and give me more insight.]

The most straightforward way that occurred to me was to use a delegate. As far as I know, C# won't issue calli and ldftn IL opcodes for us in any way we can neatly control. There will be ldftn when a delegate is created, but we can't set that value directly. So instead, we'll create a delegate and modify it. Delegates have a private field named “_methodPtr”. This, as far as I can tell, points to the code to be executed by the delegate. It's important that our delegate is accurate regarding the number of parameters, and the return value.

We will store our x86 in a byte array. Then, we'll pin the array, and stick the address of the first element inside the delegate. When we call the delegate, everything will be set.

As far as I can tell, methods in the CLR use the fastcall convention, so the first two parameters will be in EDX and ECX. The return value is expected in EAX. My demo is going to be simple, performing a ROR (ROtate Right) by 1 on the parameter and returning that. 3 lines of ASM.

Compile with /unsafe obviously, else I'd be writing to secure@microsoft.com. I'm not sure how terribly useful this is, but it seemed cool to me. At the very minimum, it serves to tell people to STFU when they claim that C# / .NET can't do pointers, or raw code, or whatever.

using System;
using System.Reflection;

class Program
{
    public delegate uint Ret1ArgDelegate(uint arg1);
    static uint PlaceHolder1(uint arg1) { return 0; }
    
    public static byte[] asmBytes = new byte[]
        {
            0x89,0xD0, // MOV EAX,EDX
            0xD1,0xC8, // ROR EAX,1
            0xC3       // RET
        };
        
    unsafe static void Main(string[] args)
    {
        fixed(byte* startAddress = &asmBytes[0]) // Take the address of our x86 code
        {
            // Get the FieldInfo for "_methodPtr"
            Type delType = typeof(Delegate);
            FieldInfo _methodPtr = delType.GetField("_methodPtr", BindingFlags.NonPublic | BindingFlags.Instance);

            // Set our delegate to our x86 code
            Ret1ArgDelegate del = new Ret1ArgDelegate(PlaceHolder1);
            _methodPtr.SetValue(del, (IntPtr)startAddress);

            // Enjoy
            uint n = (uint)0xFFFFFFFC;
            n = del(n);
            Console.WriteLine("{0:x}", n);
        }
    }
}

Code | IL
Friday, January 28, 2005 7:15:12 PM UTC  #    Comments [13]  |  Trackback

# Sunday, January 16, 2005
My first open source contribution

I ran into an issue with Asterisk, mainly that you can't dynamically control which codec gets accepted. You have to make your choice “up front”, when you define a user/peer. This means, for example, if you want to say “for this call, use the GSM codec”, you can't. You've got to let Asterisk's code work things out, and even if it works out on your side, the callee might decide to use a different codec anyways. This means that I end up declaring various peers: peerX-g729, peerX-ulaw, etc., and then have to swap them out when I call Dial.

Even worse, there's no easy way of completely avoiding transcoding when you want to. For instance, I have several phones connected to my server. Some use GSM, some use ULAW, some use G.729. They all use the same dialplan, and ulaw is usually negotiated for the termination. That means my little server gets nailed doing all this transcoding. This is even sillier when you realise that my termination provider has big hardware and will handle transcoding for me. So, without making a seriously complex dialplan, I'm stuck.

Well, IMO, that sucks.

So, I actually dove into the code, and patched it: http://bugs.digium.com/bug_view_page.php?bug_id=0003346 I've yet to see if this will get into the actual codebase. I sure hope so, since I *hate* forking. Indeed, that's one major criticism I have of the “you can just modify it to suit your needs“ claims of OSS. But, the ones in power seem quite rational, so there's some hope... maybe :).

Asterisk is a large project, but thanks to Visual C++ 2005, I could navigate it (New Project From Existing Code is very useful!). Unfortunately, I think there's a bug, as VS takes up 1.4GB of memory when editing this project. However, it's still quite responsive -- except for the Virtual Memory Warning from Windows, and the initial slowness, I'd never notice it was eating all that memory.

Code | Asterisk
Sunday, January 16, 2005 4:03:31 AM UTC  #    Comments [1]  |  Trackback

# Thursday, December 30, 2004
Grinding halt: VS 2005 + Mono + ASP.NET

Well, I guess my cross-platform development bliss had to come to an end sooner or later, right? I started work on a new app for Asterisk, and found that ASP.NET would come in handy.

Visual Studio 2003 requires IIS to work with web projects. Maybe I can trick it into using XSP, but I'm really, very, happy with VS 2005, and I don't want to go back. So, what are the problems with 2005? ASP.NET's new drug-induced compilation model.

Before, I could build my app with codebehind, compile, and go on my merry way. The DLLs are loaded at runtime, things are good, and most importantly, they work with Mono/XSP. Now, I've got several problems. First, it doesn't seem like the ASP.NET 2 support is in Mono. That's fine, I'll stay away from master pages (even though it hurts), and other new stuff. My biggest goal is to use VS 2005.

Then come the real problems: There is no more “compile“ option in VS 2005 for web projects. Nope. Seems like the ASP/VBS and PHP whiners got their way and wanted things more like a scripting language. Some huge advancements were made (no more stupid IIS screwups). But I can no longer work as before. Which means I can't deploy as before. There's a precompilation system, but it's not what I want.

I can't even do it with runtime compilation either, because of this partial class and “compiles with“ nonsense. So it appears as if I'm screwed. Anyone have any suggestions?

Code
Thursday, December 30, 2004 12:30:26 AM UTC  #    Comments [2]  |  Trackback

# Thursday, December 09, 2004
xsp init.d service script

I'm putting XSP into production this week (yey, right on time for Mono 1.0.5). For those who don't know, XSP is a lightweight ASP.NET webserver for Mono (.NET). I have a few webservices that need to run on Linux, and XSP seemed like the easiest way to do it.

One of the things I ran across was how to start up XSP automatically. I'm not that familiar with Linux yet, so I wasn't sure how to go about it. The only site I found with anything on it is here, but it didn't work correctly (shutdown) for me. So after playing around with other scripts made for mod_mono (didn't work), I decided to figure out how init.d scripts work. After a bit of learning and lots of copy and paste from the other init.d files, I came up with the following. I'm pretty sure it's not that great, so please correct me.

Steps: 
  1 - Create /etc/init.d/xsp and paste the contents in (from below). Be sure the permissions are right (chmod 755 /etc/init.d/xsp).
  2 - Create /etc/xsp.conf and add the command-line args. Example:
            --port 8080 --root /path/to/site/
  3 - Run chkconfig --add xsp
  4 - service xsp start

/etc/init.d/xsp:
#!/bin/bash
#
# Startup script for xsp server
#
# chkconfig: 3 84 16
# description: xsp is an ASP.NET server
#

# Command-line args for xsp; lines containing # are skipped
ARGS=`cat /etc/xsp.conf | grep -v \# `

. /etc/init.d/functions

start() {
 echo -n $"Starting xsp: "

 # Check PID/existence
 pid=""
 if [ -f /var/run/xsp.pid ] ; then
  read pid < /var/run/xsp.pid
  if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
   echo -n $"xsp is already running."
   failure
   echo
   return 1
  else
   # Stale PID file left over from an earlier run
   rm -f /var/run/xsp.pid
  fi
 fi

 mono /usr/bin/xsp.exe --nonstop $ARGS > /dev/null &
 RETVAL=$?
 if [ $RETVAL != 0 ]; then
  failure
  echo
  return $RETVAL
 fi
 PID=$!
 echo $PID > /var/run/xsp.pid
 success
 echo
 return 0
}

stop() {
 echo -n $"Shutting down xsp: "

 if [ ! -f /var/run/xsp.pid ]; then
  echo -n $"xsp not running"
  failure
  echo
  return 1
 fi

 kill -15 `cat /var/run/xsp.pid`
 RETVAL=$?

 if [ $RETVAL = 0 ]; then
  rm /var/run/xsp.pid
  success
  echo
  return 0
 else
  failure
  echo
  return $RETVAL
 fi
}

restart() {
 stop
 start
}

case "$1" in
  start)
   start
 ;;
  stop)
   stop
 ;;
  restart)
   restart
 ;;
  *)
 echo $"Usage: $0 {start|stop|restart}"
 exit 1
esac

exit $?

Code
Thursday, December 09, 2004 6:33:30 AM UTC  #    Comments [1]  |  Trackback

# Friday, December 03, 2004
Are C# and VS2005 that good?

Today I was in a chat with some members of the C# team. Usually, I can go on and on about how the product can be improved. But today, apart from some questions, I really couldn't think of anything great to ask. I use VS2005 all day for all my projects, and it is so much better than VS 7.

Things just rock, and as far as I know, all my major complaints have been fixed or will be fixed. This might not be true, and perhaps I'll throw a fit when Beta 2 drops :). But seeing that MS has made huge changes and 180s (i.e., C# E-n-C, data diagrams), I feel pretty confident that I'll be exceedingly pleased.

Code
Friday, December 03, 2004 12:46:04 AM UTC  #    Comments [0]  |  Trackback

# Sunday, November 28, 2004
Cracking Code 4: Replacing a strong name

In my last article, someone commented that editing an assembly would create a problem if the assembly is strong named. They are correct. If an assembly has a strong name and is tampered with, you'll get a System.IO.FileLoadException: Strong name validation failed for assembly <foo>.

Strong names exist to identify an assembly. They are "strong" because the identification is provided with cryptographic means, rather than just the name of the file. The system is designed to ensure the assembly is what it claims to be, and public key cryptography proves it. Against malicious people, it can ensure someone can't drop an assembly signed with one of your trusted publisher's keys and get you to trust their assembly more than you should. It's NOT meant to be a way to stop people from editing and running assemblies on their own machine.

I was hoping there was a simple way to replace the strong name on an assembly, but I don't believe there is. Then again, there's a LOT of stuff that ships with .NET, so perhaps I just overlooked it. If so, let me know. At any rate, I wrote a tiny program to replace the strong name on an assembly. Let me explain it.

Somewhere in the assembly, a public key is provided (otherwise the runtime wouldn't know what to verify against!). Then, there is a hash of the assembly, and the hash is signed with the private key. When the assembly is modified, the hash will change, the signature will no longer match and the runtime will refuse to load the assembly. A cracker usually won't have access to the private key, and thus can't re-sign. However, we can simply replace the public key in the assembly with our own public key, and re-sign using our own private key. Problem solved.

A quick word to those who are thinking "Can't I just use SN -Vr to skip verification checking?". No, this doesn't work. Verification skipping only applies to partially signed (delay-signed) assemblies, not to fully signed assemblies. If you somehow manage to get verification skipping working on fully signed assemblies, I'd love to know.

My program is a very simple tool with nothing amazing in it (except for a very slow search algorithm). All it does is take an assembly and a keyfile, replace the public key, and call SN -R <assembly> <keyfile> to re-sign. Here's how you'd use it:

1. Take Some.exe, a strongly named assembly. Modify it.
2. Note that attempting to load Some.exe will fail.
3. Create a new keyfile by running "SN -k mykey.snk". (SN is the StrongName utility that ships with the .NET Framework SDK).
4. Ensure you have the .NET Framework SDK (bin) in your path.
5. Change the public key and re-sign via "SNReplace Some.exe mykey.snk".

That's all. You can run "SN -Tp Some.exe" before and after to see that the public key has indeed changed. "SN -v Some.exe" will verify things are in order.

Download: SNReplace.exe (16 KB) Source: SNReplace.cs.txt (2.72 KB)
Code | Security
Sunday, November 28, 2004 7:20:21 AM UTC  #    Comments [12]  |  Trackback

# Friday, November 26, 2004
Cracking code 3: Cracking an obfuscated .NET assembly

Intro
It's been a while since I wrote anything that interesting, so I figured for Thanksgiving, I'd go ahead and do so. Merry Thanksgiving. The first article in this “series“ is here.

Cracking .NET programs can be just like cracking any other program. In this article, I'm going to use the same approach as I did last time. I threw together a quick little program called CrackMe2. CrackMe2 has a really cool feature called “Reverse Text”, however, it's only available to registered users. What's a poor boy to do?

Target
First, we try registering. Since we don't have a valid code (we don't even know what one looks like), we get an “Invalid serial.“ MessageBox. OK, so now we know that the program does something when we click a button, and if the serial is wrong, we get a MessageBox.


Darn, 123 didn't work.

Well, the first step in cracking is defining our target and its location. Our target is the code that's deciding to say “Invalid serial.” instead of “You're registered!”. Where's the “bad code“ that needs to be fixed? Well, with a .NET assembly, our first information is gained by taking a look with IL DASM.


View of the obfuscated CrackMe2 assembly

Oh no! It's obfuscated (thanks to Ivan Medvedev's Mangler). Let's assume this is a big application and that we'll never find what we're looking for just by going through the IL. Just by glancing at the hierarchy, we don't know that much more than when we started: There's a form with code.

Seeing past the names
Now certainly, we can do static analysis and try to find out where the bad code is. One way would be by getting the strings (Ctrl+M in IL DASM, scroll to the bottom), and then grep the IL for ldstr, and work from there. In fact, that's a pretty quick and easy way to locate certain parts. However, let's pretend the strings are encrypted/dynamically generated, and that's not viable. So, let's start debugging.

[Michael@MAO C:\]$ cordbg CrackMe2.exe
Microsoft (R) Common Language Runtime Test Debugger Shell Version 1.1.4322.573
Copyright (C) Microsoft Corporation 1998-2002. All rights reserved.

(cordbg) run CrackMe2.exe
Process 4488/0x1188 created.
Warning: couldn't load symbols for c:\windows\microsoft.net\framework\v1.1.4322\mscorlib.dll
[thread 0x1510] Thread created.
Warning: couldn't load symbols for C:\CrackMe2.exe
Warning: couldn't load symbols for c:\windows\assembly\gac\system.windows.forms\1.0.5000.0__b77a5c561934e089\system.windows.forms.dll
Warning: couldn't load symbols for c:\windows\assembly\gac\system\1.0.5000.0__b77a5c561934e089\system.dll

[0004] mov         ecx,98543Ch
(cordbg)


cordbg is a command line debugger that ships with the .NET Framework SDK, and it's just loaded the CrackMe2.exe and related assemblies. Just like before, we're going to go ahead and set a breakpoint and find out where we are in the program, and work from there. So, let's breakpoint the MessageBox.Show function. We use IL-similar syntax to specify the function name: NameSpace.ClassName::Method.

(cordbg) b System.Windows.Forms.MessageBox::Show
Breakpoint #1 has bound to c:\windows\assembly\gac\system.windows.forms\1.0.5000.0__b77a5c561934e089\system.windows.forms.dll.
#1      c:\windows\assembly\gac\system.windows.forms\1.0.5000.0__b77a5c561934e089\system.windows.forms.dll!System.Windows.Forms.MessageBox::Show:0      Show+0x0(native) [active]
(cordbg)

Then, we tell cordbg to go until it breaks by typing go. The form comes up, and we enter a serial number: 123.

(cordbg) go
Warning: couldn't load symbols for c:\windows\assembly\gac\system.drawing\1.0.5000.0__b03f5f7f11d50a3a\system.drawing.dll
break at #1     c:\windows\assembly\gac\system.windows.forms\1.0.5000.0__b77a5c561934e089\system.windows.forms.dll!System.Windows.Forms.MessageBox::Show:0      Show+0x0(native) [active]
Source not available when in the prolog of a function(offset 0x0)

[0000] push        edi
(cordbg)

Bingo, we're stopped at a MessageBox. We want to know who called this function, since most likely, that will lead us to the critical code section we need to fix. So, we ask cordbg where are we?

(cordbg) where
Thread 0x1510 Current State:Normal
0)* system.windows.forms!System.Windows.Forms.MessageBox::Show +0000 [no source information available]
                owner=(0x00ac36b0)
                text=(0x00ad5854) "Invalid serial."
1)  CrackMe2!CrackMe2.Form1::AAAAAAAAAAAAAAAAAAAA +0070 [no source information available]
                AAAAAA=(0x00ac8400)
                A=(0x00aca86c)
2)  system.windows.forms!System.Windows.Forms.Control::OnClick +005e [no source information available]
                e=(0x00aca86c)

9)  system.windows.forms!ControlNativeWindow::OnMessage +0013 [no source information available]
                m=(0x0012ef04)
--- Managed transition ---

We see what's expected. Somewhere in Win32 code, a message was sent, and we see the OnMessage called and bubbling up all the way to the Control::OnClick, and then user code. We can look at all the arguments along the way, and that's useful for more complex scenarios (say, when a registration function calls another passing the serial number or validation code).

At any rate, we've got something to go on: The name of the function that calls the MessageBox: CrackMe2.Form1::AAAAAAAAAAAAAAAAAAAA (20 A's). We're done with cordbg (quit). Our next stop is to read the bad code.

Looking at the bad code
Using IL DASM (see above), I navigate to the CrackMe2.Form1::AAAAAAAAAAAAAAAAAAAA method. Inside is relatively straightforward code. First, there's a try/catch that has an Int32::Parse call in it. The result is stored in local 0. So we now know the code is numeric. Immediately after the catch handler, we have this snippet:
  IL_0022:  ldloc.0
  IL_0023:  ldc.i4.1
  IL_0024:  and
  IL_0025:  ldc.i4.1
  IL_0026:  bne.un.s   IL_0035
  IL_0028:  ldarg.0
  IL_0029:  ldstr      "Invalid serial."
  IL_002e:  call       valuetype [System.Windows.Forms]System.Windows.Forms.DialogResult [System.Windows.Forms]System.Windows.Forms.MessageBox::Show(class [System.Windows.Forms]System.Windows.Forms.IWin32Window, string)

Load the local (the number entered), then load the number 1, and AND them. Then, load one, and if they are not equal, jump to IL_0035. If they are equal, execute the following instructions, which quite obviously say “Invalid serial.”. AND'ing a number with 1 and comparing to 1 is a check to see if the number is odd. So, at this point, we can write a keygenerator that produces... even numbers. A keygenerator is always preferred to a patch, however, generally speaking, finding the algorithm might be a bit harder. Then, there's always the possibility that the check actually does something hard to fake (i.e., uses RSA or talks to a hardware dongle/web service). So, let's go on and patch this code.
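As a rough illustration of that check and the matching keygenerator, here is a small Python sketch. It only mirrors the IL as read above; the function names are invented for this example, and treating a failed parse as "invalid" is an assumption, since the catch handler isn't shown.

```python
def is_valid_serial(text):
    """Mirror of the check as read from the IL: parse an int, reject odd numbers."""
    try:
        number = int(text)
    except ValueError:
        return False  # assumption: an unparseable serial counts as invalid
    # (number & 1) == 1 means odd, which is the "Invalid serial." path
    return (number & 1) != 1

def keygen(count):
    """Produce `count` serials that pass the check, i.e. even numbers."""
    return [str(2 * i) for i in range(1, count + 1)]

print(is_valid_serial("123"))  # the serial tried earlier; it is odd, so it is rejected
print(keygen(3))
```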

At IL_0035 (the target of the branch if the number is even), we have some code that does activation work and then proceeds to say “Thank you...”. Simple sample. Now, let's make the fix.

Simple Patching
With IL DASM and IL ASM, we have a really easy way to make patches. Simply run ildasm /out=CrackMe2.il CrackMe2.exe, and IL DASM will dump all the IL required for that assembly to a nicely formatted file. All we have to do is go to the bad method and fix up the IL. I think the most unintrusive fix would be to add “br IL_0035” to the top of the method. That would branch immediately to the good code, and the product would activate on any serial number entered.

However, some obfuscators try to stop IL DASM round tripping, and that might stop some posers in their tracks. The IL obfuscator I'm going to give away for free will do this, for example. (Actually, my free obfuscator would make this tutorial a bit harder because of how it handles names -- we'd have to actually get a token instead.)

Assuming we can't use IL DASM/ASM, what can we do? Use a hex editor.

Binary Patching
When we can't reassemble an entire program, we can patch certain opcodes instead. Tools like OllyDbg have a built-in assembler so we can easily make patches to the x86 code. For IL, I'm not aware of any such tool. Another issue with binary patching IL is that we have to ensure the resulting IL is fully correct and is able to be JIT'd to native code. If our patch ends up screwing with the IL in a way that makes it incorrect, we'll get a runtime exception from the execution engine. Let's try to create a binary patch that jumps from the beginning of the method right to the good code, at IL offset 0x0035.

First, in IL DASM, turn on “Show bytes”, under the View menu. This allows us to see the actual bytes that make up the opcodes. Now, let's look at the beginning of the critical function:

  // Method begins at RVA 0x2434
  // Code size       78 (0x4e)
  .maxstack  2
  .locals init (int32 V_0)
  .try
  {
    IL_0000:  /* 02   |                  */ ldarg.0
    IL_0001:  /* 7B   | (04)000002       */ ldfld      class [System.Windows.Forms]System.Windows.Forms.TextBox CrackMe2.Form1::AAAAAAAAAAAA
    IL_0006:  /* 6F   | (0A)000026       */ callvirt   instance string [System.Windows.Forms]System.Windows.Forms.Control::get_Text()
    IL_000b:  /* 28   | (0A)000027       */ call       int32 [mscorlib]System.Int32::Parse(string)
    IL_0010:  /* 0A   |                  */ stloc.0
    IL_0011:  /* DE   | 0F               */ leave.s    IL_0022
  }  // end .try

This code is protected in a try block. We could go and remove the try block, but that's modifying more code. Generally speaking, we should aim to patch as little code as possible to ensure we don't accidentally screw something up. So, we're going to deal with the try block and fix it from within. The ECMA specifications for .NET will come in handy here. Specifically, Partition III, CIL. This can be found in the .NET Framework SDK folder, under “Tool Developers Guide\docs”. It's also available from MSDN, here.

The first instinct is to say, hey, let's change IL_0000 to a br to IL_0035, and NOP out the remainder of the try block. However, that'd create illegal code, since you can't branch out from a try block, you must use the leave opcode instead. So, let's rewrite the method to simply leave to IL_0035. Here's the description of the leave opcode:

The leave instruction unconditionally transfers control to target. Target is represented as a signed offset (4 bytes for leave, 1 byte for leave.s) from the beginning of the instruction following the current instruction.

The formats (in hex) are DD <4 bytes> for leave and DE <1 byte> (as shown above), for leave.s. We'll use leave.s, just to be efficient :). Since the total size for leave.s is 2 bytes, we calculate the offset to 0x35 from 0x02 (since our leave instruction is at 0x00). Subtraction tells us we must have an offset of 0x33. Hence, our leave instruction in hex looks like: DE 33. Since that'd leave the IL in an incorrect state, we must nop out the rest of the try block. The hex for nop is 00.
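The offset arithmetic is easy to get wrong by one instruction length, so here is a minimal Python sketch of it. The helper names are hypothetical; the only facts used are the ones quoted above (opcode DE, one signed byte, measured from the start of the following instruction).

```python
def leave_s_operand(instr_offset, target_offset):
    """Signed 1-byte operand for a leave.s at instr_offset that goes to target_offset."""
    next_instr = instr_offset + 2  # leave.s is 1 opcode byte + 1 operand byte
    delta = target_offset - next_instr
    if not -128 <= delta <= 127:
        raise ValueError("target out of range for leave.s; use leave (DD) instead")
    return delta & 0xFF  # two's complement encoding of the signed byte

def encode_leave_s(instr_offset, target_offset):
    """Full 2-byte encoding: DE followed by the operand."""
    return bytes([0xDE, leave_s_operand(instr_offset, target_offset)])

print(encode_leave_s(0x00, 0x35).hex())  # our patch: prints de33
print(encode_leave_s(0x11, 0x22).hex())  # the existing leave.s IL_0022: prints de0f
```

The second line is a sanity check against the IL DASM listing: the leave.s already at IL_0011 is shown as DE 0F, which is what the same formula gives.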

Open the assembly in your favorite hex editor, and let's find the method. IL DASM gives us the RVA, but for now we'll just search for a specific byte sequence. The IL DASM Show bytes allows us to easily find our place. Do note that the way tokens are displayed ((04)000002, for example), is reverse from how they are stored. Depending on the size of the app, you might need to search on quite a large number of bytes, since IL sequences are most likely repeated. For this case, we're going to search on the last bit: “0A DE 0F”. No other matches found, so this is the one.

As when programming, in cracking we have many ways to solve a problem. Many of them can be considered “right”. We could make a simple one-byte patch by allowing any number as a valid serial. This has the merit of ensuring the local int is assigned, and well, being only a one-byte edit. The leave.s opcode is at offset 0x11, so add 2 to that amount and we get 0x13. 0x35 - 0x13 = 0x22. So by changing “0F” to “22”, we'd have our crack. However, let's stick to the original plan and jump right to the good bits from the beginning.

In the hex editor, we back up a bit until we find the 02 7B 02 00 00 04 part (ldarg.0, then load the textbox field). At the 02, we drop our leave.s IL_0035 payload, which is DE 33. Then, we nop out (00) everything until the end of the 0A DE 0F part. The resulting hex for the try block is thus: DE 33 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00. Save the file as CrackMe2.cracked.exe.
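For anyone who'd rather script the edit than do it in a hex editor, here is a hedged Python sketch of the same patch. It assumes the try block bytes are exactly those in the IL DASM listing above and that they occur once in the file; `patch_try_block` is a name invented for this example.

```python
# Bytes of the original try block, as shown by IL DASM with "Show bytes" on.
# Tokens are stored little-endian, so (04)000002 appears as 02 00 00 04.
ORIGINAL_TRY = bytes.fromhex(
    "02"              # ldarg.0
    "7B 02 00 00 04"  # ldfld    (04)000002
    "6F 26 00 00 0A"  # callvirt (0A)000026
    "28 27 00 00 0A"  # call     (0A)000027
    "0A"              # stloc.0
    "DE 0F"           # leave.s  IL_0022
)

def patch_try_block(image, target=0x35):
    """Return a copy of `image` with the try block replaced by leave.s + nops."""
    start = image.find(ORIGINAL_TRY)
    if start < 0:
        raise ValueError("try block not found")
    if image.find(ORIGINAL_TRY, start + 1) >= 0:
        raise ValueError("pattern is not unique; search on more bytes")
    operand = target - 2  # leave.s sits at IL offset 0, so the next instruction is at 2
    payload = bytes([0xDE, operand]) + b"\x00" * (len(ORIGINAL_TRY) - 2)
    return image[:start] + payload + image[start + len(ORIGINAL_TRY):]

# Demo on a stand-in buffer rather than the real file
fake_image = b"\xAA" * 4 + ORIGINAL_TRY + b"\xBB" * 4
print(patch_try_block(fake_image).hex(" "))
```

To patch the real thing, read CrackMe2.exe in binary mode, pass the bytes through `patch_try_block`, and write the result out as CrackMe2.cracked.exe. The patched block is the same length as the original, so nothing else in the file moves.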

Satisfaction
Run the program. Type in anything for the serial. “Thank you for registering.” The second textbox activates. We've won access to the coveted “Reverse Text” function. Write up an .NFO, ensuring to remind people to purchase software to support the authors. Then kick back and play a game of KSpaceDuel.

Download the program itself (Right click and save as, since it's a .NET assembly and IEExec will try to run it otherwise): CrackMe2.exe (24 KB). Or, download the source: CrackMe2.cs.txt (4.81 KB).

Was this post interesting, helpful, stupid, or lame? Leave a comment and help me improve.

Code | IL | Security
Friday, November 26, 2004 5:22:20 AM UTC  #    Comments [13]  |  Trackback

# Saturday, November 20, 2004
Avalon's on XP

http://msdn.microsoft.com/Longhorn/understanding/pillars/avalon/avnov04ctp/default.aspx

Downloading from MSDN right now...

Code | Misc. Technology
Saturday, November 20, 2004 5:22:29 AM UTC  #    Comments [0]  |  Trackback

# Wednesday, November 10, 2004
Some open source people say sending patches by email is OK (bad security ahead...)

BroadVoice released a patch for Asterisk that fixes some issues with SIP registration. They hired people and made a commercial patch. Way to go.

Then, they decided to *email* it to customers. Yes. In 2004. A company emailing patches to customers. Apparently they didn't think this was dumb. No link to their web site, no secure download from their website, nothing. In fact, the email was signed “The BroadVoice Team”, which is the signature I remember seeing on a few virus emails.

So, I responded to the Asterisk-users mailing list about this patch, saying how it was utterly ridiculous to do this, as it teaches customers to not be secure and go blindly installing stuff. Here are some of the comments I got back (and they aren't sarcastic either!):

“the patch is pure c code. it took me 5 mins to read & understand it. is very simple (but useful).
Simply that patch (apart from adding some logs, comments and little code formatting) simply caches auth data AND let * manage 403 responses from the server, and this last one perhaps is the issue that was overloading BV .
so, just read it (or let someone do for it) and understand that's not a problem :)“

“I don't see a security issue with his method. If you (a) read the entire patch and (b) comprehend fully everything that it does, then there's nothing to worry about. Fear comes from the unknown, and if you know everything in the patch, there's nothing to fear. “

“To claim that someone opens a security hole by accepting a verified patch via email, is the same as claiming that you never have a security hole just because you download from "trusted" sites. Webservers can be hacked, you know. And not every buffer-overflow will lead to a security issue -- many just crash the system. “

So, I think this goes some way towards showing that all is not well as far as security mentality in open-source land. I pointed out to them that “even Microsoft does it right” :). Didn't seem to make me popular.

Thinking that you can just read the code and be set is equivalent to saying there should never be any security holes in any code because people will just read and know. Add to that the fact that what you're combating is a possible *malicious* security hole, not just an accident, and I think most devs would pass things right over.

Code | Security
Wednesday, November 10, 2004 11:57:11 PM UTC  #    Comments [0]  |  Trackback

# Thursday, November 04, 2004
Help preserve the BCL: Vote against lame naming for parameterized types!

Do you use .NET? If so, the BCL design guidelines are in dire need of your support! According to Krzysztof Cwalina, the BCL is now going to start using long names for parameterized types. Forget Dictionary<K, V> and clean, concise, code. Nope. Apparently some people couldn't figure out what the heck a List<T> was, and really needed List<TypeOfTheItemInTheDamnList>.

And the BCL team, in an effort to save some work in tools I imagine, decided not to add documentation/descriptions thru metadata (i.e., add an attribute to say <T> in List means *the type of the item in the list*).

Here's the details:
http://blogs.msdn.com/kcwalina/archive/2004/11/03/251722.aspx

So, do you want to write/maintain/use/and in general just look at horribly verbose code? If not, get your say in here: http://lab.msdn.microsoft.com/ProductFeedback/viewFeedback.aspx?feedbackid=3cee09d8-3b82-4c5f-83a4-be52ba9b9e98

Vote against this ugly exercise in verbosity today!

Code
Thursday, November 04, 2004 3:34:45 PM UTC  #    Comments [0]  |  Trackback

GCC Visual Studio Integration

As I mentioned a few posts ago, I have a makefile project in MSVC++ 8 (2005) setup. Part of my solution builds with GCC on Linux, part on Windows with CSC. By using plink (command-line version of Putty), I'm able to ssh over to my Linux machine and build. The errors show up in the error list in VS2005. Except for one slight problem: Visual Studio does not read the error line information correctly, resulting in an error if I try to click and go to that line.

GCC outputs errors like so: my.c:123: error: you suck. VS expects them as my.c(123) : error you suck. So, I decided to write a filter for the output. But, since it was around 2 AM, I decided to check and see if GCC supports output in the style that VS expects. As far as I can tell, it doesn't. But, I found a link to a page that has a program that accomplishes just this (GNU2MSDEV): http://www.xs4all.nl/~borkhuis/vxworks/vxw_pt1.html#1.13
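For reference, the conversion itself is small. Below is a rough Python sketch of such a filter; it is not GNU2MSDEV's actual code. It assumes GCC lines look like file:line: message (optionally file:line:column: message), that file names contain no spaces or colons, and that VS is happy with file(line) : message.

```python
import re

# file:line: rest, with an optional :column that is dropped
GCC_DIAGNOSTIC = re.compile(r"^([^:\s]+):(\d+)(?::\d+)?:\s*(.*)$")

def gcc_to_msvc(line):
    """Rewrite one GCC diagnostic into file(line) : message form."""
    match = GCC_DIAGNOSTIC.match(line)
    if not match:
        return line  # pass through anything that isn't a diagnostic
    path, lineno, rest = match.groups()
    return "{0}({1}) : {2}".format(path, lineno, rest)

def convert_lines(lines):
    """Apply gcc_to_msvc to each line of an iterable, stripping line endings."""
    for raw in lines:
        yield gcc_to_msvc(raw.rstrip("\r\n"))

print(gcc_to_msvc("my.c:123: error: you suck"))
```

Used as a filter, it would loop over `convert_lines(sys.stdin)` and print each result.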

Yey, I'm done, right? Well, not quite. GNU2MSDEV reads from stdin. GCC puts the errors on stderr. So, the parser only gets part of the output, missing the critical parts. Ouch. Now, since it's past my bedtime, I spent a while trying to figure out what was going on, thinking perhaps the program was broken. Got the source, uncommented some debugging stuff, and finally, I realised that I'd need to get GCC's stderr merged into stdout, so that it reaches GNU2MSDEV's stdin, as the easiest course of action.

I'm not sure how you redirect stderr in the NT command shell. One page I read said this was impossible from the command line, and you need to use a different shell (they suggested running Linux), or write a program to do it for you. I decided to just add this to my linux commands. Now, my VS build command line looks like this:

plink 192.168.0.123 -ssh -l myUser -pw myPassword "cd /usr/src/something;make clean; make install 2>&1;" | gnu2msdev

Presto. Now it works just perfectly, and I can double click the error list and go right to the file/line where the error is. Of course, now it's 3AM for some reason, and instead of writing some code to record phone calls, I'm instead going to go to sleep.

ast_mono | Code
Thursday, November 04, 2004 9:02:41 AM UTC  #    Comments [0]  |  Trackback

# Saturday, October 30, 2004
SimpleChat

I wrote this quite some time ago, with the Whidbey Beta, I believe. I was gonna make it nice, but never got around to it, but somehow it came up today and I remembered it. So, I decided to post the code and the link.

Basically, it's an HTML Chat client using an ASP.NET in-process server. It's very, very lightweight -- it wouldn't work for tons of users (it uses a ReaderWriter lock!). I use web services to poll for new messages (or translate received messages, although that seems broken right now).

Now, since it relies on the soap client, it doesn't work on all browsers. An easy way around this is to replace the web services with IFRAMEs. Also, if you wanted to scale this solution up (say, you're building a chat client for MSDN :)), I believe you should leave the HTTP connection that receives messages open for say, 60 seconds (or more). This reduces the hit of polling drastically, and works quite well (I tested it in a few browsers). The server only has to authenticate the client ONCE, and then it can just pump new content down the pipe as it sees fit. At any rate, it's a simple chat system that could give someone a head start if they lack a head.

Anyways, get the code (ASP.NET 2 required): SimpleChat.zip (25.33 KB) It's running here: http://www.atrevido.net/SimpleChat/

If there's enough interest, and if a 19th level wizard visits me and gives me a Wand of Time Stop, perhaps I'll write a production level version of this (that DOESN'T use SOAP).

Code
Saturday, October 30, 2004 5:29:29 AM UTC  #    Comments [2]  |  Trackback

# Wednesday, October 27, 2004
Massive XML abuse

OK, I've had it. Ever since XML came out, certain people have been misusing it all over the place for no reason at all. *XML IS JUST A FORMAT.* It's not magic. It's not cool. Use it if it makes sense. However, it is actually a REAL format; adding < and > to a document doesn't make it XML. LinkPoint needs to learn this.

LinkPoint (owned by First Data) is a rather large credit card processor. You would think they'd have people who actually have some clue as to what they are doing when it comes to their programmatic interface eh? Check this code sample out:

protected string ParseTag(string tag, string rsp)
{
  StringBuilder sb = new StringBuilder(256);
  sb.AppendFormat("<{0}>",tag);
  int len = sb.Length;
  int idxSt=-1,idxEnd=-1;
  if( -1 == (idxSt = rsp.IndexOf(sb.ToString())))
  {
return ""; }
  idxSt += len;
  sb.Remove(0,len);
  sb.AppendFormat("</{0}>",tag);
  if( -1 == (idxEnd = rsp.IndexOf(sb.ToString(),idxSt)))
  {
return ""; }
  return rsp.Substring(idxSt,idxEnd-idxSt);
}

I'm not making this up. At first I started laughing. And continued. It's one way of processing XML, heh. I also love the use of a StringBuilder *for no reason*. They didn't even have the decency to think about Regular Expressions. (And what's up with that crazy formatting on the ifs??) Sigh.

The whole point of XML is to provide a standard way to process data on whatever platform you wish, eliminating the need for stupid code like that above. With XPath, all that junk comes down to about 3 lines of nice, neat code. So I continued to chuckle as I wrote my nice, elegant code.

Until it came to runtime. Apparently, some folks don't know that XML has *ONE ROOT ELEMENT*. Throwing a bunch of tags together doesn't make it a valid document. And invalid documents mean... yep, you guessed it: Errors from your XML parser. And without a working XML parser, you're back to manually handling it. So why even bother with “XML” if you're not going to do it correctly? A simple name=value would work just fine...

BTW, this is the second vendor this week I've seen using invalid XML documents.

Quick update: The easiest solution in this case is to just do: theirXml = "<root>" + theirXml + "</root>"; // works like a charm.
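
For what it's worth, here is a minimal sketch of that wrap-then-XPath approach. The element names (r_approved, r_code) and the FragmentParser class are made up for illustration and are not LinkPoint's actual response format. It also assumes the response has no XML declaration of its own (wrapping would break on one) and that the tag name passed in is trusted.

```csharp
using System;
using System.Xml;

public static class FragmentParser
{
    // Wraps a multi-root fragment in a synthetic <root> element, then uses
    // XPath to return the text of the first element named 'tag' ("" if absent).
    public static string ReadTag(string fragment, string tag)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<root>" + fragment + "</root>");
        XmlNode node = doc.SelectSingleNode("/root/" + tag);
        return node == null ? "" : node.InnerText;
    }

    public static void Main()
    {
        // Hypothetical response: two top-level elements, so not a valid document by itself.
        string response = "<r_approved>APPROVED</r_approved><r_code>1234</r_code>";
        Console.WriteLine(ReadTag(response, "r_approved"));
        Console.WriteLine(ReadTag(response, "r_code"));
    }
}
```

Unlike the IndexOf version, the parser also takes care of entity decoding and nested elements for free.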
Code | Humour
Wednesday, October 27, 2004 5:22:11 AM UTC  #    Comments [2]  |  Trackback

# Tuesday, October 26, 2004
Cross-platform development bliss

Visual Studio 2005 is just beautiful. I'm currently writing ast_mono, which has 2 C projects (that build on Linux), and 1 C# project (that builds with Whidbey). Everything gets deployed on Linux. However, I have my entire dev process available right from VS. The C projects are Makefile projects, and allow a command-line to be specified. Enter Plink, and I'm set. Plink is a Win32 command line SSH client. I just pass in the commands (cd /usr/src/ast_mono/runtime; make clean; make install;) and away it goes. In fact, VS detects when there's an error and shows the errors (from GCC) in the VS task pane.

Only two things I want: file/line info from GCC into VS. GCC outputs errors differently than VS is expecting and thus I can't go right to the file/line of the error. And debugging. I've got no idea how to do cross-platform debugging, let alone with VS integration. I highly doubt this is even possible.

Meanwhile, I'm enjoying my single-step build/deploy process very much.

Code
Tuesday, October 26, 2004 3:55:49 AM UTC  #    Comments [1]  |  Trackback

# Friday, October 15, 2004
MySQL is really secure... or bad.

I chose MySQL to use as my database, since I was writing on Linux, in C, and it just seemed like the easiest path. Can someone please say “you were so wrong”? MySQL has to be the worst DB engine out there. It doesn't (ok, just added) even have support for SUBQUERIES! Barely has support for multiple charsets. And... binary(20) is NOT a binary field 20 bytes long. It's a char(20). You can't execute multiple commands in a single query. It's embarrassing to open source really. I don't know who could argue that MySQL is competition for SQL Server or Oracle and keep a straight face. Check this list out: http://sql-info.de/mysql/gotchas.html (I really love the part about date handling.)

On the other hand, it's very secure. www.kalea.com.gt <-- No checking of user input whatsoever. (BTW, my little article about Kalea made me a top search result for Kalea Guatemala -- while their site doesn't even show up.)  They take your querystring, concat it to their query, and off it goes. But guess what? Good luck trying to hack it. MySQL is so poor, doing SQL injection and achieving anything fun is nearly impossible. So much for adding prices to their site :). Oh wait, you can do a DoS by using the BENCHMARK expression and then encode/Sha1/etc.

So what am I going to do? Switch to SQL Server as soon as I get a release candidate done. I'm going to load Mono into my C app, and then transition into managed code and use some nice TDS libraries and have a good day with a database that actually works well. Had I done that to begin with, I'd be a few hours ahead of schedule instead of behind schedule...

Code | Humour | Misc. Technology | Security
Friday, October 15, 2004 4:18:53 AM UTC  #    Comments [2]  |  Trackback

Visual Web Developer is so nice

I've been working a bit on the web-side of my VoIP application, obviously in ASP.NET (hey, just because Asterisk runs on Linux doesn't mean I'm completely converting!). I'm finally getting to use VS2005 full-time. The Web.NET team has done an awesome job of fixing up the editor. Pretty much everything that really bothered me and sucked about editing pages in VS has been fixed. Selecting elements is so easy. Navigating the HTML is simple (and doesn't lose formatting!). The built-in webserver (and no IIS requirement!) rocks for debugging. I'm just quite surprised at how good everything is. I saw all the cool features a year ago, so I knew it was supposed to be nice, but just using it drives the point home.

Code | Misc. Technology
Friday, October 15, 2004 3:03:58 AM UTC  #    Comments [0]  |  Trackback

# Tuesday, October 12, 2004
Why do we lose the ASP.NET 1.x compilation model in ASP.NET 2?

Writing the Turing ASIX brought me back to a really sore spot in ASP.NET 2: Lack of a good compilation model. In ASP.NET 1.x, you could compile all your code (*.cs) into an assembly, and you were set. Here's why I hate the new “code-beside” and “dynamic compilation” models as they are implemented in VS 2005:

--Deployment/content editing nightmare
Before, I could drop the DLL on the server, *have no source code* on the server, and allow someone else (i.e., my client), to edit the ASPX/ASCX content. In a few cases where I wanted to expose code to him, I could make a virtual method in the base class, and allow him to override it via C# code in a <SCRIPT> block. With the new VS 2005 model, my scenario is blown away and destroyed.

--Access to code means huge, ugly, hackish workarounds
Before, if I made a page/class/control/whatever in any part of my app, I could reference this from any other part. For instance, my Turing image generator. I have two statics on it that any page can call. I want that code to be in Turing.asix.cs (or in the .asix). But I can't! I am required to put it in the /Code directory for no reason at all. Maybe this was done because of the “web programmers” who think HTML is a programming language. Maybe it was to act as a ward to scare off people who are afraid of code. I can't figure it out. All I know is that it pisses me off. This problem is more serious than just my annoyance about moving one file.

Suppose I'm working on a larger site, and to keep things in line, I organize the site into various folders. Now say I'm in something like /TheSite/SomeArea/HierarchialViews. I have a few ASCX controls there, but they all share some common code (some enums, and some pure code classes that help with the sorting or organization for the views, say, something that generates a generic tree to be consumed). Where do I put the code? Well, with this new model, I've got to put it in /TheSite/CODE/SomeArea/HierarchialViews. In other words, I'm required to duplicate my entire site organization inside the Code directory, just because... um, well, I haven't found a decent reason yet.

The ASP.NET/VWD/whatever team should NOT be making these kinds of decisions for developers. Visual Studio should be a tool that we can use to write apps how we want to write them. This model worked fine for 1.x. Why has it become so hideous that they needed to REMOVE it from 2.0? With all the huge advances ASP.NET 2 and VS2005 take, why must they take this big jump backwards? Couldn't they just leave it in and say “You can do this, but we really recommend using a Code folder so you don't lose track of your .cs files.”??

The only *partial* reason for this behaviour that I can tell is the move to partial classes. Since it's a partial class, it needs the rest of the code generated from the ASP.NET runtime to compile. *I* was quite happy with the inheritance model used before. While partial classes are nice, *I* don't see any personal benefit in using them if it's going to introduce problems like this. At any rate, that still doesn't explain why I can't have a Foo.cs inside any directory (not just the /Code directory) and be able to use it.

Code | Misc. Technology
Tuesday, October 12, 2004 1:38:46 PM UTC  #    Comments [2]  |  Trackback

Turing image generator for ASP.NET

Today I was coding a site, and I realised I needed an easy way to avoid automatic signups. So, I did what everyone else does: added a Turing image. Since I was coding in ASP.NET 2.0, I thought it'd be nice to try out the new ASIX image generator type page.

It's pretty nifty. Nothing that you couldn't do with an ASHX in about 5 minutes, but still pretty cool. What I like is that the template starts you off right where you can start coding against the Graphics object. This will definitely make entry much easier for people who aren't as comfortable with these classes. In the past I've normally been against things like this (i.e., a whole set of code just to save some minor work for one specific case), but I think this was a pretty good thing to add.

Download the code here: Turing.cs.txt. This is for ASP.NET 2.0 -- just create a new ASIX and point it at the Turing class. But, it should be pretty simple to hook it up into ASP.NET 1.1. If anyone seems interested, or somehow I get more free time, I'll post the required ASHX handler. Anyways, from ASP.NET 2, all you need in your main page is this code:

string nonce = Turing.GenerateNewNonce();
ViewState["turingNonce"] = nonce;
this.turingImage.ImageUrl = "~/Turing.asix?nonce=" + Server.UrlEncode(nonce);

Then, to verify (say, in a validator) just do:

Turing.Verify((string)ViewState["turingNonce"], myTextBox.Text);


Just be sure to set EnableViewStateMac to true (otherwise someone can set the “nonce” to something known and render the system ineffective).

Note, I originally wanted to use a nonce system, but instead ended up using a simple encryption. So, it's possible to record the output of an image once (via the querystring data) and store it for later use (until the ASP.NET app restarts). I also use the Random class instead of the RNGCryptoServiceProvider.

As well, since I only use 5 capital roman letters, some basic AI should be able to defeat the algorithm. Add more letters, lines, change colours, etc. to make it stronger. There's some commented code that adds a dark gradient background. Playing around with this could make it harder for AI, at the cost of making it hard for your users.

Edit:
I realised that the way things were, an attacker could request the image multiple times, and get a different output (since the noise is random). This could be used to run a couple of extra passes on the same code, and increase the accuracy of AI against it. Or an attacker could request the code enough times to get an image that isn't that distorted and attack that.

The fix is to seed the random generator with something we can calculate from the nonce (to ensure it's the same image each time), and something the attacker cannot know (so he can't just run our code and see where the lines are). I do this by encrypting the nonce, and taking the first 4 bytes as a seed for the Random class. At 5:33am, this seems solid enough to ensure the numbers are not known to the attacker.
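
The seeding idea can be sketched roughly like this. Note that this is not the code from Turing2.cs.txt: it swaps the encryption step for an HMAC keyed with a server-side secret, which gives the same two properties (same nonce, same seed; no secret, no seed). The NoiseSeed class and its method names are invented for the example.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class NoiseSeed
{
    // Derives a repeatable 32-bit seed from the nonce. Without serverSecret,
    // an attacker cannot compute the seed and so cannot predict the noise.
    public static int SeedFromNonce(string nonce, byte[] serverSecret)
    {
        using (HMACSHA256 hmac = new HMACSHA256(serverSecret))
        {
            byte[] mac = hmac.ComputeHash(Encoding.UTF8.GetBytes(nonce));
            return BitConverter.ToInt32(mac, 0); // first 4 bytes as the seed
        }
    }

    public static void Main()
    {
        // Assumption: the secret is generated once per application start.
        byte[] secret = new byte[32];
        RandomNumberGenerator.Create().GetBytes(secret);

        int seed = SeedFromNonce("example-nonce", secret);
        Random first = new Random(seed);
        Random second = new Random(seed);

        // Both requests for the same nonce draw the same "noise" coordinates.
        Console.WriteLine(first.Next(0, 200) == second.Next(0, 200));
    }
}
```

A 32-bit seed is still guessable by brute force if the attacker can compare rendered images, so treat this as raising the bar, not as a guarantee.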

Here's the updated code: Turing2.cs.txt

I think I'm going to A) Add some image transformations to 'warp' the text somewhat, and B) really create a nonce system, instead of just relying on a simple encryption.
Code | Security
Tuesday, October 12, 2004 1:19:43 AM UTC  #    Comments [0]  |  Trackback

# Saturday, September 18, 2004
Cute VC++ editor trick

One thing that really annoys me about the VC++ editor is that when you collapse something, say a method, it eats up all the lines around it, until the next non-whitespace line. So when you look at your file collapsed, you see all declarations all together, and at least I have a problem reading that easily.

However, here's a simple trick to get around it. Throw a tab in a line. The collapsed region stops consuming when it finds a tab, and thus you can get the appearance of separation with everything collapsed. Nice.

Code
Saturday, September 18, 2004 5:40:18 PM UTC  #    Comments [1]  |  Trackback

# Thursday, July 22, 2004
Birthday attack in C#
How strong is a 128-bit hash? If you are looking to avoid collisions, the answer is not 2^^127, but 2^^64. Why? Due to the birthday paradox. Wikipedia says: “Specifically, if a function yields any of n different outputs with equal probability and n is sufficiently large, then after evaluating the function for about √n different arguments we expect to have found a pair of arguments x1 and x2 with f(x1) = f(x2).” The name “birthday” comes into play because in a group of just 23 people, chances are about 50% that two of them will share a birthday. The actual formula for a 50% chance is about Sqrt(n) * 1.2.

For a hash function, where strength is measured in powers of two, it's simple to calculate. For the exponent (128), just divide by two. So, we have 1.2(2^^(128/2)), but for most purposes, we leave off the 1.2 and just say 2^^64.

This means that if you're trying to find a collision, say, when attacking a digital signature system, the hash strength is considerably weaker than it sounds.

This sample program (Birthday.cs.txt (4.49 KB)) demonstrates this in C#, against a 32-bit hash (the first four bytes of MD5). Type in two messages, and it will find a collision by overwriting the first four chars of the message with random data. The code is not especially clean, and it's definitely not optimized for performance. That said, the 32-bit hash is successfully attacked in about 2.3 seconds on my machine (3GHz P4).
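
If you don't want to download the file, the core of the attack can be sketched as below. This is a simplified stand-in, not the contents of Birthday.cs.txt: instead of overwriting the first four chars with random data, it prepends an 8-digit hex counter to each message, and the BirthdayDemo class and its method names are made up for the example. It stores about 2^^16 variants of the “good” message, then tries variants of the “bad” message until one lands on a stored 32-bit hash.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class BirthdayDemo
{
    // The weak 32-bit hash: the first four bytes of MD5.
    public static uint Hash32(MD5 md5, string text)
    {
        byte[] digest = md5.ComputeHash(Encoding.UTF8.GetBytes(text));
        return BitConverter.ToUInt32(digest, 0);
    }

    // Returns { variant of good, variant of bad } that share the same Hash32.
    public static string[] FindCollision(string good, string bad)
    {
        using (MD5 md5 = MD5.Create())
        {
            // Phase 1: remember 2^16 variants of the good message by hash.
            Dictionary<uint, string> seen = new Dictionary<uint, string>();
            for (int i = 0; i < (1 << 16); i++)
            {
                string variant = i.ToString("x8") + good;
                seen[Hash32(md5, variant)] = variant;
            }

            // Phase 2: each bad variant hits a stored hash with probability
            // of roughly 2^16 / 2^32, so about 2^16 tries are expected.
            for (uint j = 0; ; j++)
            {
                string variant = j.ToString("x8") + bad;
                string match;
                if (seen.TryGetValue(Hash32(md5, variant), out match))
                {
                    return new string[] { match, variant };
                }
            }
        }
    }

    public static void Main()
    {
        string[] pair = FindCollision("I owe you $10", "I owe you $10,000");
        using (MD5 md5 = MD5.Create())
        {
            Console.WriteLine("{0} -> {1:x8}", pair[0], Hash32(md5, pair[0]));
            Console.WriteLine("{0} -> {1:x8}", pair[1], Hash32(md5, pair[1]));
        }
    }
}
```

The same structure works against any hash width; only the table size and the waiting time grow, by a factor of two for every two extra bits.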

How effective is this attack? Very. It's extremely easy to modify most document formats these days. Pretty much every document has some place where you can insert or replace “hidden data” -- things a user or system do not see or process. For instance, in HTML, you could simply add the collision data inside an HTML comment. In a plain text file, you could modify spacing, tabs, and perhaps some other punctuation. It wouldn't change the meaning or validity of the document, but it allows you to generate enough combinations to find a collision.

After finding two colliding documents, you send the “original” to the victim, who then signs it. Then you take the good signature and substitute your “bad” document -- presto, a fake signature.

How can you prevent this? One way which might not always work is to modify a document before signing it. The real way is to use a hash long enough to provide the level of security you need. If you want “128-bit” security, in the sense that someone needs 2^^127 or so processing power to break it, then use SHA256. If for some reason you only have shorter algorithms at your disposal, a possibility is running the hash function again, with modifications to the document (for instance, switch every two bytes). This would give you a longer output.
Code | Security
Thursday, July 22, 2004 9:29:51 PM UTC  #    Comments [1]  |  Trackback

# Sunday, July 11, 2004
D&D Items do exist: I just read a "Tome of Stupidity -1"

For those of you who played D&D (here's a funny video to see what it's like), you might recall that there were magical tomes that could increase or decrease your abilities, just by reading them. Of course that's impossible in real life since we'd need powerful magic... right? Well, as I have unfortunately learned, no. A while ago, I had to maintain someone else's app. I believe in the process of reading this app's code, I have lost a few IQ points. Let's take a look, shall we?*

All the code in this app uses horrible variable names. In a 250 line block of code (a single method -- the writers must have thought there to be huge drawbacks to using methods), the first line starts off by declaring the variables. A sample looks like this:

dim objconn,objrs,strDatabase,mysql,mysql1,sqlstring,rstemp,dbConn1,objrs1,query

This is a truncated line. They actually declare about double that much. Regardless of how you feel about declaring everything at the top of a file, this is bad. They don't use these variables at the same time. For instance, they'll open objrs, do something, and then close it, then open rstemp and repeat. There aren't actually two objects in use at once. They just declared extra variables for fun. Or maybe they thought they had to give the variables a rest. I don't know. And I don't think they did either. Of course, it's better than using no variable names at all.

They have a process to read values from a comma-delimited file. So, one line at a time, they use VB's split function, storing the result in a variable named “split”. So far so good. Then they proceed to use hard-coded numeric indexes for the next 100 lines to refer to different fields, giving way to wonderful code like so:

if split(6) = "true" then
  objrs1.open "SELECT * FROM Table WHERE Field1 = " & split(2) & " Field2 = '" & split(9) & "'"
  split(4) = objrs1("SomeField")

At a few places in the app, a field is selected from the DB for absolutely no reason:

someId = Request.QueryString("someId")
rs.Open "SELECT SomeId FROM Orders WHERE SomeId = " & someId, objConn1

someId = rs("SomeId")

That's right. They select a single field (an int), constraining it to the current value of their var, and then set the var to the same value. Maybe there's something special in SQL that I'm not aware of. To their credit, there's actually a check for rs.Eof first (omitted for clarity of stupidity).

Here's a brilliant idea for performance: Don't use SQL's COUNT. In quite a few places, they'll execute a semi-complex query that returns, on average, 10,000 rows. But why bother with SELECT COUNT, when we have SELECT *?

The entire app is built like this. The people who wrote this should have their text editors confiscated.

* Some variable names have been renamed to protect the innoce-- mentally challenged.

Code | Humour | Misc. Technology | Personal
Sunday, July 11, 2004 8:14:29 PM UTC  #    Comments [1]  |  Trackback

# Monday, June 14, 2004
The Object Test Bench
Just another way Visual Studio 2005 “Whidbey” is going to help out: The Object Test Bench. This nifty tool (found in View -> Other Windows -> Object Test Bench) allows you to create objects and play around with them at design time. For instance, suppose I want to find out what kind of data the System.IO.FileInfo class presents, and how it presents it (say, do directories have a trailing slash?). I simply open the window, and type in my expression:
System.IO.FileInfo someFile = new System.IO.FileInfo("C:\\x.cs");

Presto! I can now explore this new object. Supposedly, there will be other ways to get objects into the bench, say the Class View or Designer, but it didn't seem to work in the build I'm using (which is a bit more current than the May CTP). Trying to create some of my own classes or collection classes seemed to have problems too, but I'm getting a new build in a few days, so we'll see if it's fixed then.
Now, suppose we want to learn more about the functionality of this object. Right click it, and away we go:

I created a new object, a string, to store the filename, and now I'll invoke the CopyTo(string, bool) method. I can use new literals, or existing objects:

Any (?I think?) expression is valid, so I could do: filename = Path.GetTempFileName(); and use the result in a variable. Even better, I don't even need to declare the variable. Any method called pops up a dialog stating what was returned, and prompts to add it to the bench. Here, I've called “ToUpperInvariant()” on an existing string:


Just another gem that's definitely going to help as I explore .NET 2.0.
Code | Misc. Technology
Monday, June 14, 2004 5:05:56 AM UTC  #    Comments [3]  |  Trackback

My #1 Whidbey Feature

I've been very, very busy lately, and my wrists have been hurting (spent over $100 getting a “keyboard manager”). However, among the things I've been doing, I've been involved in a usability study with the Visual Studio team. Basically, we meet over Live Meeting with my desktop shared, so they can watch how I use Visual Studio. This helps figure out if I'm using the new features correctly, or if the design could be clearer. I like it because I can give very direct feedback and hopefully improve the product for others! Speaking of feedback, the new default for strings in Whidbey is maroon (at least on the build I just installed) -- I made this suggestion to someone who works in that area about two months ago -- so they ARE listening! :)

Whidbey has a host of new features. So many aspects have been fixed up so when you use it, you just have to say “Oh sweet, that's nice!”. There's been a lot of coverage of the “big” new features, like generics and, in C#, refactorings, and that's well deserved. However, there's been a ton of work on the day-to-day stuff as well. The #1 top thing I miss when using Everett is auto-Intellisense, for lack of a better name. In VS2005, Intellisense activates on a single keystroke (most of the time), and the list is complete: even keywords are listed. I think preprocessor directives are the only things not available (I've put in a wish :)). It might not seem like a big deal, but it is definitely the top thing I notice line-by-line when working in VS2003. CodeRush (www.devexpress.com) helps a bit, but still doesn't come close to how great Intellisense is in VS2005.

Code | Misc. Technology
Monday, June 14, 2004 1:04:28 AM UTC  #    Comments [0]  |  Trackback

Find ALL References
In Visual Studio 7.x, finding all references to a symbol was really annoying. You could click “go to reference”, and then you had to use Ctrl+1 and Ctrl+2 to move around. Not nice. Visual Studio 2005 changes this. Now, you can find references and have all the results show up, along with the code where they are used, and the file and line information.
Code | Misc. Technology
Monday, June 14, 2004 12:51:30 AM UTC  #    Comments [0]  |  Trackback

Tracepoints in Visual Studio 2005
If you're like me, you find yourself throwing in temporary lines of code to trace your code execution. Console.WriteLine, or perhaps the Trace/Debug classes. However, how many times have you stopped a debugging session to add a very temporary trace line in? Or perhaps you just get tired of adding all those calls and messing up your code?

In Visual Studio 2005, you can now have a breakpoint output a message (or even run a macro) when hit. First, create a breakpoint, and select “When hit” from its context menu:

You'll get the following dialog with a lot of cool options:

Notice all the different keywords allowed, as well as variable evaluation. When you run the app, the tracepoint output is sent to the output pane. Variables in curly braces are evaluated, and even cooler, you can hover over variable names to get details:

Very, very nice.
Code
Monday, June 14, 2004 12:41:07 AM UTC  #    Comments [0]  |  Trackback

Console apps in Visual Studio 2005

Ever write a short main method to test something out? You try something, and write the output to the console. Or perhaps you have extra debugging info going to the console while your program runs. I've been annoyed a lot when I run my console app, and VS opens a new console window for 1 second, and then the program closes, and I can't see the result. I also hate having to switch back and forth between VS and the console app while running.

Visual Studio 2005 takes care of this, with the new “Console” debugging window. As far as I can tell, the console streams are mapped to this pane inside Visual Studio, so you can dock it, have it as a document window, or however you want. After your program runs and exits, the data will still be there.

Do note that not all the new Whidbey console features are supported, since it's not a “true” console window (you can't use the Win32 console functions on it). But for basic console work, it does the trick.

Code
Monday, June 14, 2004 12:12:27 AM UTC  #    Comments [0]  |  Trackback

# Thursday, March 25, 2004
Storing passwords and hashing

I see a lot of articles on hashing passwords, however many of them skip over an important part of setting up this kind of system: iterations. But first, a quick primer on hashing in general.

Hashing is a cryptographic function that takes variable-length input, and creates a constant-length output. The output is commonly called a hash, or a digest. The most common algorithms are MD5 and SHA1. MD5 creates a 128-bit hash, and SHA1 creates a 160-bit hash. There are also SHA256, SHA384, and SHA512, although 384 is pointless, since it's just the SHA512 with some data discarded. It's computationally unfeasible to find two plaintexts that have the same hash output. Hash functions are used in some common scenarios:

1: Creating a digest of a message to ensure the message was not modified (intentionally or unintentionally). Sometimes this is referred to as a checksum. eDonkey is an example that uses MD4 hashes to identify files (and as files are downloaded, they can be checked to be good by computing the hash).

2: Digital signatures, where the hash is encrypted with the private key of an asymmetric algorithm (like RSA). This can then be decrypted by anyone with the public key, and checked against the computed digest to ensure that someone with the private key did “sign” the message, and that the message contents have not changed.

3: Securely storing passwords. Since a hash is a one-way function, it's impossible to *decrypt* the hash and recover the password. Well designed systems will not store plaintext passwords (otherwise someone who reads the database could get your password and do nasty things as “you”). If you ever use a site that sends your current password back to you if you forgot it, then they most likely have a badly designed system (and you should question the rest of their security).

We're going to focus on the password issue. Attackers can figure out a password by computing the hash themselves for a suspected password, then comparing to the actual value. So, while the hash value might be 160-bits, it certainly doesn't take 2^160 steps to find the right password, since many users use weak passwords.

When hashing a password, it's common to add some random bytes to the password that are unique for the user. This is called a salt, and it ensures that each user has a different hash, even if the password is the same -- since hash("password") will always return the same value, but hash("password" + "randomData") is going to be different. This means that an attacker must compute a separate hash for each possible password, *per user*. This helps stop an attacker from trying to attack all the users at once, since each additional user requires a complete attack of its own.

However, let's say that the attacker is going after a specific user. If the user picked an easy password, say 6 alphanumeric chars, the password's strength is ~36 bits (35.7 to be more precise, 5.95 bits per char). This is assuming completely random characters are used, which is hardly ever the case. That's not that much work for an attacker, and we're considering 64-bit security (128-bit keys) to be the “required security” level.

However, suppose instead of calculating one hash per password+salt, we take the hash, and re-hash it n number of times, where n is something between 2^14 and 2^18? Well, now the number of steps required per password goes up that much. The 36-bit password now has an effective strength against brute forcing of 2^50 to 2^54. Essentially, by adding 2^18 steps to the hashing, we've added the equivalent of 3 *random* characters to their password.

So, do you need to iterate? Find out your minimum security level (48-bit? 56-bit?). Figure out how many iterations you can perform on your hardware before performance is unacceptable (probably between 2^14 and 2^18). Subtract that from your required level, and you have the minimum password entropy level.

For instance, let's say that I want to have 64-bit security from my passwords. My hardware can do 2^16 iterations without hurting logon times, thus I need 64-16= 48 bits of entropy in each password. This can be accomplished by requiring passphrases consisting of four common words (say a dictionary of 4000). (12 bits per word = 48 bits in the password + 16 for iterations, and I'm set).
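
The arithmetic in the last few paragraphs is easy to check in code. This is just a sketch; the PasswordBudget class and its method names are invented for the example, and the entropy figures only hold if characters or words are picked uniformly at random.

```csharp
using System;
using System.Globalization;

public static class PasswordBudget
{
    // Bits of entropy in 'length' symbols drawn uniformly from 'alphabetSize' choices.
    public static double EntropyBits(int alphabetSize, int length)
    {
        return length * Math.Log(alphabetSize, 2);
    }

    // Bits the password itself must supply once iterations add log2(iterations).
    public static double RequiredPasswordBits(double targetBits, int iterations)
    {
        return targetBits - Math.Log(iterations, 2);
    }

    public static void Main()
    {
        CultureInfo inv = CultureInfo.InvariantCulture;

        // 6 chars from a-z, A-Z, 0-9 (62 symbols)
        Console.WriteLine(EntropyBits(62, 6).ToString("F1", inv));              // 35.7

        // 4 words from a 4000-word dictionary
        Console.WriteLine(EntropyBits(4000, 4).ToString("F1", inv));            // 47.9

        // 64-bit target with 2^16 iterations
        Console.WriteLine(RequiredPasswordBits(64, 1 << 16).ToString("F1", inv)); // 48.0
    }
}
```

The four-word passphrase comes out a hair under 48 bits, which is why the text rounds each word to 12 bits.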

Hashing is even more important when you don't have control of how good the passwords are. For instance, you're saving customer's credit card data, and the key is based off their password (so that they MUST login for your system to access that data). In these cases, requiring a complex password might not work for various reasons such as customer pushback, or risk of customer choosing something like your site name or their name as a password. It's important to determine the level of password complexity that will “push users over the edge” - the point when they stop using something remotely random, and start using things like their last name, their SSN, etc. When that point is reached, the entropy of their password is uselessly low.

Now, assuming a semi-casual attacker with a strength of 40 bits. He's got the power to do 2^40 steps of computational work. If your users use 24-bit passwords, their hashes can be broken by this attacker easily. But, with 2^18 iterations, those weak 24-bit passwords now require 2^42 steps, and the hash is saved.

So, there is really no good reason not to do multiple iterations. Even 1024 will provide some strength (2^10 steps, equivalent to a bit under 2 extra random characters in the password). In fact, the .NET framework already has a class that does all of this (hashing with whatever algorithm, salting, and iterations) for us: System.Security.Cryptography.PasswordDeriveBytes. Use it!
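
To make the mechanics concrete, here is a hand-rolled sketch of salt + iterate, using SHA256. It only illustrates the idea described above; it is not how PasswordDeriveBytes works internally, and for real systems a vetted key derivation class from the framework (PasswordDeriveBytes, or the PBKDF2-based Rfc2898DeriveBytes) is the better choice. The IteratedHash class and its method names are made up for the example.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class IteratedHash
{
    // A fresh random salt, generated once per user and stored next to the hash.
    public static byte[] NewSalt(int size)
    {
        byte[] salt = new byte[size];
        RandomNumberGenerator.Create().GetBytes(salt);
        return salt;
    }

    // hash(salt + password), then re-hashed until 'iterations' hashes were done.
    public static byte[] Derive(string password, byte[] salt, int iterations)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] pw = Encoding.UTF8.GetBytes(password);
            byte[] input = new byte[salt.Length + pw.Length];
            Buffer.BlockCopy(salt, 0, input, 0, salt.Length);
            Buffer.BlockCopy(pw, 0, input, salt.Length, pw.Length);

            byte[] hash = sha.ComputeHash(input);
            for (int i = 1; i < iterations; i++)
            {
                hash = sha.ComputeHash(hash);
            }
            return hash;
        }
    }

    // Recomputes the hash and compares without bailing out at the first mismatch.
    public static bool Verify(string password, byte[] salt, int iterations, byte[] expected)
    {
        byte[] actual = Derive(password, salt, iterations);
        int diff = actual.Length ^ expected.Length;
        for (int i = 0; i < actual.Length && i < expected.Length; i++)
        {
            diff |= actual[i] ^ expected[i];
        }
        return diff == 0;
    }

    public static void Main()
    {
        int iterations = 1 << 16; // 2^16, the example figure used above
        byte[] salt = NewSalt(16);
        byte[] stored = Derive("correct horse", salt, iterations);

        Console.WriteLine(Verify("correct horse", salt, iterations, stored));
        Console.WriteLine(Verify("wrong guess", salt, iterations, stored));
    }
}
```

Store the salt and the iteration count alongside the hash, so the count can be raised later as hardware gets faster.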

Code | Security
Thursday, March 25, 2004 4:06:06 AM UTC  #    Comments [1]  |  Trackback

# Monday, March 08, 2004
Nothing is secure

One thing to keep in mind is that nothing that I know of in this world is secure.  I'm not just talking about software.  Dictionary.com defines secure as “free from danger or attack”.  Can you think of ANYTHING that meets that definition?  Leave a comment and win a prize if you can.

Security is about probabilities.  “How secure is X?” is often asked.  Does that mean if we use ultra-high encryption that it's impossible for someone to break through?  If I chose a 256-bit key right now and encrypted my data with it, is my data secure?  Remember, it's *possible* that someone could guess a 256-bit key in one shot.  The probability of that is usually extremely low, although if I picked a key of all zeros a system might try that to start off and thus win in one turn.

So, when choosing your defenses and making your tradeoffs, always consider the probabilities of a certain attack occurring.  Wasting time “bulking up” defenses in one area while ignoring weaker areas is like optimizing code that isn't slowing your system down: pointless and a waste of time.  You will never have something that's “secure”.

Code | Security
Monday, March 08, 2004 6:48:27 PM UTC  #    Comments [0]  |  Trackback

Cracking code - Part 2: Other simple attacks

In part 1, we attacked the code by stopping it at a known point, the “invalid code” message box.  From there, we were able to trace up to where a decision was made as to the validity of our serial/code, and change that logic around.

Going through someone else's compiled x86 code can be somewhat like going through your server's logs to find some specific information.  Most people don't start with log entry 0 and read each one.  We filter the logs, look for error entries, etc.  Depending on what we do know about the events we are looking for, we can find the related entries in different ways.  The same applies when going through code.  Here are two other simple things that we could do to SimpleCode.exe to break it:

Strings
We could search for all strings, and then look for “good” strings, something we'd expect to see when our code is valid.  OllyDbg can dump these strings and search them, and then take us to the places where they are used.  From there, we can track up and see where/why that code wasn't called.
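To give an idea of what a string dump involves, here is a naive sketch in C#.  It is not how OllyDbg works internally (I haven't looked); it only scans a byte buffer for runs of printable ASCII, and the names are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class StringScanner {
    // Returns every run of printable ASCII bytes at least minLength long,
    // paired with the offset where the run starts.
    public static List<KeyValuePair<int, string>> FindAsciiStrings(byte[] data, int minLength) {
        var found = new List<KeyValuePair<int, string>>();
        var run = new StringBuilder();
        for (int i = 0; i <= data.Length; i++) {
            bool printable = i < data.Length && data[i] >= 0x20 && data[i] <= 0x7E;
            if (printable) {
                run.Append((char)data[i]);
                continue;
            }
            // The run ended (or we hit the end of the buffer): keep it if long enough.
            if (run.Length >= minLength) {
                found.Add(new KeyValuePair<int, string>(i - run.Length, run.ToString()));
            }
            run.Clear();
        }
        return found;
    }
}
```

Feeding it the bytes of an executable (for instance via File.ReadAllBytes) and looking for something like “Thanks” gives an offset to start tracing from.  Strings stored as UTF-16 would need a second pass that this sketch doesn't attempt.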

Our input
Every program needs to take our input, then somehow validate it.  If we enter some data that's easily recognizable (like “AAAA”), we can set a breakpoint on memory access to that location.  From there we can figure out what's being done to our input, which is useful for reverse engineering -- creating a “keygen”.  Having a keygen is much more valuable, because we don't need to make binary patches and modify the executable.  Between different versions of the software, the key validation will probably remain the same.  If we know how to generate our own keys, we have a “one size fits all” attack.
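Here is a sketch of why a recovered algorithm is worth more than a patch.  The scheme below is entirely made up for illustration; it is NOT the check used by SimpleCode.exe.  The point is only that the validator and the keygen end up being the same function.

```csharp
using System;

public static class HypotheticalLicense {
    // A made-up scheme, NOT the one in SimpleCode.exe: each code character is
    // derived from the serial character at the same position.
    public static string ExpectedCode(string serial) {
        char[] code = new char[serial.Length];
        for (int i = 0; i < serial.Length; i++) {
            code[i] = (char)('A' + (serial[i] + i * 7) % 26);
        }
        return new string(code);
    }

    // What the protected program does: compare user input with the expected code.
    public static bool IsValid(string serial, string activationCode) {
        return ExpectedCode(serial) == activationCode;
    }

    // What the keygen does: once the algorithm is recovered, just run it.
    public static string GenerateKey(string serial) {
        return ExpectedCode(serial);
    }
}
```

With this toy scheme, GenerateKey("AAAA") returns "NUBI", and that code keeps working in every version of the program that leaves the validation routine unchanged.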

Code | Security
Monday, March 08, 2004 6:01:59 PM UTC

Vault - great source control

I just found a great product, Vault.  Vault is a source control system written entirely in .NET.  The main advantage of it is that it's easy-to-use, and also important, easy-to-setup.  I tried using VSS, really, I did!  But wow, it's really poor.  Getting things working correctly even on my local machine was a pain, not to mention on the LAN or over the Internet.  Vault took all of 5 minutes to install on an Internet-based development server, and works quite well -- as well as anything that has to interface with VS.NET can.  As well, if you're just looking for a system for your local machine, it's free for one user.  Very cool!

Code | Misc. Technology
Monday, March 08, 2004 5:36:33 PM UTC

# Tuesday, March 02, 2004
Cracking code - Part 1

Update 2004-03-07: Added screenshots.

Read the intro to find out why I'm writing this.

Alright, before we get into attacking .NET, let's see how it's done against common Win32 programs in x86.  First, you'll need a good disassembler/debugger.  I recommend OllyDbg.  It's very easy to use, and does a good analysis of the code, which helps us out quite a bit.  SoftICE is another alternative, but it's low-level, harder to use, and it costs $1000.  People tend to use this when they want to debug something like a device driver, or make a patch for Windows.

Here's the executable I wrote for this sample: SimpleCode.exe (44 KB) and if you feel like cheating, the source code: SimpleCode.cpp.txt (1.28 KB).  It's very simple.  In fact, the whole purpose is to validate the user code -- there's no real content that's protected.  However, it will be enough to learn from.  Also note that it only runs on Windows 2000 and above.  If you aren't using that OS, upgrade :), or get the code and fix it, or email me for a version you can use.

So, let's open OllyDbg and make sure the analysis options are on (Alt-O, check all of them out).  Now, load SimpleCode.exe.  OllyDbg loads and disassembles the code.  You now have a console window open, and a bunch of x86 on your screen.  Let's run through the program (F9).  Enter 4 chars for your serial, and 4 for your activation code (no checking is done, so you'll screw up the program if you enter more data).  A message box appears telling us the code is invalid:


That's our way in, for this example.  We know that somewhere before the message box was shown, our activation code was tested.  So, let's go breakpoint at the message box.  Restart SimpleCode (Ctrl-F2).  Right click in the main window and select Search for -> All intermodule calls.  In the new window, type MessageBox.  You'll see two calls to MessageBoxA.  A real program would have many more.  Right click one of the calls and select “Set breakpoint on every call to MessageBoxA”. 


Run the program and enter fake serial/activation again.  The program breaks at “00401163  |.  FF15 DC804000 CALL DWORD PTR DS:[<&USER32.MessageBoxA>>; \MessageBoxA”.  If we look up a bit, we can see that the arguments loaded are for the invalid serial.  This is the message box we want.  Go into breakpoints (Alt-B) and disable both breakpoints.  Now, the opcode right after the MessageBoxA call is C3, RETN, the end of the function.  Considering the code for this function is very short (21 lines), it should contain only the “bad” code -- code we don't want executing.  Press F8 to step over that call.  Dismiss the message box.  Notice you can press “;” to add comments to lines.  It'd be good to mark this line with something like “Return from displaying bad message box.”, just in case we get lost later on.  In many programs, there will be many interesting points, so good commenting is key.


If you're going to be doing real attacking, you need to learn some x86.  The important things are CALL, RETN, the various jumps, and comparisons, because most likely, somewhere inside your target program, a check is performed and then a corresponding action is taken.  If we can reverse the logic, then we can make the program think correct data was entered when it wasn't (and the opposite: correct data will be considered incorrect).

Now we're about to return to the point that called this function.  Press F7 to see where that takes us.  Now we're on “00401274  |.  8B4C24 3C     MOV ECX,DWORD PTR SS:[ESP+3C]”.  The line above that is the callsite of the “bad display function”.  Comment it as such.  Look around.  OllyDbg should display some arrows indicating jumps and targets.  If it doesn't, go into debugging options and check your settings.

Notice that the callsite of the bad function is a jump target from “00401259  |. /74 14         JE SHORT SimpleCo.0040126F”.  If we take the jump, we end up calling the bad function.  If we don't, we RETN (look at the line right above the bad callsite).  Sounds interesting.  Set a breakpoint on that JE instruction, restart, run and enter the data.


JE means “jump if equal”.  Its opposite is JNE (jump if not equal).  Our program is stopped right now at a JE, and OllyDbg says the jump will be taken.  Since the jump goes someplace bad, we don't want it to happen.  Press space.  This opens the reassembler.  Change the JE to JNE and press assemble.  OllyDbg patches the in-memory executable.
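In bytes, this patch is a single change: a short JE is opcode 74 and a short JNE is opcode 75, while the displacement byte (14 in our case) stays the same.  Here is a minimal sketch of the same patch done on a byte buffer.  The offset is something you would have to work out yourself; 00401259 is a virtual address, not a file offset, and the class name is made up for this example.

```csharp
using System;

public static class JumpPatcher {
    private const byte JeShort = 0x74;   // JE rel8
    private const byte JneShort = 0x75;  // JNE rel8

    // Returns a copy of the code with the short JE at 'offset' turned into JNE.
    // Throws if the byte at that offset is not a short JE, so a wrong offset
    // does not silently corrupt something else.
    public static byte[] InvertJe(byte[] code, int offset) {
        if (offset < 0 || offset >= code.Length) {
            throw new ArgumentOutOfRangeException("offset");
        }
        if (code[offset] != JeShort) {
            throw new InvalidOperationException("No short JE at the given offset.");
        }
        byte[] patched = (byte[])code.Clone();
        patched[offset] = JneShort;
        return patched;
    }
}
```

Calling JumpPatcher.InvertJe(new byte[] { 0x74, 0x14 }, 0) gives back 75 14, which is exactly what the reassembler writes for us.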


Let's see what happens.  If we're lucky, this will call the “good” code.  If not, we just patched something else and the program at the best is going to do something strange, but most likely will crash and burn.  Press F9.



What's that?  Thanks for activating?  Why, you're quite welcome!  That jump did it.  Wasn't that easy?  And we didn't have to learn much x86 at all.  To save our changes, we'll need to restart (OllyDbg will complain since the breakpoint code was patched and changed), go to the breakpoint, and re-patch.  This time, right click and select “Copy to executable -> All modifications”.  Now we've got a patched program.

This was extremely easy (it was a very simple program!), and just demonstrates one way that someone could attack your code.  It's also an inflexible attack (a binary patch, versus finding the algorithm), so if a new version is released, we need to debug and patch it again.  Hope you learned something!

Update 2004-3-8: Part 2 now available.

Code | Security
Tuesday, March 02, 2004 7:20:18 PM UTC

Cracking code - Introduction

To defend, you must have some idea of what you're defending, and who and what you're defending against, specifically, which attacks.  Failure to understand and know these things means that your defense will most likely not be effective, and could in fact decrease your security.  Here's an example:

Near where I live, thieves were stealing cars that people parked in the street.  The neighbourhood committee decided that they'd stop this.  The solution they implemented was to put gates at all entrances and exits of their area, and have guards that only allow cars with a particular sticker get through.  This makes people FEEL more secure.  However, for the cost (guardhouses and gates construction, guard salaries), it's not as effective as it could be.  A thief can still walk in just as easily (gates only block roads), and when driving a stolen car out, the guards will see the car and sticker, recognize it, and let them leave.  If they had thought about how thieves operated, then they would have realised this and done something more effective, perhaps hiring the same number of guards, but setting them on a patrol, instead of just sitting at their posts.  With unlimited resources, they could do both things, and give each member a special remote key-code to unlock the gate when they are driving.  However, the tradeoff in cost and convenience is too high for them.

This is how security is, in the physical and electronic worlds.  We have many possibilities, each with their tradeoffs.  Deciding which measures to implement requires us to understand how our opponent is going to operate, as well as the details of how exactly our defenses work.

In this series, I'm going to show you how to crack simple code.  I'm going to make a series of samples to try this out on (to avoid DMCA problems with real code), so as to get a feel of what crackers do to code.  It is not going to be in-depth or show how to become a master cracker.  Just enough so that we could attack a simple Windows/.NET program's licensing key system, which is a common theme in software protection.

Continue to Part 1, where we'll crack some simple code...

Code | Security
Tuesday, March 02, 2004 5:26:40 PM UTC

# Monday, March 01, 2004
Processing HTML into safe HTML with .NET - Part 3

Now that I've decided on which library to use, I'll describe the actual code.

We already know that HTML, esp. in Internet Explorer, provides many attack vectors.  And new versions of the browser could add another tag or attribute that can execute code.  So we need to use a whitelist, not a blacklist.

Next, there are many more legit users than attackers.  So when dangerous content is detected, it needs to be removed -- we can't just blow up and tell the user not to hack us.  The number of false positives could actually be rather high, since some people are going to use Word and end up with a lot of tags and who knows what else.  And finally, users could accidentally paste something that's potentially dangerous.  Yelling at them, or even telling them to fix their code, isn't going to work, since they may not even be aware that HTML exists.

So, here's the code:
SafeHtml.cs.txt (3.28 KB).  It's very short and easy, thanks to the HtmlAgilityPack.  The processing of style tags is pretty weak (simple replacements), but should do the trick.  Enjoy!

Update 2004-Mar-04: Forgot to handle <A href="scriptType:code...">.  Be sure to add that if you use this code in production.
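To make the whitelist idea and the href update concrete, here is a minimal sketch of just the decision logic.  It is not the contents of SafeHtml.cs, it deliberately leaves out the parsing (that's the HtmlAgilityPack's job), and the tag, attribute, and scheme lists are examples I picked, not a vetted set.

```csharp
using System;
using System.Collections.Generic;

public static class HtmlWhitelist {
    // Illustrative lists only; a real list would be picked tag by tag.
    private static readonly HashSet<string> AllowedTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "b", "i", "u", "p", "br", "a", "ul", "ol", "li" };

    private static readonly HashSet<string> AllowedAttributes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "href", "title" };

    private static readonly HashSet<string> AllowedSchemes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "http", "https", "mailto" };

    public static bool IsTagAllowed(string tagName) {
        return AllowedTags.Contains(tagName);
    }

    public static bool IsAttributeAllowed(string name, string value) {
        if (!AllowedAttributes.Contains(name)) return false;
        if (!string.Equals(name, "href", StringComparison.OrdinalIgnoreCase)) return true;
        // href must be an absolute URL with a whitelisted scheme
        Uri uri;
        return Uri.TryCreate(value.Trim(), UriKind.Absolute, out uri)
            && AllowedSchemes.Contains(uri.Scheme);
    }
}
```

Anything that fails either check gets removed rather than reported, so the user never sees an error.  Note that this sketch also rejects relative links; whether that's acceptable depends on the application.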

Code | Security
Monday, March 01, 2004 7:14:24 PM UTC

Processing HTML into safe HTML with .NET - Part 2

Following up from part 1, I reviewed three different libraries: HTMLDocument, SgmlReader, and HtmlAgilityPack.

HTMLDocument is a commercial component ($249 per dev, inc. source code).  The other two are libraries written by some cool people at Microsoft and include source code.

SgmlReader is basically an XmlReader that can handle HTML.  To write, we need to use an XmlWriter, and that can mess up the HTML, and we don't want that.  SgmlReader seems like it'd be ok if all we wanted to do is determine if there's unsafe content and then return false, but that's not what we need.

However, both HtmlAgilityPack and HTMLDocument read HTML and create a DOM out of it, allowing you to modify it and write the HTML back out.  This is what we need.  I briefly looked over both libraries to see which one I want to program against.  I gave them both an equal rating to start off with, but the scales rapidly tipped in favour of one library.

HTMLDocument definitely loses as far as API niceness and robustness.  Some problems:
  • Inconsistency when loading data into the HtmlDocument.  If you have a string, it needs to go in the constructor, otherwise, use an instance method.
  • Enums (both of them) are prefixed with “e”.  Why?
  • Lack of types.  There are four types total.  That's all.  No HtmlAttribute.  No HtmlElementCollection.  Nothing like that.
  • Weakly-typed collections.  ArrayLists and HashTables are used as the collections, instead of strongly-typed collections.  So you must cast, and if you insert an unsupported object, then it will throw an exception when writing the HTML.  Not very robust.
  • And the silliest thing of all: No encoding support.  Worse than that, FORCED ASCII.  If you open a file, their code opens a stream, manually passing ASCII encoding.  No BOM detection, no system default, just ASCII.  Ouch.
These things made me seriously doubt how professional a library HTMLDocument is.  Most of these things are ultra-simple to fix.  If I was forced to use this, I'd have to buy the source code just to make it right.  It seems like its purpose is to demonstrate how not to construct a class library.
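To show why forced ASCII hurts, here is a small sketch comparing an ASCII-only read with a read that checks for a BOM.  The class and method names are mine, and falling back to UTF-8 when there is no BOM is my choice for the example, not something either library does as far as I know.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingDemo {
    // Decodes bytes the way a forced-ASCII reader would.
    public static string ReadAsAscii(byte[] fileBytes) {
        return Encoding.ASCII.GetString(fileBytes);
    }

    // Decodes bytes with BOM detection, falling back to UTF-8 when no BOM is present.
    public static string ReadWithDetection(byte[] fileBytes) {
        using (var stream = new MemoryStream(fileBytes))
        using (var reader = new StreamReader(stream, Encoding.UTF8, true)) {
            return reader.ReadToEnd();
        }
    }
}
```

Take the UTF-8 bytes of “café”: the ASCII read returns “caf??”, because each byte above 0x7F is replaced with a question mark, while the detecting read returns the original text.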

What's more, HtmlAgilityPack doesn't have any of these flaws.  In fact, it seems like it's actually a missing piece of the base class libraries.  Superbly done.  Writing code against it was so easy and natural.  I'm extremely impressed.  Even the documentation is much more complete (it comes with a 180KB HTML Help file, compared to HTMLDocument's 36KB HTML Help file).

Hands-down-winner: HtmlAgilityPack.
Code | Security
Monday, March 01, 2004 6:54:09 PM UTC

# Sunday, February 29, 2004
Learning MSIL

I was going to write a series about learning MSIL (Microsoft Intermediate Language, or simply “IL”), and then get into more advanced topics.  However, I found a good tutorial (and no doubt there are more if I use Google for a minute) at CodeGuru, called MSIL Tutorial.  It should be enough to get people up to some speed.

I'll be writing some articles about how people actually attack programs, starting with nice x86 assembler, and then showing how attacks against .NET programs can use many of the same vectors.  I'll show how, even with some weak obfuscation (and by weak I mean pretty much every product currently available), crackers still have an easier time on .NET than on native x86/Win32.  Then I'll talk about some mitigation techniques that can be used to make things somewhat harder.

Code | Security | IL
Sunday, February 29, 2004 5:43:53 PM UTC

Remoting and multithreading

The other day I got an interesting email.  A client for whom we had written a payment processing system was having trouble with MQSeries (shudder), and was pinning the blame on our system.  The issue was that when MQSeries dispatched a message that eventually timed out (30 seconds), the payment server blocked until the timeout was returned.

At first, we thought that MQSeries was to blame.  After all, it's a most annoying piece of software (it starts up around 30 processes for some reason).  There's a reason that IBM's consulting division makes so much money :).  But we thought that serializing all connections was a bit bad, even for IBM.

Remoting seemed unlikely.  After all, how could anything be scalable if remoting used only one thread?  After some tests, we found out the cause.

Apparently there is a bug in remoting.  Or perhaps it's by design.  The result is that it appears as if remoting tries to keep one and only one thread per CPU active.  This could be a performance benefit, assuming your thread is using the CPU.  However, when your thread blocks, for instance, calling Thread.Sleep for more than a few hundred milliseconds, or calling WaitOne with an indefinite timeout, remoting releases a new thread.  This is actually a decent scenario for most things, since it ensures your CPUs are operating at the highest efficiency.

The problem was that MQSeries was being called via an MC++ interop library.  (IBM didn't have a .NET library when we wrote this, and apparently their new .NET library for MQSeries is pretty bad.)  Since it's unknown what happens inside of a P/Invoke request, no thread is released.

The ideal workaround would be to let remoting know that a new thread should be released.  However, I'm unsure of how to do this (or if it's even possible), and thus, the workaround is to manually multithread your server-side code where needed.
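One way to "manually multithread" is to push the blocking call onto its own thread and wait for it in managed code.  This is only a sketch: the helper name is made up, it uses generics and lambdas that postdate the system described here, and it assumes remoting treats a managed Join like the Thread.Sleep and WaitOne cases above, which I have not verified.

```csharp
using System;
using System.Threading;

public static class BlockingCallRunner {
    // Runs a potentially long, blocking call on its own thread and waits for it
    // with a managed Join, so the calling thread blocks in managed code.
    public static T Run<T>(Func<T> blockingCall) {
        T result = default(T);
        Exception error = null;
        var worker = new Thread(() => {
            try { result = blockingCall(); }
            catch (Exception ex) { error = ex; }
        });
        worker.IsBackground = true;
        worker.Start();
        worker.Join();
        if (error != null) {
            throw new InvalidOperationException("The blocking call failed.", error);
        }
        return result;
    }
}
```

The server method would wrap only the interop call, something like BlockingCallRunner.Run(() => SendToQueue(message)), where SendToQueue stands in for whatever the interop library exposes.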

Code
Sunday, February 29, 2004 3:07:31 PM UTC

Processing HTML into safe HTML with .NET - Part 1

In an application I'm currently writing, we allow users to write messages with HTML markup in them, to deliver a rich experience.  The obvious problem is making this secure.  We don't want UserA to write a malicious script and steal some of UserB's data.  IE provides some cross-site scripting defense, but defense-in-depth (well, not even that deep in this case) means we should ensure that the HTML doesn't contain anything executable.  I've seen some samples that claim to clean the HTML with not much code at all.  They check a few tags, and they think they're done.  Of course, they aren't.

The problem is that IE is extremely powerful.  While this is great when developing an intranet application, it makes finding all the attack vectors nearly impossible.  For instance, we might think that a style attribute is ok, right?  Wrong.  There are two problems that I can think of (without thinking too hard).  First, someone could use styles to “overwrite” links on the page by using absolute positioning.  They could then change the “My Account” link into a link that goes to their own server, and steal the user's information.  Second, the style attribute can be used to load an HTML Component (.HTC).  This can contain lots of script.  That's bad.  And this is just in one little attribute!
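As an illustration of those two style problems, here is a naive sketch that keeps only a handful of harmless-looking CSS properties and throws away everything else, including positioning and behavior.  The property list is my own example, and the splitting is far too simple for real CSS (no comments, no escapes), so treat it as a picture of the idea rather than a filter to deploy.

```csharp
using System;
using System.Collections.Generic;

public static class StyleFilter {
    // Illustrative list; positioning and 'behavior' are deliberately absent.
    private static readonly HashSet<string> AllowedProperties =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "color", "font-weight", "font-style", "text-decoration" };

    // Keeps only whitelisted declarations from a style attribute value.
    public static string Clean(string style) {
        var kept = new List<string>();
        foreach (string declaration in style.Split(';')) {
            int colon = declaration.IndexOf(':');
            if (colon <= 0) continue;                       // not "name: value"
            string name = declaration.Substring(0, colon).Trim();
            string value = declaration.Substring(colon + 1).Trim();
            if (!AllowedProperties.Contains(name)) continue;
            if (value.IndexOf('(') >= 0 || value.IndexOf('\\') >= 0) continue; // no url(...), expression(...), escapes
            kept.Add(name.ToLowerInvariant() + ": " + value);
        }
        return string.Join("; ", kept);
    }
}
```

For the input “position:absolute; behavior:url(x.htc); COLOR: red”, Clean returns “color: red”.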

Needless to say, there are many, many more attack vectors.  Even if we could find them all, that doesn't help users when they get a new browser with upgraded and different capabilities.  So, we're going to have to resort to a “safe” HTML subset.  We'll go through the MSDN reference and pick out the tags and attributes that we consider safe, and anything else will simply get deleted.

Sounds easy enough, except we've got to parse the HTML.  Not fun.  Fortunately, I've found two libraries that do this.  The HtmlAgilityPack, written in C# by Simon Mourier from Microsoft (source included), and DevComponents.com's HTMLDocument, a commercial but inexpensive library.  If anyone knows of other HTML parsing libraries, please leave a comment.  In part 2, I'm going to review the APIs of the different libraries.

Code | Security
Sunday, February 29, 2004 4:45:27 AM UTC

# Thursday, February 12, 2004
Abstract thinking and colours

The idea in my earlier post about thinking abstractly about language versus text works because when someone is put through an unfamiliar situation and must deal with it for a bit, they will hopefully be more sensitive to others who deal with that situation all the time.

Case-in-point:  Colours.  Why is it that so many developers just ASSUME I'm going to use the standard Windows colour scheme, and then decide that using system colours or transparent colour is too much work, and that they'll just set it to White, since it works?

While talking about background colours in VS.NET, I remembered that this of course applies to many applications.  In fact, a while back, I tried to switch the background text color to a nice gray.  I found out that my system looked like crap, since over half of the apps I use don't play nice.  Some are unreadable, others hurt a LOT to read.  I think it was a version of some CD burning software that decided to use Red (FF0000) for some text.  Red next to the gray I used turned out to be an optical illusion of pain.

Websites have the same problem too, although most of them have the inverse problem.  The designers want the background to be white, and rely on the default.  This is NOT necessarily a bad thing.  If I set my colour scheme for a dark background, I'd enjoy reading/writing text on a site with a dark background (all the white on my own site is starting to annoy me...).

Eventually, I ended up going back to a white background, painful as it is.  But, it's been a while, and perhaps devs are smarter now?  I'm going to go switch now and see how things work out.

Code | Misc. Technology
Thursday, February 12, 2004 3:21:20 PM UTC

Some colour tips for Visual Studio .NET

One thing I don't understand is why VS.NET ships with no color coding for strings.  It's right there in the options.  But, it's left as automatic.  Considering how much strings are used in .NET coding, I'd think they'd warrant a bit more attention.  I set my string color to Maroon.  It's dark so it doesn't stick out too much, but just enough to let me know where character and string data are. 

When writing a lot of code (esp. when mixing string literals with code, as I am now for outputting dynamic JScript to web pages), this helps me catch a lot of errors that I'd normally find at syntax checking or compile time.  When scanning through to make a change somewhere, the string data sticks out enough that I can easily find a section.  I also know explicitly where I'm passing strings around (and thus can find places that might have a refactor possibility).

For those of you who haven't, go into VS.NET Tools -> Options -> Environment -> Fonts and Colors.  Go change your string colour to maroon and see if you like it.

My second tip is against eye strain.  By default, you have a white background.  That's fine if you deal with paper all the time, and thus most text is dark on light.  However, if you're like many programmers, time spent on paper during the day (reading programming books in bed doesn't count) is significantly less than time on-screen.  Thus, you can benefit by changing text to be light on dark, or in my case, dark on not-as-light.

What I've done is change my text background to gray (specifically 205, 205, 205).  It's light enough that the standard text colours work, but it's dark enough that there is a significant reduction in light output from my monitor.  At first it's a bit odd, but quickly you start to feel more comfortable.  Naturally, there's less strain on your eyes, since there is less energy going in.  This may be one of those things that takes a few years (like ergonomic keyboards) before you realise the benefit.  Since eyes are harder to fix than wrists, I'd play it safe and try to reduce strain as much as possible instead of having problems later on. 

Oddly enough, this is one area where most systems have gone backwards.  When I used various versions of BASIC over 14 years ago, white text was the norm.  Heck, even in Turbo C++ I remember not having a white background.  One company that does realise this is discreet*.  All their products have a “charcoal” interface, where everything is dark.  They have a more urgent reason for this, since their products work with video and graphics: your colour perception gets distorted by extra light, thus by keeping the UI dark and as invisible as possible, you don't mix the UI into your colour corrections.

Code | Personal
Thursday, February 12, 2004 2:57:56 PM UTC

Languages and abstract thought

Something that many programmers have to do, consciously and subconsciously, is think abstractly.  Some have defined intelligence as the ability to think or reason abstractly.  Abstraction occurs from specification design, all the way to the actual code construction.

I bet many of us have run into some kind of problem in a program where we realise that perhaps one set of data was incorrectly or unnecessarily related to another.  Sometimes the reasons for this are related to a lack of understanding of the data that's being dealt with, sometimes it's just oversight.

Something I see happening all the time is the first problem: lack of understanding.  This presents itself very often as text encoding problems: “I just want the standard 8-bit ASCII!” is heard often.  The easy solution is giving someone a quick primer in Unicode and different encodings.

However, if someone grew up with English, and only uses English, their thoughts regarding the abstraction of language versus text can be quite limited.  Perhaps they took a year or two of Spanish or another similar language, so they know that grammar structures can change around.  But even with Western European languages, the relation of written versus spoken language is somewhat similar -- at least there is a letter-based alphabet.

I think it should be mandatory for students to learn another alphabet.  They don't need to understand a language behind it.  Simply writing English in a foreign script can be a great mental exercise.  Abstracting written language from alphabets is a good thing to know of.

Also, I believe that anyone learning another script or language should do so not only on paper, but use a computer with different inputs configured.  Being able to read and write isn't too useful when you're stuck on a computer and you don't know how to use the IME.  I can't remember when I used a pen last (except my digitizer).  And who is going to have paper pen-pals?  Nowadays, it's easier and more fun to get online IM-pals or email-pals.

A simple example is my Chinese Hangman program.  In Hangman, I'd be tempted to take the incoming keystroke and add that to the guess -- one letter at a time, just like the paper game.  In concept, that works fine for Chinese -- a one character guess.  In practice, the problem is that to get that character, many keystrokes or perhaps even characters could be written.  For me, I use the Korean word 과일 (Gwa-il) and then convert to Hanja (Chinese characters).  My keystrokes are: [Right Alt][r][h][k][Right Ctrl][2].  The right alt switches to Hangeul, rhk are: ㄱ ㅗ ㅏ, which combine to form 과.  Right control tells the IME to list Chinese characters for words with the current syllable, and 2 is the number from the list that corresponds to fruit.  The end result: 果.  Note to everyone who is trying to grab control keys and stop their normal usage for some funky functionality in their own app: You're screwing with someone's input in a very annoying way.
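The first half of that keystroke sequence can be worked out by hand.  Unicode lays out precomposed Hangul syllables arithmetically from the index of the initial consonant, the vowel, and the optional final consonant.  The sketch below applies that formula; the class and parameter names are mine, and it does not model what the IME actually does internally.

```csharp
using System;

public static class HangulComposer {
    private const int SyllableBase = 0xAC00; // first precomposed syllable, 가
    private const int MedialCount = 21;
    private const int FinalCount = 28;       // includes "no final consonant" at index 0

    // Combines jamo indices (Unicode ordering) into one precomposed syllable.
    public static char Compose(int initialIndex, int medialIndex, int finalIndex) {
        return (char)(SyllableBase + (initialIndex * MedialCount + medialIndex) * FinalCount + finalIndex);
    }
}
```

The keys r, h, k give ㄱ (initial 0) and ㅗ plus ㅏ, which merge into the single vowel ㅘ (medial 9), so Compose(0, 9, 0) is 과, U+ACFC.  Three letter keys, one character.  Likewise Compose(11, 20, 8) is 일.  A program that counts keystrokes as characters gets this wrong before the Hanja conversion even starts.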

In less than two weeks, someone can learn a simple phonetic alphabet and how to use an IME.  At least well enough to type a few simple things in, and get a feel for how input might be entered.  However, the lessons learned are going to be there, adding another automatic “what if...” case while coding or designing, and hopefully avoiding some flaw.

Code | Personal
Thursday, February 12, 2004 2:38:37 PM UTC

# Thursday, February 05, 2004
Finalize and Dispose: Performance

In .NET, some people have found Dispose and Finalize to be a bit confusing, especially with regard to when you need to implement Finalize, and when to call Dispose.

To quickly summarize, finalizers in .NET provide a way for the runtime to clean up unmanaged resources that the Garbage Collector knows nothing about.  By providing a finalizer on a class, the runtime puts the object on the finalize queue instead of collecting it the first time around.  Only after the object's finalizer runs can the GC truly free the memory.

The impact of this is that objects that are left to finalize increase memory pressure, since they stick around longer and require a deeper GC to collect.  Obviously this is not really good for performance.

Dispose comes to the rescue by providing a common way for an object to be deterministically finalized, in the .NET sense (the managed memory is still only freed after the GC).  In a class implementation, Dispose does the real work of cleaning up resources, while Finalize simply calls Dispose.

However, Dispose can also call the Dispose method of other IDisposable classes.  It's possible that a class that has no direct unmanaged resources might actually be using some unmanaged resources.  Thus, it's important that you call Dispose on every object that supports it, since you never know what the implementation might be.
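For reference, here is a minimal sketch of the usual shape of such a class.  The class name is hypothetical and there is no real resource behind it; the interesting parts are that the finalizer and Dispose share one cleanup routine, and that Dispose tells the GC the finalizer is no longer needed.

```csharp
using System;

public class NativeResourceHolder : IDisposable {
    private bool disposed;

    public bool IsDisposed { get { return disposed; } }

    // Deterministic cleanup: callers invoke this when they are done.
    public void Dispose() {
        Dispose(true);
        // The cleanup has already run, so take the object off the finalize queue.
        GC.SuppressFinalize(this);
    }

    // Safety net: only runs if Dispose was never called.
    ~NativeResourceHolder() {
        Dispose(false);
    }

    protected virtual void Dispose(bool disposing) {
        if (disposed) return;
        if (disposing) {
            // Dispose other IDisposable members here.
        }
        // Release unmanaged resources here.
        disposed = true;
    }
}
```

Callers would normally wrap the object in a using block, which calls Dispose even when an exception is thrown.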

How big a deal is this?  I wrote a small benchmark that creates a new finalizable object, does some work, and then either lets it fall out of scope, or calls dispose.  It repeats this 10,000,000 times.  The work done is generating a new random number (using a System.Random instance, which is created before the loop).  The finalizer calls dispose, which does nothing (just trying to get the overhead of finalization).

My test machine is a Pentium 4c (HyperThreaded) at 3GHz (533MHz FSB), with 1.5GB of synchronous DDR333 RAM.  When not calling dispose, the average run time is 6.48 seconds.  When calling dispose, the average run time is 3.17 seconds.  Calling dispose makes this over twice as fast.  View the code: FinalizeBenchmark.cs.txt (2.08 KB).
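The following is a rough reconstruction of that kind of benchmark, not the contents of FinalizeBenchmark.cs.  I'm assuming that Dispose calls GC.SuppressFinalize (otherwise disposing would save nothing), and Stopwatch and the names are my choices for the sketch.

```csharp
using System;
using System.Diagnostics;

public static class FinalizeBenchmarkSketch {
    private sealed class Finalizable : IDisposable {
        // Assumption: Dispose suppresses finalization and does no other work.
        public void Dispose() { GC.SuppressFinalize(this); }
        ~Finalizable() { }
    }

    // Times 'iterations' allocations; when callDispose is false the objects
    // are left for the finalizer thread.
    public static TimeSpan Run(int iterations, bool callDispose) {
        var random = new Random();
        var timer = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) {
            var item = new Finalizable();
            random.Next();                 // the "work"
            if (callDispose) item.Dispose();
        }
        // Let pending finalizers finish so both modes measure the full cost.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        timer.Stop();
        return timer.Elapsed;
    }
}
```

Compare Run(10000000, false) with Run(10000000, true) over several runs.  The absolute numbers will differ from mine depending on the machine and runtime version.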

Bottom line: Always call dispose if you can, even if the class doesn't have unmanaged resources or a finalizer.

Code
Thursday, February 05, 2004 10:25:20 PM UTC

# Tuesday, January 13, 2004
Base32 in .NET
I haven't seen any .NET Base32 implementations, but various people have expressed interest in having some simpler way to represent binary data (such as an encrypted keycode).  So, I'm posting a sample Base32 encoding.  Note that this does not conform to the Base32 standard encoding, but uses its own set of characters (useful for keycodes, where we don't want to have to differentiate between 0 and O).  Thanks to Juan Gabriel for making the code much better :).

Update 2004-2-5: Thanks to Philippe Cheng for fixing a bug that caused extra (harmless) output. (See comments for details).

using System;
using System.Text;
 
public sealed class Base32 {
 
      // the valid chars for the encoding
      private static readonly string ValidChars = "QAZ2WSX3" + "EDC4RFV5" + "TGB6YHN7" + "UJM8K9LP";
 
      /// <summary>
      /// Converts an array of bytes to a Base32-k string.
      /// </summary>
      public static string ToBase32String(byte[] bytes) {
            StringBuilder sb = new StringBuilder();         // holds the base32 chars
            byte index;
            int hi = 5;
            int currentByte = 0;
 
            while (currentByte < bytes.Length) {
                  // do we need to use the next byte?
                  if (hi > 8) {
                        // get the last piece from the current byte, shift it to the right
                        // and increment the byte counter
                        index = (byte)(bytes[currentByte++] >> (hi - 5));
                        if (currentByte != bytes.Length) {
                              // if we are not at the end, get the first piece from
                              // the next byte, clear it and shift it to the left
                              index = (byte)(((byte)(bytes[currentByte] << (16 - hi)) >> 3) | index);
                        }
 
                        hi -= 3;
                  } else if(hi == 8) {
                        index = (byte)(bytes[currentByte++] >> 3);
                        hi -= 3;
                  } else {
 
                        // simply get the stuff from the current byte
                        index = (byte)((byte)(bytes[currentByte] << (8 - hi)) >> 3);
                        hi += 5;
                  }
 
                  sb.Append(ValidChars[index]);
            }
 
            return sb.ToString();
      }
 
 
      /// <summary>
      /// Converts a Base32-k string into an array of bytes.
      /// </summary>
      /// <exception cref="System.ArgumentException">
      /// Input string <paramref name="str"/> contains invalid Base32-k characters.
      /// </exception>
      public static byte[] FromBase32String(string str) {
            int numBytes = str.Length * 5 / 8;
            byte[] bytes = new Byte[numBytes];
 
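            // all UPPERCASE chars
            str = str.ToUpper();

            // reject anything outside the alphabet, as the doc comment promises
            foreach (char c in str) {
                  if (ValidChars.IndexOf(c) < 0) {
                        throw new ArgumentException("Input contains invalid Base32-k characters.", "str");
                  }
            }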
 
            int bit_buffer;
            int currentCharIndex;
            int bits_in_buffer;
 
            // fewer than two characters cannot hold a whole byte
            if (numBytes == 0) {
                  return bytes;
            }

            if (str.Length < 3) {
                  bytes[0] = (byte)(ValidChars.IndexOf(str[0]) | ValidChars.IndexOf(str[1]) << 5);
                  return bytes;
            }
 
            bit_buffer = (ValidChars.IndexOf(str[0]) | ValidChars.IndexOf(str[1]) << 5);
            bits_in_buffer = 10;
            currentCharIndex = 2;
            for (int i = 0; i < bytes.Length; i++) {
                  bytes[i] = (byte)bit_buffer;
                  bit_buffer >>= 8;
                  bits_in_buffer -= 8;
                  while (bits_in_buffer < 8 && currentCharIndex < str.Length) {
                        bit_buffer |= ValidChars.IndexOf(str[currentCharIndex++]) << bits_in_buffer;
                        bits_in_buffer += 5;
                  }
            }
 
            return bytes;
      }
}
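
A couple of properties of this encoding are easy to check in isolation.  The sketch below is a separate helper class of my own, not part of the Base32 class above: it confirms the alphabet has 32 distinct symbols with no look-alikes, and works out the lengths the encoder and decoder agree on.

```csharp
using System;
using System.Collections.Generic;

public static class Base32Facts {
    public const string Alphabet = "QAZ2WSX3EDC4RFV5TGB6YHN7UJM8K9LP";

    // True when the alphabet has 32 distinct symbols and none of the
    // look-alike characters 0, O, 1, I.
    public static bool AlphabetIsUnambiguous() {
        var unique = new HashSet<char>(Alphabet);
        return unique.Count == 32 && Alphabet.IndexOfAny(new[] { '0', 'O', '1', 'I' }) < 0;
    }

    // Characters needed for byteCount bytes at 5 bits per character (rounded up).
    public static int EncodedLength(int byteCount) {
        return (byteCount * 8 + 4) / 5;
    }

    // Whole bytes recoverable from charCount characters (rounded down).
    public static int DecodedLength(int charCount) {
        return charCount * 5 / 8;
    }
}
```

So a 16-byte value becomes 26 characters, and 26 characters decode back to 16 bytes.  The alphabet leaves out 1 and I as well as 0 and O.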

 
Code | Security
Tuesday, January 13, 2004 1:22:46 PM UTC