[Giagnocavo]Michael::Write()

# Friday, August 15, 2008
Typing string IDs

I just read these two posts:
http://blogs.msdn.com/simonince/archive/2008/08/15/strongly-typed-primitives.aspx
http://www.thejoyofcode.com/Avoiding_Primitive_Obsession_to_tip_developers_into_the_pit_of_success.aspx

And that reminded me about something we recently did. One system we're working on uses a lot of string identifiers for many different types of objects. There are many, many of these stored and passed around, so keeping things efficient was of high concern. 

The downside of string IDs (really, of using any common type as an ID) is that it's legal to pass any value of the same type. Strings and integers abound, both as IDs of other classes and in general use. So it's not unimaginable that someone could pass the wrong parameter somewhere. This could lead to runtime crashes or unexpected results (if the ID happens to match a real record of another class of object). Finally, using common types for IDs reduces usability. The signature "public void Delete(int id)" leaves a lot to be desired.

We wanted to hit all these issues while keeping things simple. There are times when untyped data needs to be converted, and this should be easy and clear. We wanted to avoid having to define new types when we had new classes of objects to identify. It is also customer-visible code, so C# is used.

Using a reference type was unacceptable, because it'd add at least 12 bytes of overhead per ID (I think more on x64). Using a struct fixes this, in addition to dealing with silly nullability issues. [If a type can be null, it should always be explicit. C#'s "reference types can be null" makes this hard.]

The end result was quite simple. Wrap a string in a struct so that equality and hashing pass through to the string. While you're at it, make comparison case-insensitive and independent of the current culture (since in many systems, data IDs are not case sensitive). Provide explicit conversions so you can easily convert to and from strings, but never by accident. (If the conversions were implicit, you'd be back at the starting point.) Finally, add a generic parameter that is never used. The generic parameter gives you distinct types without having to define them. Now the APIs can look like:

    public void Delete(Id<Product> id)...

    Dictionary<Id<Group>, List<Id<User>>> members...

When you do have hardcoded IDs, as the blog entries I mentioned do, you can convert easily: (Id<User>)"Admin". Nulls are treated as empty, all the time (empty may be a valid value anyways).
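
As a rough sketch of what that buys you at the call site: the snippet below uses a condensed stand-in for Id<T>, and the Product and User marker classes and the Describe method are made up for illustration. It shows that a hardcoded string needs an explicit cast, that null turns into the empty ID, and that an Id<User> can't be passed where an Id<Product> is expected.

```csharp
using System;

// Hypothetical marker types; they exist only to tag the IDs.
public class Product { }
public class User { }

// Condensed stand-in for the Id<T> struct described in this post.
public struct Id<T> : IEquatable<Id<T>>
{
    readonly string name;
    public Id(string name) { this.name = name ?? ""; }
    public static explicit operator string(Id<T> x) { return x.name ?? ""; }
    public static explicit operator Id<T>(string s) { return new Id<T>(s); }
    public bool Equals(Id<T> other) {
        return StringComparer.InvariantCultureIgnoreCase.Equals(name ?? "", other.name ?? "");
    }
    public override bool Equals(object obj) { return obj is Id<T> && Equals((Id<T>)obj); }
    public override int GetHashCode() {
        return StringComparer.InvariantCultureIgnoreCase.GetHashCode(name ?? "");
    }
    public override string ToString() { return name ?? ""; }
}

public static class Demo
{
    // Hypothetical API: the signature alone says which kind of ID is wanted.
    public static string Describe(Id<Product> id) { return "product:" + (string)id; }

    public static void Main()
    {
        var admin = (Id<User>)"Admin";            // explicit cast from a hardcoded string
        var widget = (Id<Product>)"Widget";

        Console.WriteLine(Describe(widget));                  // product:Widget
        Console.WriteLine(admin.Equals((Id<User>)"ADMIN"));   // True (case-insensitive)

        // Neither of these compiles:
        // Describe(admin);      // Id<User> is not Id<Product>
        // Describe("Widget");   // no implicit conversion from string

        var nothing = (Id<User>)(string)null;     // null becomes the empty ID
        Console.WriteLine(((string)nothing).Length);          // 0
    }
}
```

The two commented-out calls are the whole point: the mistakes that used to surface at runtime now fail at build time.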

When a truly optional ID is needed, use nullable types: "Id<Whatever>?". This fully captures how values are handled. This is vastly better than "It's a reference type, so maybe null is allowed. Or maybe null will crash. Empty string might be considered null, or maybe empty string means optional." With explicit nullability, the type system says it all.
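
To make that concrete, here's a small sketch, again with a condensed stand-in for Id<T>; the Group marker class and the FindOwner method are invented for the example. An optional ID is spelled Id<Group>?, and "no ID" stays distinct from the empty ID.

```csharp
using System;

public class Group { }   // hypothetical marker type

// Condensed stand-in for Id<T>; only what this example needs.
public struct Id<T>
{
    readonly string name;
    public Id(string name) { this.name = name ?? ""; }
    public static explicit operator Id<T>(string s) { return new Id<T>(s); }
    public override string ToString() { return name ?? ""; }
}

public static class Demo
{
    // Hypothetical lookup: the return type says the result may be absent.
    public static Id<Group>? FindOwner(string user)
    {
        if (user == "alice") return (Id<Group>)"Admins";
        return null;   // no owner, as opposed to an owner with an empty ID
    }

    public static void Main()
    {
        Id<Group>? found = FindOwner("alice");
        Id<Group>? missing = FindOwner("bob");

        Console.WriteLine(found.HasValue);     // True
        Console.WriteLine(found.Value);        // Admins
        Console.WriteLine(missing.HasValue);   // False

        // default(Id<Group>) skips the constructor, so the field is null;
        // ToString still reports it as the empty ID.
        Console.WriteLine(default(Id<Group>).ToString().Length);   // 0
    }
}
```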

The best part is that there should be pretty much no overhead. I'd expect the equality functions to be inlined, and there's no memory overhead, since the struct is simply a string reference.


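Here's the struct:
using System;

// A string ID tagged with a type parameter T that is never used at runtime.
// T only makes Id<Product> and Id<User> distinct, incompatible types.
public struct Id<T> : IEquatable<Id<T>>
{
    // Equality and hashing must use the same comparer, or dictionary lookups break.
    static readonly StringComparer Comparer = StringComparer.InvariantCultureIgnoreCase;

    public Id(string name) {
        this.name = name ?? "";
    }
    // Null only for default(Id<T>), which bypasses the constructor.
    readonly string name;

    // Explicit on purpose: implicit conversions would let raw strings slip in by accident.
    public static explicit operator string(Id<T> x) { return x.name ?? ""; }
    public static explicit operator Id<T>(string s) { return new Id<T>(s); }

    public override bool Equals(object obj) {
        return obj is Id<T> && ((Id<T>)obj) == this;
    }

    public bool Equals(Id<T> other) {
        return other == this;
    }

    public override int GetHashCode() {
        // Hash with the case-insensitive comparer so "a" and "A" land in the same bucket.
        return Comparer.GetHashCode(name ?? "");
    }
    public override string ToString() {
        return name ?? "";
    }
    public static bool operator ==(Id<T> a, Id<T> b) {
        // Treat null as empty so default(Id<T>) equals new Id<T>("").
        return Comparer.Equals(a.name ?? "", b.name ?? "");
    }
    public static bool operator !=(Id<T> a, Id<T> b) {
        return !(a == b);
    }
}
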
I'd be interested in seeing a more generic, yet very simple, solution: one that doesn't rely on the underlying type being string, but still provides all the same functionality. I don't think it's possible, since there's no way to get a generic constraint that'd allow similar handling of "string" and "int?". Additionally, structs can't inherit, so you'd end up using "Id<string, Product>" everywhere, which is far from elegant.
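
For what it's worth, here's roughly what that two-parameter version could look like. This is only a sketch under my own assumptions: it delegates to EqualityComparer<TValue>.Default, so it accepts both string and int?, but it gives up the case-insensitive comparison, which is exactly the part that doesn't generalize. Product is a made-up marker class.

```csharp
using System;
using System.Collections.Generic;

public class Product { }   // hypothetical marker type

// Sketch of a value-agnostic ID: TValue is the stored value, TTag is never used.
public struct Id<TValue, TTag> : IEquatable<Id<TValue, TTag>>
{
    readonly TValue value;
    public Id(TValue value) { this.value = value; }

    public static explicit operator TValue(Id<TValue, TTag> x) { return x.value; }
    public static explicit operator Id<TValue, TTag>(TValue v) { return new Id<TValue, TTag>(v); }

    public bool Equals(Id<TValue, TTag> other) {
        // Default comparer: ordinal for string, value comparison for int?.
        return EqualityComparer<TValue>.Default.Equals(value, other.value);
    }
    public override bool Equals(object obj) {
        return obj is Id<TValue, TTag> && Equals((Id<TValue, TTag>)obj);
    }
    public override int GetHashCode() {
        return value == null ? 0 : EqualityComparer<TValue>.Default.GetHashCode(value);
    }
    public override string ToString() {
        return value == null ? "" : value.ToString();
    }
}

public static class Demo
{
    public static void Main()
    {
        var byName = (Id<string, Product>)"Widget";
        var byNumber = (Id<int?, Product>)(int?)42;

        Console.WriteLine(byName.Equals((Id<string, Product>)"widget"));    // False: case-sensitive now
        Console.WriteLine(byNumber.Equals((Id<int?, Product>)(int?)42));    // True
    }
}
```

Every signature now has to spell out Id<string, Product>, which is the inelegance I was complaining about.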
Code
Friday, August 15, 2008 7:39:11 PM UTC  |  Comments [8]

Saturday, August 16, 2008 6:42:26 AM UTC
So why doesn't C# support typedefs again?
Greg
Saturday, August 16, 2008 8:14:46 AM UTC
Not only that, but even the aliases it has are pretty weak. For instance, you have no way of doing:
using Predicate<T> = Func<T, bool>

But would typedefs really fix anything? I'm not well versed, but I don't know of any language that has type safe type aliases. Not to mention you'd pull in all the underlying members (Substring, on an Id...). F#'s are just aliases:

type 'a CommandThing = string
let b : CommandThing<int> = "x"
let c : CommandThing<float> = b
// This works, since 'a CommandThing is just a string

Discriminated unions do a good job though:

type CommandId =
| CommandId of string
static member Start = CommandId "start"

printfn "%s" CommandId.Start
// This fails; CommandId is not compatible with string

The problem I think is that there's no way to represent this well in IL (a strong type alias). C# doesn't tend to do much that's not easily consumable in other languages. But still, some kinda typedef would be handy...
Monday, August 25, 2008 10:51:09 AM UTC
I wonder why you check name for null with ?? "" everywhere you access it?

The only way to set name is in the constructor, so if the constructor checks for null there should be no possible way for name to be null anywhere.
Monday, August 25, 2008 5:04:40 PM UTC
CLR structs always have a default constructor with no code, i.e., all fields are left zero-initialized, so name is null for default(Id<T>).
Monday, August 25, 2008 8:02:07 PM UTC
Ah, of course, thanks!
Tuesday, August 26, 2008 11:38:13 AM UTC
As far as I recall, String is still a reference type and not a value type, so you have a struct on the stack holding a reference to a string object on the heap.
Tuesday, August 26, 2008 5:37:23 PM UTC
Wonderful concept. I just implemented it in our project and it captured many errors at build time.
Tuesday, August 26, 2008 6:17:16 PM UTC
Noam: Correct, there's still the string. But my point was that the 'a Id adds no overhead at all. You're no worse off than if you had used string. On a good JIT, you'll still be passing things around in registers, etc.