 Monday, March 01, 2004
|
Now that I've decided on which library to use, I'll describe the actual code.
We already know that HTML, esp. in Internet Explorer, provides many attack vectors. And new versions of the browser could add another tag or attribute that can execute code. So we need to use a whitelist, not a blacklist.
Next, there are many more legit users than attackers. So when dangerous content is detected, it needs to be removed -- we can't just blow up and tell the user not to hack us. The number of false positives could actually be rather high, since some people are going to use Word and end up with a lot of tags and who knows what else. And finally, users could accidentally paste something that's potentially dangerous. Yelling at them, or even telling them to fix their code, isn't going to work, since they may not even be aware that HTML exists.
So, here's the code: SafeHtml.cs.txt (3.28 KB). It's very short and easy, thanks to the HtmlAgilityPack. The processing of style tags is pretty weak (simple replacements), but should do the trick. Enjoy!
Update 2004-Mar-04: Forgot to handle <A href="scriptType:code...">. Be sure to add that if you use this code in production.
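To make the approach concrete, here is a minimal sketch of the remove-don't-reject whitelist idea. It is not the SafeHtml.cs code and doesn't use the HtmlAgilityPack; it's Python with only the standard library, and the tag list, scheme list, and the names `SafeHtmlFilter`, `is_safe_url` and `sanitize` are all made up for illustration. Anything not on the whitelist is silently dropped, and links with a script-style scheme lose their href.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag -> attributes allowed on that tag.
ALLOWED_TAGS = {"a": {"href"}, "b": set(), "i": set(), "p": set(), "br": set()}
ALLOWED_SCHEMES = {"http", "https", "mailto"}
DROP_CONTENT = {"script", "style"}  # the text inside these is dropped too
VOID_TAGS = {"br"}                  # tags that never get a closing tag


def is_safe_url(url):
    # Browsers ignore whitespace and control characters inside a scheme,
    # so strip them before looking at it.
    cleaned = "".join(ch for ch in url if ch > " ")
    head, sep, _ = cleaned.partition(":")
    if not sep or any(c in head for c in "/?#"):
        return True  # no scheme at all: a relative URL
    return head.lower() in ALLOWED_SCHEMES


class SafeHtmlFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag, keep its text
        kept = []
        for name, value in attrs:
            value = value or ""
            if name not in ALLOWED_TAGS[tag]:
                continue
            if name == "href" and not is_safe_url(value):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html):
    parser = SafeHtmlFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(sanitize('<p onclick="x()">Hi <script>alert(1)</script>'
               '<a href="javascript:evil()">link</a></p>'))
```

Note that the user's text survives even when the markup around it doesn't, which is the point: nobody gets yelled at for pasting from Word.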
|
|
Code | Security
|
Monday, March 01, 2004 7:14:24 PM UTC
|
|
Following up from part 1, I reviewed three different libraries: HtmlAgilityPack, SgmlReader, and HTMLDocument.
HTMLDocument is a commercial component ($249 per dev, inc. source code). The other two are libraries written by some cool people at Microsoft and include source code.
SgmlReader is basically an XmlReader that can handle HTML. To write, we need to use an XmlWriter, and that can mess up the HTML, and we don't want that. SgmlReader seems like it'd be ok if all we wanted to do is determine if there's unsafe content and then return false, but that's not what we need.
However, both HtmlAgilityPack and HTMLDocument read HTML and create a DOM out of it, allowing you to modify it and write the HTML back out. This is what we need. I briefly looked over both libraries to see which one I want to program against. I gave them both an equal rating to start off with, but the scales rapidly tipped in favour of one library.
HTMLDocument definitely loses as far as API niceness and robustness. Some problems:
- Inconsistency when loading data into the HtmlDocument. If you have a string, it needs to go in the constructor, otherwise, use an instance method.
- Enums (both of them) are prefixed with “e”. Why?
- Lack of types. There are four types total. That's all. No HtmlAttribute. No HtmlElementCollection. Nothing like that.
- Weakly typed collections. ArrayLists and Hashtables are used as the collections, instead of strongly-typed collections. So you must cast, and if you insert an unsupported object, then it will throw an exception when writing the HTML. Not very robust.
- And the silliest thing of all: No encoding support. Worse than that, FORCED ASCII. If you open a file, their code opens a stream, manually passing ASCII encoding. No BOM detection, no system default, just ASCII. Ouch.
These things made me seriously doubt how professional a library HTMLDocument is. Most of these things are ultra-simple to fix. If I were forced to use this, I'd have to buy the source code just to make it right. It seems like its purpose is to demonstrate how not to construct a class library.
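On the forced-ASCII complaint: BOM detection with a caller-supplied fallback is only a handful of lines. The sketch below is Python rather than .NET, and `decode_html_bytes` and its `fallback` parameter are hypothetical names, not part of either library.

```python
import codecs

# Longer marks first: the UTF-32 LE mark starts with the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_html_bytes(raw, fallback="utf-8"):
    """Decode using the BOM if there is one, otherwise the caller's fallback."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(name)
    # No BOM: use the fallback (for example the system default) and
    # replace undecodable bytes instead of failing.
    return raw.decode(fallback, errors="replace")


print(decode_html_bytes(codecs.BOM_UTF16_LE + "果".encode("utf-16-le")))
```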
What's more is that HtmlAgilityPack doesn't have any of these flaws. In fact, it seems like it's actually a missing piece of the base class libraries. Superbly done. Writing code against it was so easy and natural. I'm extremely impressed. Even the documentation is much more complete (it comes with a 180KB HTML Help file, compared to HTMLDocument's 36KB HTML Help file).
Hands-down-winner: HtmlAgilityPack.
|
|
Code | Security
|
Monday, March 01, 2004 6:54:09 PM UTC
|
|
Since I'm about to leave Guatemala after living here for over six years, I thought I'd jot down some experiences so as to remember them. I'm not making this up.
At a store, my father asked for some soap. He was told that they currently did not have any, and that they wouldn't for two weeks. My father suggested that if they were selling so much soap, perhaps they'd order more. The storekeep smiled and said “Well actually, we never sell soap for the second half of the month. Our records show that we only sell soap for the first two weeks and then none for the rest of the month. So, we only buy for the first two weeks.”
I went to buy a microwave at the biggest Sony distributor in the country. In the front of the store they had a very interesting microwave with some really advanced features. I asked how much it was, and was told that I couldn't buy it, since they didn't have any. When I pointed out to the salesperson that they did, in fact, have one, and it was right there, he said “That's our display unit. If we sold that, we wouldn't be able to show it to other customers.”
|
|
Humour
|
Monday, March 01, 2004 2:29:43 PM UTC
|
 Sunday, February 29, 2004
|
I was going to write a series about learning MSIL (Microsoft Intermediate Language, or simply “IL”), and then get into more advanced topics. However, I found a good tutorial (and no doubt there are more if I use Google for a minute) at CodeGuru, called MSIL Tutorial. It should be enough to get people up to some speed.
I'll be writing some articles about how people actually attack programs, starting with nice x86 assembler, and then showing how attacks against .NET programs can use many of the same vectors. I'll show how, even with some weak obfuscation (and by weak I mean pretty much every product currently available), crackers still have an easier time on .NET than on native x86/Win32. Then I'll talk about some mitigation techniques that can be used to make things somewhat harder.
|
|
Code | Security | IL
|
Sunday, February 29, 2004 5:43:53 PM UTC
|
|
One thing about living in Guatemala is that McDonald's has a delivery service. I don't think they do in Canada or the States. I wouldn't usually write this, but they had some awesome service today. My nephew stayed at my house last night, and this morning we called for a Happy Meal. He is collecting the current toy line, so we asked for a specific part.
Well, when the delivery guy showed up, they had the wrong one. I figured we'd go later and change it. A few minutes later, McDonald's calls to apologize for the mistake and invites us to come by to change the toy. An hour later, they show up at my house, just to deliver the new toy and apologize again. WOW! That'd be impressive in most countries, but it's doubly so in Guatemala, where the concept of customer service is pretty much non-existent.
|
|
Personal
|
Sunday, February 29, 2004 5:32:28 PM UTC
|
|
The other day I got an interesting email. A client for whom we had written a payment processing system was having trouble with MQSeries (shudder), and was pinning the blame on our system. The issue was that when MQSeries dispatched a message that eventually timed out (30 seconds), the payment server blocked until the timeout was returned.
At first, we thought that MQSeries was to blame. After all, it's a most annoying piece of software (it starts up around 30 processes for some reason). There's a reason that IBM's consulting division makes so much money :). But we thought that serializing all connections was a bit bad, even for IBM.
Remoting seemed unlikely. After all, how could anything be scalable if remoting used only one thread? After some tests, we found out the cause.
Apparently there is a bug in remoting. Or perhaps it's by design. The result is that it appears as if remoting tries to keep one and only one thread per CPU active. This could be a performance benefit, assuming your thread is using the CPU. However, when your thread blocks, for instance, calling Thread.Sleep for more than a few hundred milliseconds, or calling WaitOne with an indefinite timeout, remoting releases a new thread. This is actually a decent scenario for most things, since it assures your CPUs are operating at the highest efficiency.
The problem was that MQSeries was being called via a MC++ interop library. (IBM didn't have a .NET library when we wrote this, and apparently their new .NET library for MQSeries is pretty bad.) Since it's unknown what happens inside of a P/Invoke request, no thread is released.
The ideal workaround would be to let remoting know that a new thread should be released. However, I'm unsure of how to do this (or if it's even possible), and thus, the workaround is to manually multithread your server-side code where needed.
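The general shape of that workaround, as I read it, is to push the opaque blocking call onto a worker thread you own and have the request thread wait on the result in ordinary code. The sketch below shows only that shape, in Python rather than .NET, so it says nothing about how remoting itself behaves; `call_off_thread` and the pool size are assumptions made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# A pool owned by the server code, separate from whatever threads the
# hosting framework hands out for incoming requests.
_pool = ThreadPoolExecutor(max_workers=8)


def call_off_thread(opaque_call, *args):
    """Run a blocking call on our own worker and wait for its result."""
    future = _pool.submit(opaque_call, *args)
    # The request thread waits here, in ordinary code, instead of
    # disappearing into the native call itself.
    return future.result()


print(call_off_thread(str.upper, "payment accepted"))
```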
|
|
Code
|
Sunday, February 29, 2004 3:07:31 PM UTC
|
|
In an application I'm currently writing, we allow users to write messages with HTML markup in them, to deliver a rich experience. The obvious problem is making this secure. We don't want UserA to write a malicious script and steal some of UserB's data. IE provides some cross-site scripting defense, but defense-in-depth (well, not even that deep in this case) means we should ensure that the HTML doesn't contain anything executable. I've seen some samples that claim to clean the HTML with not much code at all. They check a few tags, and they think they're done. Of course, they aren't.
The problem is that IE is extremely powerful. While this is great when developing an intranet application, it makes finding all the attack vectors nearly impossible. For instance, we might think that a style attribute is ok, right? Wrong. There are two problems that I can think of (without thinking too hard). First, someone could use styles to “overwrite” links on the page by using absolute positioning. They could then change the “My Account” link into a link that goes to their own server, and steal the user's information. Second, the style attribute can be used to load an HTML Component (.HTC). This can contain lots of script. That's bad. And this is just in one little attribute!
Needless to say, there are many, many more attack vectors. Even if we could find them all, that doesn't help users when they get a new browser with upgraded and different capabilities. So, we're going to have to resort to a “safe” HTML subset. We'll go through the MSDN reference and pick out the tags and attributes that we consider safe, and anything else will simply get deleted.
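The same subset idea can be applied inside the style attribute itself. Here is a rough sketch, in Python for brevity, that keeps only declarations whose property is on a whitelist and whose value is made of plain characters. The property list and the name `clean_style` are assumptions for illustration; positioning and behavior simply aren't on the list, and anything with parentheses (such as url(...)) never gets through.

```python
import re

# Hypothetical whitelist of properties that only affect text appearance.
SAFE_PROPERTIES = {"color", "font-weight", "font-style",
                   "text-align", "text-decoration"}
# Values may contain only plain characters: no parentheses, quotes,
# colons, backslashes or at-signs, which rules out url(...) and friends.
SAFE_VALUE = re.compile(r"[\w\s#%.,-]+")


def clean_style(style):
    kept = []
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        name, value = name.strip().lower(), value.strip()
        if not sep or name not in SAFE_PROPERTIES:
            continue  # unknown property (position, behavior, ...): delete it
        if not SAFE_VALUE.fullmatch(value):
            continue  # suspicious value: delete the whole declaration
        kept.append(f"{name}: {value}")
    return "; ".join(kept)


print(clean_style("color: red; position: absolute; "
                  "behavior: url(evil.htc); font-weight: bold"))
```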
Sounds easy enough, except we've got to parse the HTML. Not fun. Fortunately, I've found two libraries that do this. The HtmlAgilityPack, written in C# by Simon Mourier from Microsoft (source included), and DevComponents.com's HTMLDocument, a commercial but inexpensive library. If anyone knows of other HTML parsing libraries, please leave a comment. In part 2, I'm going to review the APIs of the different libraries.
|
|
Code | Security
|
Sunday, February 29, 2004 4:45:27 AM UTC
|
|
Well, for the last two weeks I've been using some shade of gray (205,205,205) as my Window colour (all backgrounds). And for the most part, applications have worked just fine, not like my last experience. Perhaps I need a darker shade, but I'm worried that the reduced contrast will start straining my eyes and negate the benefit of non-white background to begin with.
Of course, I could change the text colours to white, but I really doubt anything would look good then... Anyways, give it a spin! Turn down the amount of energy that your display is emitting and see how it feels.
|
|
Misc. Technology | Personal
|
Sunday, February 29, 2004 4:28:38 AM UTC
|
|
In the comments for my post, “Some colour tips for Visual Studio .NET”, Michael Carter writes:
“I'm also using Lucida Sans Typewriter as my default font. I think it's much easier to read than Courier.”
Easier than Courier [New]? I had to try. Well, after playing with Lucida Sans Typewriter for about 5 minutes, I found that going back to Courier New was impossible. Thanks for the tip Michael!
|
|
Misc. Technology
|
Sunday, February 29, 2004 4:25:05 AM UTC
|
 Thursday, February 12, 2004
|
My earlier post about thinking abstractly about language versus text works because when a user is put through something, and must deal with it for a bit, hopefully they will become more sensitive to others who might deal with that circumstance all the time.
Case-in-point: Colours. Why is it that so many developers just ASSUME I'm going to use the standard Windows colour scheme, and then decide that using system colours or transparent colour is too much work, and that they'll just set it to White, since it works?
While talking about background colours in VS.NET, I remembered that this of course applies to many applications. In fact, a while back, I tried to switch the background text color to a nice gray. I found out that my system looked like crap, since over half of the apps I use don't play nice. Some are unreadable, others hurt a LOT to read. I think it was a version of some CD burning software that decided to use Red (FF0000) for some text. Red next to the gray I used turned out to be an optical illusion of pain.
Websites have the same problem too, although most of them have the inverse problem. The designers want the background to be white, and rely on the default. This is NOT necessarily a bad thing. If I set my colour scheme for a dark background, I'd enjoy reading/writing text on a site with a dark background (all the white on my own site is starting to annoy me...).
Eventually, I ended up going back to a white background, painful as it is. But, it's been a while, and perhaps devs are smarter now? I'm going to go switch now and see how things work out.
|
|
Code | Misc. Technology
|
Thursday, February 12, 2004 3:21:20 PM UTC
|
|
One thing I don't understand is why VS.NET ships with no color coding for strings. It's right there in the options. But, it's left as automatic. Considering how much strings are used in .NET coding, I'd think they'd warrant a bit more attention. I set my string color to Maroon. It's dark so it doesn't stick out too much, but just enough to let me know where character and string data are.
When writing code (esp. when mixing string literals with code, as I am now for outputting dynamic JScript to web pages), this helps me catch a lot of errors that I'd normally find at syntax checking or compile time. When scanning through to make a change somewhere, the string data sticks out enough that I can easily find a section. I also know explicitly where I'm passing strings around (and thus can find places that might have a refactor possibility).
For those of you who haven't, go into VS.NET Tools -> Options -> Environment -> Fonts and Colors. Go change your string colour to maroon and see if you like it.
My second tip is against eye strain. By default, you have a white background. That's fine if you deal with paper all the time, and thus most text is dark on light. However, if you're like many programmers, time spent on paper during the day (reading programming books in bed doesn't count) is significantly less than time on-screen. Thus, you can benefit by changing text to be light on dark, or in my case, dark on not-as-light.
What I've done is change my text background to gray (specifically 205, 205, 205). It's light enough that the standard text colours work, but it's dark enough that there is a significant reduction in light output from my monitor. At first it's a bit odd, but quickly you start to feel more comfortable. Naturally, there's less strain on your eyes, since there is less energy going in. This may be one of those things that takes a few years (like ergonomic keyboards) before you realise the benefit. Since eyes are harder to fix than wrists, I'd play it safe and try to reduce strain as much as possible instead of having problems later on.
Oddly enough, this is one area where most systems have gone backwards. When I used various versions of BASIC over 14 years ago, white text was the norm. Heck, even in Turbo C++ I remember not having a white background. One company that does realise this is discreet*. All their products have a “charcoal” interface, where everything is dark. They have a more urgent reason for this, since their products work with video and graphics: your colour perception gets distorted by extra light, thus by keeping the UI dark and as invisible as possible, you don't mix the UI into your colour corrections.
|
|
Code | Personal
|
Thursday, February 12, 2004 2:57:56 PM UTC
|
|
Something that many programmers have to do, consciously and subconsciously, is think abstractly. Some have defined intelligence as the ability to think or reason abstractly. Abstraction occurs from specification design, all the way to the actual code construction.
I bet many of us have run into some kind of problem in a program where we realise that perhaps one set of data was incorrectly or unnecessarily related to another. Sometimes the reasons for this are related to a lack of understanding of the data that's being dealt with, sometimes it's just oversight.
Something I see happening all the time is the first problem: lack of understanding. This presents itself very often as text encoding problems: “I just want the standard 8-bit ASCII!” is heard often. The easy solution is giving someone a quick primer in Unicode and different encodings.
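The primer can start with one fact: ASCII is a 7-bit code with 128 characters, so “8-bit ASCII” in practice means some particular code page. A small sketch (Python, with a made-up helper name `byte_lengths`) shows how the same character takes a different number of bytes in different encodings, or has no representation at all.

```python
def byte_lengths(text, encodings=("ascii", "latin-1", "utf-8", "utf-16-le")):
    """Return how many bytes text needs in each encoding, or None if it can't be encoded."""
    sizes = {}
    for name in encodings:
        try:
            sizes[name] = len(text.encode(name))
        except UnicodeEncodeError:
            sizes[name] = None  # this encoding cannot represent the text
    return sizes


for sample in ("A", "ñ", "果"):
    print(sample, byte_lengths(sample))
```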
However, if someone grew up in English, and only uses English, their thoughts regarding the abstraction of language versus text can be quite limited. Perhaps they took a year or two of Spanish or another similar language, so they know that grammar structures can change around. But even with Western European languages, the relation of written versus spoken language is somewhat similar -- at least there is a letter-based alphabet.
I think it should be mandatory for students to learn another alphabet. It's not needed that they understand a language behind it. Simply writing English in a foreign script can be a great mental exercise. Abstracting written language from alphabets is a good thing to know of.
Also, I believe that anyone learning another script or language should do so not only on paper, but use a computer with different inputs configured. Being able to read and write isn't too useful when you're stuck on a computer and you don't know how to use the IME. I can't remember when I used a pen last (except my digitizer). And who is going to have paper pen-pals? Nowadays, it's easier and more fun to get online IM-pals or email-pals.
A simple example is my Chinese Hangman program. In Hangman, I'd be tempted to take the incoming keystroke and add that to the guess -- one letter at a time, just like the paper game. In concept, that works fine for Chinese -- a one-character guess. In practice, the problem is that to get that character, many keystrokes or perhaps even characters could be written. For me, I use the Korean word 과일 (Gwa-il) and then convert to Hanja (Chinese characters). My keystrokes are: [Right Alt][r][h][k][Right Ctrl][2]. The right alt switches to Hangeul, rhk are: ㄱ ㅗ ㅏ, which combine to form 과. Right control tells the IME to list Chinese characters for words with the current syllable, and 2 is the number from the list that corresponds to fruit. The end result: 果. Note to everyone who is trying to grab control keys and stop their normal usage for some funky functionality in their own app: You're screwing with someone's input in a very annoying way.
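To see why one keystroke is not one character, here is a tiny sketch of the composition step for the keys above. It is Python, covers only the three keys from the example (the tables and the name `compose` are made up and nowhere near a full IME), and uses the standard Unicode arithmetic for precomposed Hangul syllables.

```python
# Only the keys from the example, mapped to the jamo they type.
KEY_TO_JAMO = {"r": "ㄱ", "h": "ㅗ", "k": "ㅏ"}
LEAD_INDEX = {"ㄱ": 0}                       # index among leading consonants
VOWEL_INDEX = {"ㅏ": 0, "ㅗ": 8, "ㅘ": 9}    # index among vowels
COMPOUND_VOWELS = {("ㅗ", "ㅏ"): "ㅘ"}       # two vowel keys, one vowel


def compose(keys):
    """Turn a consonant key plus vowel keys into one precomposed syllable."""
    jamo = [KEY_TO_JAMO[k] for k in keys]
    lead, vowels = jamo[0], jamo[1:]
    vowel = vowels[0]
    for following in vowels[1:]:
        vowel = COMPOUND_VOWELS[(vowel, following)]
    # Syllables start at U+AC00, laid out as 19 leads x 21 vowels x 28 finals.
    return chr(0xAC00 + (LEAD_INDEX[lead] * 21 + VOWEL_INDEX[vowel]) * 28)


syllable = compose("rhk")
print(syllable, len("rhk"), "keystrokes ->", len(syllable), "character")
```

A hangman loop that counts keystrokes would see three guesses here; one that waits for the composed text sees one.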
In less than two weeks, someone can learn a simple phonetic alphabet and how to use an IME. At least well enough to type a few simple things in, and get a feel for how input might be entered. However, the lessons learned are going to be there adding another automatic “what if...” case while coding or designing, and hopefully avoid some flaw.
|
|
Code | Personal
|
Thursday, February 12, 2004 2:38:37 PM UTC
|
 Wednesday, February 11, 2004
|
As I was driving home this morning, I was thinking about localization issues for different situations. I realised that it must be really hard to have certain word games, like hangman, in some Asian languages like Chinese or Korean (shouldn't be a problem in Japanese, so long as you stick to the alphabets). So, without further ado, I sat down and wrote a very simple Chinese Hangman game.
There is only one word because I'm lazy and wrote this in 20 minutes and didn't feel like putting a real word library in. Maybe for v1.1. “Chinese” is somewhat misleading. In fact, I used Korean to get the characters, since I suck at the Chinese IMEs, and didn't feel like using my digitizer.
Since it's written in .NET, you can play online, just by clicking here (32K)!
Update: Some of my slower friends have given me feedback that this game is hard, since you have to guess the exact word. That's the point. The game is just a joke. That's all. Don't actually expect to play it (although it is fully functional!).
|
|
Humour
|
Wednesday, February 11, 2004 4:01:40 PM UTC
|
 Saturday, February 07, 2004
|
Every time the TSA is criticised for their silly airport checks, like removing sandals, some bloke comes along and says “Yea, well, airplane security might not be perfect, but can you think of anything better?”
Does anyone realise how ridiculous this is? First, if you think that the way the TSA is approaching things is correct, then, well, it's pretty pathetic to have to ask perfect strangers if they have anything better. Since most people don't know much about security, it's like having a bad designer decorating your house, then, when criticised about the horrible design and colouring outside the lines, your only response is to say “yea, well, YOU go do something better.”
At any rate, there is something better. The security hole exploited on 9/11 was one that allowed cockpit access. It had nothing to do with letting people with weapons on. It's so incredibly easy to get weapons on, that I'd be surprised if anyone with an IQ over 105 couldn't figure out how to get a .22 pistol on board. So, the answer is to plug the security hole (a cockpit access vulnerability), and ensure that even if 10 people come on with nunchaku, .22 pistols and crowbars that they cannot gain control of the plane.
However, what the TSA is doing is similar to not patching a system, yet enacting all sorts of false security measures. For instance, let's say a new blaster variant comes out and attacks Windows machines on port 135 using a new, unpatchable hole. Since that's somehow related to Windows networking, our fake security advisor says: “Ha! We'll turn off file and print sharing. Yea, it'll annoy everyone and make our network useless since that's what we use the network for. But, we need to be secure!”
Then someone who hasn't been hit by a bus or any other large, blunt object says “That doesn't solve the problem! You can still be hacked, and that's a useless measure. Stop annoying everyone and actually concentrate on real problems.”
Can you imagine that person being told “Well, maybe not, but at least our CEO feels better, and hey, what's your great idea?” That's pretty much what the DHS and TSA do. “Sure, we don't know crap. But we'll be damned if we're actually going to take any decent suggestions. Now there, please remove your sandals and your watch.”
To the untrained eye, glass can appear as diamond. Thus, to the security-blind, enacting useless fanfare security measures looks to be genuine.
|
|
Security
|
Saturday, February 07, 2004 1:06:33 AM UTC