But that's just what's happened. A programmer called Michal Zalewski wrote a utility to generate chunks of badly formed HTML - stuff that breaks all the rules - and throw it at a selection of browsers. IE swallowed it all and kept on running. Opera, Firefox, Mozilla and Lynx regularly crashed. His report on Bugtraq makes sobering reading: single-handedly, he's found a whole host of potential exploits in a wide selection of browsers. But none in IE -- how has Microsoft got something so right while everyone else is in trouble?
It's obvious that good software has to do things right. A word processor should turn your deathless prose into a neatly laid out, properly spelled document, while a web browser should take a URL addressed to www.google.com and send out the right messages to elicit and display that web page. It's not so obvious -- even to programmers who should know better -- that good software needs to do much more than that: it needs to safely reject badly formed input.
It's relatively easy to write software that behaves as it should when it's presented with properly formed input. It's certainly easier to test if you don't try input that is obviously wrong. However, it's then easy to convince yourself that your software works - it does, but only in the lab. In reality, if you haven't tested for pathologically bad input data you're releasing a creation with no immune system into a world of mutant aggressors.
I found this out many years ago when I wrote PC network code -- a NetBIOS stack, to be precise. It was my first encounter with many of the concepts of networking, and thus very educational. Finding out that the textbook ISO/OSI seven layer model just didn't work with NetBIOS was a culture shock that to this day has left me cynical about top-down standardisation, One True Ways and other IT fundamentalism: finding out that I wasn't very good at writing network code was also enlightening.
However, after a few false starts I managed to get something that worked. You could find files across the network, open and close them, even execute them -- a point I gleefully demonstrated by persuading five workstations to play almost simultaneously one copy of the Monty Python theme tune done up in a little DOS executable. "Very good," said the boss, wincing slightly (Compaq Portable IIs were not known for their musicality). "But what happens if you ask it for an illegal file name?" "Er," I said. He tried it. It crashed. Silence fell, I went back to my code, and started to work through what might happen if it was fed bad data.
It was an epiphany. Just a few bytes wrong in a file name or a data structure, and my code would crash and burn, taking the operating system with it. Sometimes, it would drag a few files on the hard disk into oblivion as it fell cackling into the chasm. It was then that I learned about defensive programming.
It's hard work. You have to stop thinking in terms of "Here is a document name. Document names are 64 characters long, maximum, and always finish with a zero. So I'll create a buffer of 64 characters and copy the name until I hit zero." Instead, it's "I'll check each character as I get it and see whether it's allowable in document names. If there's no zero by position 64, I'll have to stop and raise an error. Now, how do I handle the error?" That's a lot more to design, implement and test -- but when you've done it, you'll have closed down an entire class of potential error.
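To make the second way of thinking concrete, here is a minimal sketch in Python rather than the language of the original NetBIOS code. The 64-byte limit and the terminating zero come from the example above; the function name, the exception type and the particular set of forbidden characters are assumptions made purely for illustration.

```python
MAX_NAME_LEN = 64  # limit taken from the example in the text

# Assumption: the text does not say which characters are allowable, so this
# sketch accepts printable ASCII and rejects a few path-like characters.
FORBIDDEN = set(b'/\\:*?"<>|')


class BadNameError(ValueError):
    """Raised when a document name field fails validation (hypothetical type)."""


def parse_document_name(raw: bytes) -> str:
    """Return the name held in a zero-terminated field, or raise BadNameError."""
    name = bytearray()
    # Never look past the first MAX_NAME_LEN bytes, however long the input is.
    for position, byte in enumerate(raw[:MAX_NAME_LEN]):
        if byte == 0:
            if not name:
                raise BadNameError("empty name")
            return name.decode("ascii")
        # Check each character as it arrives instead of trusting the sender.
        if byte < 0x20 or byte > 0x7E or byte in FORBIDDEN:
            raise BadNameError(f"illegal byte 0x{byte:02x} at position {position}")
        name.append(byte)
    # Ran out of field (or input) without ever seeing the terminator.
    raise BadNameError(f"no terminating zero within {MAX_NAME_LEN} bytes")
```

A well-formed field such as `b"report.txt\x00"` comes back as `"report.txt"`, while an unterminated or over-long field raises `BadNameError`. The caller still has to decide what to do with that error, which is the "how do I handle the error?" half of the work.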
Crashes are the least of your problems. These days, there are any number of people finding ways to force-feed active maliciousness into your system through a hole in the way it handles bad data, and unless you program defensively you'll be a soft target. Checking your defensiveness is the hardest problem: you won't have thought of all possible inputs, because you can't. One way is to check for the obvious ones - buffer overflows, bad pointers, memory corruption - and then try and throw as much pseudorandom nonsense at your program as possible, logging as you go - there's no point in triggering a bug if you can't reproduce it. Nonetheless, testing is as big a problem as design.
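The "pseudorandom nonsense, logged as you go" approach can be sketched in a few lines. This is an illustrative harness, not a description of any real tool: `fuzz` and `fragile_parse` are hypothetical names, the target is a toy with a deliberately planted bug, and an uncaught Python exception stands in for a crash. The point is the seeded generator, which makes every failing input reproducible.

```python
import random


def fragile_parse(data: bytes) -> int:
    """Toy target with a planted bug: it trusts a length byte it never checks."""
    if not data:
        return 0
    length = data[0]
    payload = data[1:]
    # Fails with IndexError whenever the claimed length exceeds the payload.
    return payload[length - 1] if length else 0


def fuzz(target, seed: int, rounds: int = 1000, max_len: int = 32):
    """Feed pseudorandom byte strings to target and record every failure.

    Each record holds the seed, the case number, the exact input and the
    error, so any failure can be replayed later.
    """
    rng = random.Random(seed)  # same seed, same sequence of inputs
    failures = []
    for case in range(rounds):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(max_len)))
        try:
            target(data)
        except Exception as exc:  # any uncaught error counts as a "crash" here
            failures.append((seed, case, data, repr(exc)))
    return failures
```

Calling `fuzz(fragile_parse, seed=1234)` twice returns the same list of failures both times, and each recorded input can be handed straight back to `fragile_parse` to replay the fault under a debugger.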
Microsoft understands this, because it got it wrong for so long. Experience is the best induction into defensive programming, that and a fierce determination to get it right. Microsoft can - must - afford to do stringent and exhaustive testing; the open source community can and must use its strengths to this end too. There is no reason good test methodologies and test suites can't be evolved: Zalewski's savage fluff generator has already proved its worth.
But the most important lesson to learn is to think, design and code defensively. Let's hope open source doesn't learn this the hard way.

One would think that real world experience, with IE being the worst thing around in terms of security, counters Microsoft's theoretical 'We are more secure' analysis, right?
Anyway, open source code has been in development for 25 years. While you can't take any group as a homogeneous whole, I'm sure that the group knows as much as Microsoft does about defensive coding.