How spam may feed the thinking machine

Commentary: It is hard to find a good word to say for spam. Incoherent, unpleasant and unwanted, it slimes through cyberspace on the backs of zombies and oozes into our inboxes with the stench of month-old haddock.

Yet far from fatally clogging up our information arteries, spam may provide the impetus for a true revolution in information technology -- one we've been expecting for more than fifty years.

All the problems caused by the stuff can be solved if we can answer one simple question: what is spam? You and I know within a second of opening a piece of e-mail whether it's spam or not -- but computers are terribly bad at replicating the task. All spam filters suffer from two problems, the false negative and the false positive. We can -- we do -- put up with the false negatives, the spam written cleverly enough to bypass whichever tests are flavour of the month.

False positives, when a real e-mail is junked before we read it, are potentially ruinous. Unless filters are absolutely sure, they err on the side of slackness. They are never absolutely sure, so some spam always gets through. And, because spam works on the law of averages, as long as some gets through, the spammers will ramp up the rate to make sure that enough hits home to make the sums work. The pressure on our systems is immense.
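
The trade-off between the two kinds of error can be sketched in a few lines of Python. This is only an illustration: the scores, the labels and the count_errors helper are all invented here, and a real filter would produce its scores in some far more elaborate way. The point is simply that raising the bar for junking a message saves real e-mail at the cost of letting more spam through.

```python
# Hypothetical filter scores between 0 and 1: higher means "more spam-like".
# Both the scores and the labels are made up for illustration.
messages = [
    (0.99, "spam"), (0.95, "spam"), (0.80, "spam"), (0.60, "spam"),
    (0.70, "ham"), (0.30, "ham"), (0.10, "ham"), (0.05, "ham"),
]

def count_errors(threshold):
    """Junk anything scoring at or above threshold.

    Returns (false_positives, false_negatives): real e-mail junked,
    and spam let through.
    """
    false_pos = sum(1 for score, label in messages
                    if score >= threshold and label == "ham")
    false_neg = sum(1 for score, label in messages
                    if score < threshold and label == "spam")
    return false_pos, false_neg

# A stricter threshold means the filter must be surer before it junks anything.
for threshold in (0.5, 0.75, 0.97):
    fp, fn = count_errors(threshold)
    print(f"threshold={threshold}: {fp} real e-mail junked, {fn} spam let through")
```

With these made-up numbers, the strictest setting junks no real e-mail at all but waves most of the spam through, which is the slackness described above.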

So what's so hard about spotting spam? By common consent, the first serious spammers were Laurence Canter and Martha Siegel, who started sending out mass postings in 1994 advertising immigration services. At once, the battle was joined: people started writing filters and ditching missives from Canter and Siegel's ISP -- as the only spammers on the planet, they were easy to find. They changed ISP (not entirely voluntarily) and the arms race between spammers and filters had begun.

Since then, spam-filter software has learned -- for example -- that copies of the same spam look very much alike, so the spammers learned to include different random text in each message. Then the filters found that some fairly simple tests for basic English construction spotted the randomness, so the spammers learned to construct fake English sentences or include snippets of surreally inappropriate text. Key words were a giveaway, so the spammers learned to misspell and punctuate violently.

By now, the whole business resembles a planetwide reverse Turing test. Instead of human arbiters deciding whether their interlocutor is man or machine, uncountable thousands of filtering robots anxiously scan gigabytes of chatter to fish out the spawn of their evil cousins. It turns out that the only way to be sure whether something is spam is to look at it like a human, with all our knowledge of context, language, meaning and intent. In short, you must be truly intelligent to do the job. Suddenly, the mildly moribund field of AI has a real job to do: saving the world.

Evidence of this can be found as far afield as the University of Melbourne, where programmers Matthew Sullivan and Guy Di Mattina, together with mathematics lecturer Dr Kevin Gates, have stapled a Support Vector Machine to an e-mail firewall to get a claimed rate of 90 e-mails a second with one error every 25,000 messages. Support Vector Machines are fearsome mathematical constructs that have only just escaped from the lab. As far as I can make out, they seek non-linear hyperplanes in Hilbert space using Lagrangian transforms - check http://www.kernel-machines.org/ if you don't believe me.

Whatever the details, an SVM looks at data in lots of ways at once -- it extends the variables in the data into many dimensions -- and then learns which characteristics mark out members of one set from another. The eponymous support vectors are the training examples that lie closest to the dividing line between the two sets, and they alone fix where that line runs: once the machine has established these, filtering is a matter of finding out which side of the line each message falls on. Performance is predictable and prone to optimisation: in short, this is one of the most powerful methods of handling real-world data within a computer that has yet been developed.
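
For the curious, the idea of learning a dividing line and then checking which side a message falls on can be sketched in plain Python. This is a toy under loud assumptions, not the Melbourne team's system: the training messages are invented, the features are simple word counts, and the machine is the plainest linear kind (no kernel trick and no bias term), trained by stochastic subgradient descent on the hinge loss. The names TRAINING, featurize, train and classify are all hypothetical.

```python
# Toy training set, invented for illustration: +1 = spam, -1 = real e-mail.
TRAINING = [
    ("cheap pills buy now", +1),
    ("buy cheap watches now", +1),
    ("win money now", +1),
    ("meeting agenda for monday", -1),
    ("lunch on monday", -1),
    ("project report attached", -1),
]

# One dimension per distinct word seen in training.
VOCAB = sorted({word for text, _ in TRAINING for word in text.split()})

def featurize(text):
    """Map a message to a vector of word counts over VOCAB."""
    words = text.lower().split()
    return [float(words.count(word)) for word in VOCAB]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def train(samples, epochs=100, lr=0.1, lam=0.01):
    """Minimise L2-regularised hinge loss by stochastic subgradient descent."""
    w = [0.0] * len(VOCAB)
    for _ in range(epochs):
        for text, label in samples:
            x = featurize(text)
            margin = label * dot(w, x)
            # Regularisation step: shrinking the weights favours a wide margin.
            w = [wi * (1.0 - lr * lam) for wi in w]
            # Hinge-loss step: only messages on the wrong side of the line,
            # or too close to it, pull the line towards a better position.
            if margin < 1.0:
                w = [wi + lr * label * xi for wi, xi in zip(w, x)]
    return w

def classify(w, text):
    """Report which side of the dividing hyperplane a message falls on."""
    # A score of exactly zero counts as real mail: err on the side of slackness.
    return "spam" if dot(w, featurize(text)) > 0 else "ham"

weights = train(TRAINING)
print(classify(weights, "buy cheap pills"))
print(classify(weights, "monday project meeting"))
```

A message made up entirely of words the toy has never seen scores exactly zero and is let through, which is the cautious default. A production filter would use far richer features and a properly tuned library implementation.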

With spam estimated to be costing tens of billions of dollars worldwide each year, the motivation to develop really effective filtering is intense -- and that's before you consider that whoever defeats the Spam Monster will be crowned God-Emperor for Life and forever be preceded by dusky maidens and/or oiled hunks (delete according to taste) casting rose petals in their path. If the evil of spam leads to a renaissance of well-funded research into fundamental knowledge systems -- nothing else will do -- it could be the final kick we need to create truly intelligent machines. What they say when they find out they're being fed a diet of pure rubbish will be another matter: we'd better get our excuses ready, and fast.

Talkback 5 comments

    Anonymous -- 25/08/04

    I can get rid of 90% of all SPAM today !

    Hotmail, Yahoo Mail, and other such "free" email address providers MUST GO IMMEDIATELY !

    This will get rid of approx ??? Billions of dollars worth of expense overnight.
    This completely unnecessary expense is currently borne not only by every company in the world with an email server, but by individuals as well !

    THE TRUTH IS... telcos and ISPs don't really want it to stop: they are making BILLIONS of dollars from the bandwidth and time taken to delete & sort through the stuff, as well as the extra MBs users pay for. They don't want it gone; if they truly did, it would be gone tomorrow !

    Anonymous -- 25/08/04

    I have found the spam filtering in Outlook 2003 to be uncannily accurate - as much as it galls me to praise Microsoft. And in general, I don't see what a lot of the fuss over spam is about. Sure, it's annoying, but if people use the tools which are available to deal with the problem, it's not as huge an affliction as many people make it out to be. That said, I would like to get my hands on some professional spammers and spend some quality time introducing them to the business end of a baseball bat...

    Anonymous -- 27/08/04

    This is one of the cleverest pieces of writing I have seen in a long time. Congratulations Rupert on a marvelously colourful piece of prose. Keep up the good work.

    Anonymous -- 28/08/04

    Every computer system I sell to an individual has anti-spam software installed, as well as anti-virus and firewall software. This is all well and good for my own clients, but what of the other thousands that buy overpriced pieces of junk from the large retailers or off the web (Dell)?
    Spam is a problem that simply can't be completely solved by clever software filtering. The proposed approach of Bill Gates, where senders must pay for every email and then have that payment returned when the recipient clicks that the message isn't spam, is probably one of the few ways that spam can be stopped. Make the spammers pay for the crud they send out. At 5c per message, they'd soon give up. Sure, it makes life more difficult for computer users around the world; however, many might embrace this if it means eliminating spam from their in-boxes for good.
    Personally, I'm not sure I'd want that: my anti-spam filter (MailFrontier Desktop) is working well.

    Anonymous -- 30/08/04

    Why not go to the source of the spam carrying a reassuringly heavy baseball bat? Mmmmmmm, just to think of it...
