This story was printed from ZDNet Australia.

Son of spam: 4 spam filtering packages tested
October 24, 2003
URL: http://www.zdnet.com.au/reviews/software/internet/soa/Son-of-spam-4-spam-filtering-packages-tested/0,139023437,120280115,00.htm
Can you trust software to block all the spam your company receives and let all your legitimate e-mail through? We evaluate four top spam filtering packages for their accuracy.

Three months ago, in the July edition of Technology & Business, we compiled an overview of five anti-spam filtering applications that were available at the time. That initial review covered spam and its concepts, along with the usability and technical implementation of each application. However, it did not look at how accurately those packages filter e-mail. We are therefore revisiting the anti-spam issue with a more results-based review.

We invited the same five vendors back for a head-to-head shootout to show the packages' accuracy in filtering unwanted e-mail while keeping as much useful e-mail as possible. All vendors accepted this challenge except for Clearswift, which cited the imminent release of a new, redesigned application. We hope that Clearswift will be involved in the next accuracy review.

As you will see in this review, testing these packages for accuracy is a tricky business, and doing so fairly and accurately took several months. As detailed in the previous review, anti-spam filters can be set up in any number of ways, utilising black lists, white lists, and custom-made rule sets. Some applications come configured with basic rules; others come as a blank slate. Some also employ quite advanced learning techniques (touted by some vendors as heuristics or Bayesian analysis).

Not so simple

In addition to running tests on this set of static data, we also needed to run the software on some live e-mail data to ensure the products achieved similar results, given that the static test data may not be filtered exactly as it would be in a live environment. To do this, each vendor needed its own test rig so that the live tests could be run simultaneously.
Therefore, we needed a domain name, sub-domain name records in the name servers, live public IP addresses and so on to be set up before the testing could commence.

The human factor

Sure, from a basic installation and administration point of view, the Labs staff could have installed and configured the rule sets for all these applications, as they did in the previous review. However, this is a far cry from being an "expert" in each application. It is one thing to do a usability test to ensure that a person with a reasonable level of technical competency can install and configure an application to get it running. That's nothing like the skill of an engineer working for the company that creates and maintains the application, who knows the many little nuances and tweaks needed to achieve the best possible results. Remember, these are not basic antivirus applications that you can just install and then download the latest definition file for. The rules on many of these filtering systems are highly complex and evolved.

This is not particularly different from how it works in the real world, anyway. Because the anti-spam market is very competitive, vendors invest a great deal in keeping their products working efficiently. For instance, some vendors run training courses for your staff on the best ways to configure their product. And for your average medium-to-large installation, it's not at all out of the ordinary to have a technician come in to help you install and configure the product.

What we looked for
This ran to some 1800+ items of mail that we sent to each vendor's application. This static test was run through at least twice to ensure accuracy.

The second test was a "live" test, combining several real-world e-mail boxes into one and then splitting that box out to each of the anti-spam filtering servers that the vendors had configured. This test ran for over two weeks; we then took several days' worth of collected mail, manually went through each e-mail that had arrived, and sorted it according to its status.

This live testing period was useful for confirming that the static testing was doing its job correctly in a controlled environment. Naturally, if any large differences occurred, then that application and the testing methodology would need to come under closer scrutiny to find out where and why the differences had occurred. Each test would basically act as a validation of the other, and as it turned out there were no discrepancies.

Scoring
-2 points for every unwanted spam message allowed through (false negatives)
-3 points for every unsolicited newsletter allowed through (false negatives)
-5 points for every legitimate e-mail blocked incorrectly (false positives)

The rationale behind this scoring is simple: spam allowed through is an annoyance, but legitimate e-mail blocked can have very serious repercussions. Ironically, it is the false negatives that are more likely to get administrators in trouble--especially if the boss receives a pornographic spam or the like--rather than the false positives, which can be a much more serious matter. But then how are people supposed to know they didn't receive an e-mail if they didn't receive it? While newsletters may be important, we acknowledge that they are more difficult to filter correctly and therefore have fewer points deducted for improper handling.

Live testing

By its very nature, live testing introduces several variables that are potentially beyond our control, especially the "human" factor in counting and classifying the messages. Naturally, the live testing could only be run once.

Interestingly, the vendors who noted that their applications apply "learning" principles to their filtering did indeed sometimes record different results during the course of the static testing when the same data sets were sent through. However, since the captured test data was limited to fewer than 2000 messages, the variation was not enough to make any great difference to the test results here. Still, this is a good sign that over the course of several months and thousands of messages, these packages may well get better at learning your e-mail patterns and filter more accurately. That said, these applications did not always produce better results when the "smarts" were activated.
In a couple of cases, the results went the other way, but only by one or two messages, and we're confident that with a combination of learning and tweaking, you could improve the accuracy of filtering.

GFI MailEssentials

The first of the "smart" applications is the product provided by GFI, with its Bayesian engine. Excitingly, when we started the initial static tests and sent the messages through, it was misidentifying only one or two spam messages, as opposed to the tens that the other applications were letting through. It looked like GFI was going to run away with the gong by miles. That was until we started counting up the false positives: very scarily, the application had started canning legitimate e-mails, and not just legitimate newsletters but normal e-mail messages as well. So unfortunately, when the scoring system was applied to the results, this package took a big hit because it blocked more legitimate e-mails than the other packages. Total score is 383 points.

The configuration and monitoring for the GFI application are contained in two separate applications, which is similar to both the NetIQ and SurfControl applications. The admin console is quite logical, and makes a difficult task easier to complete, particularly when it comes to configuring quite complex rules.
NetIQ MailMarshal

Initially, the results were beginning to look a bit worrying on the static controlled tests, and we were very surprised, when tabulating the results and applying the scoring, to find that even though there were some obvious hits and misses with the MailMarshal application, overall its consistency won out. Instead of being amazingly accurate, this package applies the sure but steady approach to filtering the mail messages. Total score is 1383 points.

MailMarshal has two main operations windows: one for configuration, and the other to allow administrators to check the status of the messages being stopped. The messages can be filtered to a number of specific directories such as images, virus/worm, language etc.
Network Associates McAfee SpamKiller

Compared to the other three packages in this review, this application performed poorly. It performed some filtering functions; however, far too many spam messages were being passed through undetected. On the flipside, very few legitimate e-mails were canned, and only a few newsletters. Total score is -323 points.

There is very limited specific configuration available for the McAfee application. This package is more of a set-and-forget app than a specific monitored rule-based mail filter. The management/administration console makes it relatively easy to configure settings like adding words to the subject line of messages considered to be spam. Rules and blacklists/whitelists can be readily set up under separate configuration tabs in the console window.
SurfControl

SurfControl is the second of two applications tested here that vendors tout as being "smart", and certainly the SurfControl application includes some fairly hefty weaponry in the name of defending the mailboxes. As we found with both "smart" applications, sometimes they were slightly too smart for their own good. While adequately filtering most spam messages, they tended to pick up a fair few legitimate e-mails and legitimate newsletters, which hurt the overall score. Total score is still a respectable 930 points.

With both "smart" applications, it would be interesting to see over a longer period whether their learning capabilities improved their accuracy. It should also be noted here that since we published the overview article on mail filtering in July, SurfControl has adjusted its pricing and is now within the same ballpark as many of the other players in this field.

SurfControl has a very good management window, similar to the NetIQ and GFI offerings. SurfControl has separate management and admin consoles, one for reviewing blocked messages and the other for configuring the rules. Interestingly, the rules can be managed in a specific operator-set sequence. For example, the language or header/image rules can be applied before the more encompassing spam rules, thereby allowing better filtering to different categories without the chance of missing more general spam messages.
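
To make the effect of the penalty weights concrete, here is a minimal Python sketch of the deductions listed under Scoring. It covers only the three penalties given in this article; the text as printed does not say how points were earned, so the published totals cannot be reproduced from it. The FilterErrors structure, the penalty_total function and the error counts are all hypothetical, chosen for illustration, and are not results from the Labs' tests.

```python
from dataclasses import dataclass

# Penalty weights as listed in the Scoring section of the article.
PENALTY_SPAM_PASSED = -2        # false negative: spam allowed through
PENALTY_NEWSLETTER_PASSED = -3  # false negative: unsolicited newsletter allowed through
PENALTY_LEGIT_BLOCKED = -5      # false positive: legitimate e-mail blocked


@dataclass
class FilterErrors:
    """Error counts for one product over one test run (hypothetical structure)."""
    spam_passed: int
    newsletters_passed: int
    legit_blocked: int


def penalty_total(errors: FilterErrors) -> int:
    """Sum the deductions for one run; points earned are not modelled here."""
    return (
        errors.spam_passed * PENALTY_SPAM_PASSED
        + errors.newsletters_passed * PENALTY_NEWSLETTER_PASSED
        + errors.legit_blocked * PENALTY_LEGIT_BLOCKED
    )


# Made-up counts for illustration only; these are NOT figures from the review.
lenient = FilterErrors(spam_passed=40, newsletters_passed=5, legit_blocked=1)
strict = FilterErrors(spam_passed=2, newsletters_passed=0, legit_blocked=30)

print("lenient filter:", penalty_total(lenient))  # -100
print("strict filter:", penalty_total(strict))    # -154
```

With these invented counts, the strict filter lets through far less spam but still loses more points, because each blocked legitimate e-mail costs two and a half times as much as each spam message delivered. That is the pattern described above for the "smart" packages.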
How we tested
Our aim was to test participating vendors' anti-spam software applications, and the vendors' own technical configuration of them, in an effort to provide some statistics on how effectively the current technology filters desirable and undesirable e-mail messages.

How it worked

All test servers were connected via the same switch and the Lab gateway to the Internet. Each server was allocated a pre-defined, static, publicly accessible IP address, and each mail server was assigned a fully qualified sub-domain name. Each server was also assigned an external e-mail account. Vendors were encouraged to implement their rule sets so they were "tight to catch as much spam as that package is capable of", but not so tight as to block everything--false positives were to be avoided as much as possible. The use of current black and white lists was acceptable. Once the install/configuration period was over, the vendors were not allowed to see or access their systems again.

Static tests

We used a Microsoft internal testing tool to send the static control messages to the servers. This tool was initially developed to test mail servers under load. We adapted it to take messages that we had collected (provided their headers had not been corrupted) and send them with either the original or new headers. The scores you see are based on the results of these static tests, although they are very similar to the results achieved in the live tests as well.

Live tests

We used a Linux-based Sendmail server to combine all messages into a single account, then forward them to the multiple test accounts, which left the headers as if the messages had been sent directly from the spammers. After running these two tests using the vendors' suggested configurations, we spent a bit of time altering the vendors' rule configurations to see if tweaking the products could alter their results.
Although this did not contribute to the overall scores, because of the subjective and human factors involved, it gave us some valuable information on the ease of use and effectiveness for administrators who will need to constantly tweak the systems once they are in use.

A note on results

A note on servers

Test bench
Futureproofing: Is the software accurate and flexible enough to suit your needs into the future?
ROI: Does the price justify the accuracy and will you achieve productivity gains by using the software?
Service: What options are available for service and support, and how much do they cost?
Final Words

About RMIT Test Labs
RMIT IT Test Labs is an independent testing institution based in Melbourne, Victoria, performing IT product testing for clients such as IBM, Coles-Myer, and a wide variety of government bodies. In the Labs' testing for T&B, they are in direct contact with the clients supplying products and the magazine is responsible for the full cost of the testing. The findings are the Labs' own--only the specifications of the products to be tested are provided by the magazine. For more information on RMIT, please contact the Lab Manager, Steven Turvey.
Copyright © 2009 CBS Interactive, a CBS Company. All Rights Reserved.