I was amused to read the following article the other day: “OneCare rises from bottom-place ranking”. The reporter notes that “Microsoft’s anti-virus product OneCare is no longer bottom of the pile when it comes to the tests carried out by an independent anti-virus researcher.”
Unfortunately, the two results being compared come from completely different kinds of tests, so this is like comparing apples and hammers. The February 2007 comparative test by AV-Comparatives is a bulk-detection test. You take a giant pile of malware, turn off all the real-time functionality in a completely updated product, and perform an on-demand scan of the pile. The collection of malware used can span years, but most companies never remove signatures from their definitions, so everyone should be on an equal footing.
By comparison, a retrospective/proactive test, like the one from May cited in the article above, takes a very different approach. This test starts with a smaller pile of new malware (say, anything received by the reviewer in the last couple of months), and scans that collection with signatures older than the oldest sample in the collection. In theory, this means that the test measures the ability of the products to detect new malware. In other words, these are samples the vendors could not possibly have written signatures for, because the samples did not exist at the time the signatures were written.
What this means is that you can write signatures that detect everything that exists today, and nothing that comes into being tomorrow. Or vice versa. So our quote is like saying that my car, which failed its safety crash test last week, has improved because it completed the quarter mile in less time than someone else’s. That doesn’t rule out improvement in both areas, but it certainly doesn’t tell you that there has been any.
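To make the difference between the two tests concrete, here is a minimal Python sketch. Everything in it is a hypothetical toy model: the `Sample` and `Signature` types, the sample and family names, the dates, and the two products are invented for illustration and do not come from AV-Comparatives or any real product. It only shows how one product can score perfectly on a bulk scan and zero on a retrospective scan of the same pile.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sample:
    name: str
    family: str
    first_seen: date


@dataclass(frozen=True)
class Signature:
    written: date
    target: str            # a sample name, or a family name when generic=True
    generic: bool = False  # generic signatures match every sample in a family


def detects(sig, sample):
    """Toy matching rule: exact name match, or family match for generic signatures."""
    if sig.generic:
        return sample.family == sig.target
    return sample.name == sig.target


def detection_rate(samples, signatures):
    hits = sum(1 for s in samples if any(detects(sig, s) for sig in signatures))
    return hits / len(samples)


def bulk_test(samples, signatures):
    """Scan the whole pile with fully updated signatures."""
    return detection_rate(samples, signatures)


def retrospective_test(samples, signatures, window_start):
    """Scan only recent samples, using signatures older than the oldest of them."""
    new_samples = [s for s in samples if s.first_seen >= window_start]
    cutoff = min(s.first_seen for s in new_samples)
    frozen = [sig for sig in signatures if sig.written < cutoff]
    return detection_rate(new_samples, frozen)


# Hypothetical pile: two old samples and two that appeared in April 2007.
PILE = [
    Sample("old_a", "FamA", date(2005, 3, 1)),
    Sample("old_b", "FamB", date(2006, 1, 10)),
    Sample("new_a2", "FamA", date(2007, 4, 2)),
    Sample("new_c", "FamC", date(2007, 4, 20)),
]
WINDOW_START = date(2007, 4, 1)

# Product X: an exact signature for every sample, each written after the sample appeared.
PRODUCT_X = [
    Signature(date(2005, 3, 5), "old_a"),
    Signature(date(2006, 1, 12), "old_b"),
    Signature(date(2007, 4, 3), "new_a2"),
    Signature(date(2007, 4, 21), "new_c"),
]

# Product Y: a single generic family signature written years ago.
PRODUCT_Y = [Signature(date(2005, 3, 5), "FamA", generic=True)]

print(bulk_test(PILE, PRODUCT_X), retrospective_test(PILE, PRODUCT_X, WINDOW_START))  # 1.0 0.0
print(bulk_test(PILE, PRODUCT_Y), retrospective_test(PILE, PRODUCT_Y, WINDOW_START))  # 0.5 0.5
```

In this toy model, product X wins the bulk test outright and loses the retrospective test outright, so a ranking taken from one test says nothing about the ranking in the other.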
This is not meant to take anything away from Microsoft or AV-Comparatives. But we as humans (and especially magazine publishers) tend to like black-and-white answers, and try to make everything fit that mold. Unfortunately, we can make incorrect assumptions when we leap to the wrong conclusions. For example:
Very few tests actually run malware against real, fully updated security products. This approach provides the best correlation to real-world performance, but it is very labor-intensive and time-consuming. As a result, most tests of this nature are run on a small set of malware samples (at most 10 to 20). This means that the performance of any particular vendor might be different tomorrow if the test were run on a different sample set.
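One way to see how little a 20-sample result pins down is to put a confidence interval around it. The sketch below uses the Wilson score interval, which is a standard statistical tool I am bringing in for illustration, not something the testers themselves publish. The counts (18 of 20 detected, 1800 of 2000 detected) are made up.

```python
import math


def wilson_interval(detected, total, z=1.96):
    """Approximate 95% Wilson score interval for a detection rate (z=1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = detected / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


# Hypothetical result: 18 of 20 samples detected (90%).
small_low, small_high = wilson_interval(18, 20)
# The same 90% rate measured on a hypothetical 2000-sample set.
large_low, large_high = wilson_interval(1800, 2000)

print(f"20 samples:   {small_low:.2f} to {small_high:.2f}")
print(f"2000 samples: {large_low:.2f} to {large_high:.2f}")
```

With 20 samples, a 90% score is consistent with a true detection rate anywhere from roughly 70% to 97%, so two vendors a few points apart on such a test may not really be distinguishable. The interval is far narrower for the 2000-sample set.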
Needless to say, all of these techniques are useful and contain important data. In fact, we run all of these kinds of tests in our lab to determine whether we are improving over time, and whether our products meet our quality standards.
That being said, from years of experience I can say that higher numbers on tests do not always correlate to improved performance. Optimizing for one kind of result is likely to cause worse performance in other areas, be they the size of definitions, system performance, false-positive rates, removal effectiveness, or supportability.
Likewise, reading too much (or too little) into test results can lead to selecting the wrong product for your situation. Here are some links to excellent resources on testing methodology and interpretation:
Antivirus Testing Workshop in Reykjavik
And particularly check the FAQ section of the methodology document located off of the Comparatives tab at http://www.av-comparatives.org/.