The Complexity of SpamAssassin Scoring

Matt Kettler posted an informative message to the SpamAssassin mailing list today in the thread "Re: scoring system and values...". Here's an excerpt:


It's very common for people to have a very deep misunderstanding of how SA scoring works. Most people fall into the trap of over-simplifying the problem, and simply assuming that some rule or another 'must' be a good spam rule, when in fact it's not.



[...]



Questioning the accuracy of the scoring system isn't unreasonable.. but the scoring system is VASTLY more complicated than you can understand in a few hours of study. You need to have a good understanding of how it really works, and just how complicated the balance of the scoring system is before you can make reasonable judgements about accuracy.



You need to realize the SA scoring system is somewhat analogous to curve fitting an equation with 873 variables (there are 873 rules in SA 2.60's 50_scores.cf). This is done as an approximation using a genetic algorithm to evolve a solution, since a direct solution would take too long to compute. Trying to get your mind completely around an equation with that many variables is not possible for most humans, including me, but I've learned to understand and respect how complex the problem is.
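
To make the curve-fitting analogy concrete, here is a toy sketch in Python. It is not SpamAssassin's actual score-generation tooling. The corpus is synthetic, only 12 rules stand in for the 873, and every name in it (`classify`, `cost`, `make_corpus`, `evolve`, `fp_weight`, `seed_scores`) is made up for illustration. It assumes a message is flagged when the scores of the rules it hits sum to at least 5.0, and that a false positive should cost more than a missed spam.

```python
import random

THRESHOLD = 5.0  # assumed spam threshold (SpamAssassin's default required score)


def classify(scores, hits):
    """Flag a message as spam when the scores of the rules it hit reach the threshold."""
    return sum(scores[i] for i in hits) >= THRESHOLD


def cost(scores, corpus, fp_weight=5.0):
    """Weighted error count; fp_weight is a hypothetical penalty for flagging real mail."""
    total = 0.0
    for hits, is_spam in corpus:
        flagged = classify(scores, hits)
        if flagged and not is_spam:
            total += fp_weight  # false positive
        elif is_spam and not flagged:
            total += 1.0  # false negative
    return total


def make_corpus(rng, n_rules, n_messages):
    """Synthetic mail: low-numbered rules fire mostly on spam, high-numbered ones on ham."""
    corpus = []
    for _ in range(n_messages):
        is_spam = rng.random() < 0.5
        hits = []
        for rule in range(n_rules):
            spamminess = 1.0 - rule / (n_rules - 1)  # 1.0 for rule 0, 0.0 for the last rule
            p = 0.3 * (spamminess if is_spam else 1.0 - spamminess)
            if rng.random() < p:
                hits.append(rule)
        corpus.append((hits, is_spam))
    return corpus


def evolve(corpus, n_rules, rng, seed_scores, pop_size=30, generations=60):
    """Evolve one score per rule, starting from seed_scores plus random candidates."""
    population = [list(seed_scores)]
    while len(population) < pop_size:
        population.append([rng.uniform(-1.0, 4.0) for _ in range(n_rules)])
    for _ in range(generations):
        population.sort(key=lambda s: cost(s, corpus))
        survivors = population[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child[rng.randrange(n_rules)] += rng.gauss(0.0, 0.5)  # mutate one score
            children.append(child)
        population = survivors + children
    return min(population, key=lambda s: cost(s, corpus))


if __name__ == "__main__":
    rng = random.Random(42)
    corpus = make_corpus(rng, n_rules=12, n_messages=300)
    hand_tuned = [1.0] * 12  # a naive "every rule is worth one point" guess
    best = evolve(corpus, 12, rng, seed_scores=hand_tuned)
    print("hand-tuned cost:", cost(hand_tuned, corpus))
    print("evolved cost:   ", cost(best, corpus))
```

Even in this toy version, no score can be judged in isolation: whether a message crosses the threshold depends on the sum of every rule it hits, so changing one score shifts the best values for all the others.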


This complexity is why I don't write custom SA rules; I let the finely honed SA scoring do its thing.
Any special processing and delivery I need is done by Procmail recipes that run before SA. I describe this in detail on my Reverse Spam Filtering page.

Each item © Nancy McGough
Each comment © the author of the comment