23 Jul 11:51
Using SpamAssassin to detect and sort spam.
Pete Stephenson <pete <at> heypete.com>
2003-07-23 09:51:22 GMT
Greetings again,

In this edition of the SpamCop Digest, I will discuss the use of the free spam filter called SpamAssassin[1] to detect and sort spam.

I believe it's safe to say that all subscribers of this mailing list are quite frustrated about receiving spam, and that most report most, if not all, of the spam they receive via SpamCop. Overall, I believe that anti-spammers will win, and that spam will no longer be tolerated anywhere on the internet. However, while we're waiting for that day, we still have to deal with the huge volume of spam we receive daily (I generally receive anywhere between 50 and 200 spams per day, with the number varying randomly from day to day).

Many of us get spam at our work email accounts in addition to our personal accounts, and many businesses are fearful of implementing server-side filtering because it might result in false positives and the blocking of legitimate mail, which, for a business, would be intolerable (and reasonably so). Since employers hesitate to block spam and implement filters, our accounts receive more spam, and more of our work time is spent dealing with it[2].

Enter SpamAssassin. In its native form, it's a free, open-source UNIX program, though it's been ported to Windows and other platforms, and people have created commercial versions[3]. It performs a wide variety of heuristic checks on the headers and body text of mail in order to detect spam.

Many filters out there aren't very effective, as they either have inflexible filter rules (e.g. blocking based on a specific sender address, or on specific words in the text[4]), or generate too many false positives (e.g. requiring that your address be in the TO/CC fields blocks a lot of legitimate mailing lists). SpamAssassin doesn't work that way -- it assigns a "point" value to specific spam-like words and phrases in a message.
When the number of "points" exceeds a user-configurable threshold (in my case, 5 points), the message is sorted to a separate folder for perusal at a later time.

For example, a recent spam I got advertising a free golf club received 10.70 points. It received 0.4 points for the word "free" being in the subject, 0.6 points for the "from" address ending in numbers, 0.5 points for asking the reader to "click below!", etc. Certain other criteria count for much more: the message forged its sending mail system as being the Internet Mail Service software without containing the other IMS identifying marks, which earned it 4.3 points. I left off several other items, as this was merely an example.

SpamAssassin is quite effective at detecting spam -- it has filtered nearly 300 of the spam messages I've received, let about 15 through, and has not flagged a single legitimate email. I'm using the Eudora plug-in Spamnix[3], which does not currently include the Bayesian filter that the regular version of SpamAssassin contains. That filter "learns" and adapts to what you determine to be spam or not, which increases the probability of catching spam to above 95% and makes it more accurate the more mail it processes.

SpamCop has been experimenting[5] with using SpamAssassin as an optional filter for paying subscribers. JT's still in the process of testing it out, and it's defaulting to "off" on all accounts, though you can turn it on if you wish via the webmail interface. When it's out of testing and placed fully into production, there'll be an announcement.

For those of you who are frustrated with having to wade through gobs of spam in order to pick out only a few legitimate mails, consider using some sort of client-side filter like SpamAssassin to aid you in sorting your mail. This way, you can read your legitimate mail at your leisure, and deal with your spam when you have sufficient time. There are many filter programs out there, but I've yet to find one nearly as effective as SpamAssassin.
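
To make the point system described above a bit more concrete, here is a rough sketch in Python. It is not SpamAssassin's actual code: the point values and the threshold of 5 come from the golf-club example, but the rule tests, the message layout (a plain dict with hypothetical keys such as "x_mailer" and "received"), and the function names are simplified assumptions made up for illustration.

```python
import re

# Point values come from the golf-club example in the post. The tests
# themselves are simplified guesses, NOT SpamAssassin's real checks.
RULES = [
    ("subject contains 'free'", 0.4,
     lambda m: "free" in m["subject"].lower()),
    ("'from' address ends in numbers", 0.6,
     lambda m: re.search(r"\d+@", m["from"]) is not None),
    ("asks the reader to 'click below'", 0.5,
     lambda m: "click below" in m["body"].lower()),
    # Crude stand-in for the forged-IMS check: the mailer header claims
    # Internet Mail Service, but the Received header never mentions it.
    ("claims to be IMS without its other marks", 4.3,
     lambda m: "Internet Mail Service" in m.get("x_mailer", "")
     and "Internet Mail Service" not in m.get("received", "")),
]

THRESHOLD = 5.0  # the user-configurable threshold; 5 is the value I use


def score(message):
    """Return (total points, list of (rule name, points) that matched)."""
    hits = [(name, pts) for name, pts, test in RULES if test(message)]
    return sum(pts for _, pts in hits), hits


def is_spam(message, threshold=THRESHOLD):
    """Treat a total at or above the threshold as spam (an assumption;
    the post only says the points must 'exceed' the threshold)."""
    total, _ = score(message)
    return total >= threshold


sample = {
    "subject": "FREE golf club!",
    "from": "golfdeals8841@example.com",
    "body": "Claim your club today. Click below!",
    "x_mailer": "Internet Mail Service (5.5.2653.19)",
    "received": "from mail.example.com by mx.example.net",
}
total, hits = score(sample)
for name, pts in hits:
    print(f"{pts:4.1f}  {name}")
print(f"total {total:.1f}, spam: {is_spam(sample)}")
```

The point of scoring this way is that no single weak hint (like the word "free") is enough to condemn a message on its own; it takes several hints together, or one very strong one plus a few weak ones, to push a message over the threshold.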

Note that I do not suggest using ONLY SpamAssassin -- SA's client-side filtering, combined with reporting your spam either manually or via SpamCop, makes for an effective way of dealing with spam.

Also note that I am not associated with SpamCop or SpamAssassin in any way, other than being a satisfied user of SA's product and SC's service. I'm also not compensated in any way.

It may be a few weeks until I next write to the Digest, because I will be vacationing in Ireland from July 25th until August 11th. If there are any anti-spammers in Dublin or the surrounding area who would like to get a drink or something, let me know, and I'll see if I can work it into my schedule.

Cheers!

[1] SpamAssassin is available at, unsurprisingly, http://www.spamassassin.org/

[2] Greatly decreasing worker efficiency and productivity, increasing frustration, and costing the company money because they need to pay people to spend time looking through spam, rather than doing productive work.

[3] Such as http://www.spamnix.com/, a Mac OS X and Windows Eudora plug-in.

[4] For instance, "sex" or "breast". These words can be, and frequently are, used in perfectly legitimate, non-porn-related email. I, for instance, am a medical professional... I may need to email someone regarding "breast cancer", which may be caught by inflexible filters that don't take context into consideration.

[5] http://news.spamcop.net/pipermail/spamcop-list/2003-July/049644.html

--
Pete Stephenson
HeyPete.com