30,000 Junk Emails

Today is a big day. In late December 2002, I started collecting junk emails. I’m not quite sure of the actual date, nor am I quite sure why I started keeping them, but for whatever reason I did. I only kept junk emails that were sent to me, via any of my email accounts (I have several addresses). I never asked for anyone else’s junk, nor did I go out of my way to get junk mail. I simply collected the stuff that was sent. And now, three and a half years later, my junk mailbox holds 30,000 emails, every one of them a waste of my bandwidth and time, not to mention that of service providers around the world.

Luckily, however, I’ve been using Mozilla Thunderbird since its inception. One of Thunderbird’s best features is its adaptive junk mail detection. Thunderbird uses Bayesian filtering, which was made popular in part thanks to Paul Graham’s “A Plan for Spam”. Unlike traditional junk mail filters at the time, which were mostly based on the sender’s email address or perhaps on specific words, Bayesian filtering is more mathematical in its approach: it weighs the evidence from every word in a message, and with proper training it can reduce false positives to a minuscule number. In short, filtering based on the sender’s email address doesn’t work because spammers simply make up random names and addresses. And judging spam based on single words like viagra, for example, doesn’t work because while an advertisement for purchasing the drug is spam, a joke email from a friend about taking viagra is not. If you’re interested in more information, read Paul’s original essay, as it’s quite enlightening. There’s a follow-up article too.
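
For the curious, here is a minimal sketch of the idea in Python. It is a plain naive Bayes classifier with add-one smoothing, not Thunderbird’s actual code and not the exact formula from Paul’s essay. The class name, the "junk" and "ok" labels, and the sample messages are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Toy junk filter (hypothetical, for illustration only)."""

    LABELS = ("junk", "ok")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        """Record the words of one message under its label."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def junk_probability(self, text):
        """Return the estimated probability (0..1) that text is junk."""
        vocab = set(self.word_counts["junk"]) | set(self.word_counts["ok"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in self.LABELS:
            # Prior: how common this label is, with add-one smoothing.
            score = math.log(
                (self.message_counts[label] + 1) / (total_messages + 2)
            )
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Smoothed per-word likelihood; unseen words get a small
                # non-zero probability instead of zeroing out the score.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Turn the two log scores into a probability, clamped so that
        # math.exp cannot overflow on very long messages.
        diff = log_scores["ok"] - log_scores["junk"]
        diff = max(min(diff, 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


spam_filter = NaiveBayesFilter()
spam_filter.train("buy viagra now cheap pills", "junk")
spam_filter.train("cheap viagra online order now", "junk")
spam_filter.train("funny joke about my friend taking viagra", "ok")
spam_filter.train("lunch tomorrow with my friend", "ok")

print(spam_filter.junk_probability("order cheap viagra now"))
print(spam_filter.junk_probability("a joke from my friend about viagra"))
```

With this tiny training set, the advertisement scores well above one half and the joke from a friend scores well below it, even though both mention viagra. No single word decides the verdict; every word in the message contributes.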

Since training Thunderbird on what I consider junk email, I have yet to see a false positive. Though, to be perfectly honest, I’m so convinced of Thunderbird’s adaptive filter that I no longer check for false positives. So the chance exists that there may be a couple. But I doubt it. I have recently seen a slight increase in junk emails that don’t get caught, but I don’t mind because marking them as junk is a single-click task. And just like life for me, where my learning never stops, Thunderbird continues to learn too. Even after 30,000 junk emails.

Mon, 08 May 2006 21:14
