January 07, 2003


I think the idea of a Bayesian filter is a bit too simplistic. It's too automatic. Nothing in the real world is simple, and real intelligence requires more than just magically applying an algorithm that can be described in a few paragraphs.

For example, a good filter could really cut down on false positives on messages that I get from co-workers. But I wouldn't rely on just a simple Bayesian classifier. A good filter could look at the content of the From, Reply-To, Received, and X-Mailer header fields. It's possible that a spammer could guess a From address that is in my address book: just pick an address from their massive list at the same domain, since a shared domain probably indicates someone at the same company. They may even be able to guess the content of the X-Mailer header field, since Outlook is so widely used in businesses. However, they probably couldn't guess the sending host in the Received header field. A very simple filter could inspect these carefully chosen fields to determine whether a message came from someone who has already sent me mail that is stored in one of my mail folders. I expect that this would really cut down on false positives, and it would be difficult for spammers to get past.

So this means I would keep a folder of known-good incoming email.
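The header-matching idea above can be sketched in Python with the standard `email` package. This is a minimal sketch under several assumptions of my own: a message counts as "known" only if its From address, the host named in its topmost Received header, and its X-Mailer value all match a message already in the known-good folder; the topmost Received header is assumed to be the one added by my own mail server, so it is the hardest for a spammer to forge; the `from <host>` parsing is a simplification, since real Received headers vary widely; and Reply-To is left out. The names `fingerprint`, `build_known_good`, and `is_known_sender` are hypothetical.

```python
import email
import re
from email.message import Message
from email.utils import parseaddr
from typing import Iterable, Optional, Set, Tuple

# Hypothetical fingerprint: (sender address, sending host, X-Mailer value).
Fingerprint = Tuple[str, str, str]

# Simplified parse of "from <host> ..." at the start of a Received header.
_FROM_HOST = re.compile(r"^\s*from\s+(\S+)", re.IGNORECASE)


def sending_host(msg: Message) -> Optional[str]:
    """Return the host named in the topmost Received header, if any.

    Assumption: the topmost Received header was added by my own mail
    server, so the host it records is hard for a spammer to fake.
    """
    received = msg.get_all("Received") or []
    if not received:
        return None
    match = _FROM_HOST.match(str(received[0]))
    return match.group(1).lower() if match else None


def fingerprint(msg: Message) -> Optional[Fingerprint]:
    """Combine the From address, sending host and X-Mailer into one key."""
    _, addr = parseaddr(msg.get("From", ""))
    host = sending_host(msg)
    if not addr or host is None:
        return None
    mailer = str(msg.get("X-Mailer", "")).strip()
    return (addr.lower(), host, mailer)


def build_known_good(folder: Iterable[Message]) -> Set[Fingerprint]:
    """Collect fingerprints from a folder of known-good incoming email."""
    known: Set[Fingerprint] = set()
    for msg in folder:
        fp = fingerprint(msg)
        if fp is not None:
            known.add(fp)
    return known


def is_known_sender(msg: Message, known: Set[Fingerprint]) -> bool:
    """True only if all three fields match a previously filed message."""
    fp = fingerprint(msg)
    return fp is not None and fp in known


if __name__ == "__main__":
    good = email.message_from_string(
        "Received: from mail.corp.example (mail.corp.example [192.0.2.10])\n"
        "\tby mx.corp.example; Tue, 7 Jan 2003 10:00:00 -0500\n"
        "From: Alice <alice@corp.example>\n"
        "X-Mailer: Microsoft Outlook\n"
        "\n"
        "Status report attached.\n"
    )
    # Same guessed From and X-Mailer, but relayed through a different host.
    spoof = email.message_from_string(
        "Received: from relay.bulk.example (relay.bulk.example [198.51.100.7])\n"
        "\tby mx.corp.example; Tue, 7 Jan 2003 11:00:00 -0500\n"
        "From: Alice <alice@corp.example>\n"
        "X-Mailer: Microsoft Outlook\n"
        "\n"
        "Buy now!\n"
    )
    known = build_known_good([good])
    print(is_known_sender(good, known))   # True
    print(is_known_sender(spoof, known))  # False
```

Requiring all three fields to match is deliberately strict: a guessed From address plus a common X-Mailer value is not enough on its own, which is the property the argument relies on. The cost is that a co-worker who switches mail clients or servers drops out of the known set until a new message from them is filed in the known-good folder.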

But what I am really concerned about is false positives on email from people I have never received mail from before.

Posted by Doug Sauder at January 7, 2003 10:33 PM