January 20, 2003

Here's my report on the spam conference. I attended all sessions except the last one. I wanted to hear the lawyer who recently won the case for AOL, but we couldn't stay around for it. I also missed the panel discussion at the end.

Bill Yerazunis talked about CRM114, a filtering language that uses hashes of word combinations with a Bayesian-like classifier. This technique is computationally expensive. My impression was that this is a brute-force attempt at solving the problem: you start with an algorithm that you can describe in a few paragraphs, then throw a lot of computational power behind it. See crm114.sourceforge.net.
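
To make the "hashes of word combinations" idea concrete, here is a minimal Python sketch. It is not CRM114's actual algorithm. The window size, the bucket count, the CRC32 hash, and the plain naive-Bayes scoring are all my own assumptions, chosen only to show the general shape of the technique.

```python
import math
import re
import zlib
from collections import Counter

WINDOW = 3           # assumed sliding-window size, for illustration only
BUCKETS = 1 << 20    # assumed number of hash buckets


def features(text):
    """Hash each word, and each pair of words inside a short window, to a bucket."""
    words = re.findall(r"[a-z0-9$!']+", text.lower())
    feats = []
    for i, word in enumerate(words):
        feats.append(zlib.crc32(word.encode()) % BUCKETS)
        for j in range(i + 1, min(i + WINDOW, len(words))):
            pair = f"{word} {words[j]}"
            feats.append(zlib.crc32(pair.encode()) % BUCKETS)
    return feats


class HashedBayes:
    """Naive-Bayes-style classifier over hashed features (hypothetical helper)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def train(self, label, text):
        feats = features(text)
        self.counts[label].update(feats)
        self.totals[label] += len(feats)

    def spam_score(self, text):
        """Return summed log-odds; a positive value leans toward spam."""
        score = 0.0
        for f in features(text):
            # Add-one smoothing so unseen buckets do not zero out the score.
            p_spam = (self.counts["spam"][f] + 1) / (self.totals["spam"] + BUCKETS)
            p_ham = (self.counts["ham"][f] + 1) / (self.totals["ham"] + BUCKETS)
            score += math.log(p_spam / p_ham)
        return score


classifier = HashedBayes()
classifier.train("spam", "buy cheap pills now")
classifier.train("ham", "meeting notes for monday attached")
print(classifier.spam_score("cheap pills"))
print(classifier.spam_score("monday meeting"))
```

Even in this toy version the number of features grows with the window size, which hints at why the real thing is computationally expensive.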

During Jason Rennie's presentation, an idea popped into my mind: Since spammers use a very neutral subject line, like "hi", what about a tool that could extract some key information from the content itself and present it in the user interface instead of -- or in addition to -- the subject line? Choosing a single phrase out of the content is a difficult problem, and it depends on training, which is probably too much to ask from a user. However, perhaps other information could be collected and presented in the mailbox list, such as the host segment of any URL found in the message. Oh yeah, about Jason's presentation: it was his Ph.D. thesis work in progress. What more need I say?
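
The URL-host idea is simple enough to sketch in Python. This is only an illustration of what I have in mind, not an existing tool: the function names `url_hosts` and `mailbox_row`, the regular expression, and the row layout are all made up for the example.

```python
import re
from urllib.parse import urlsplit

# Rough pattern for http/https URLs in plain text or HTML source (an assumption,
# not a complete URL grammar).
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def url_hosts(body):
    """Return the distinct host names of URLs in a message body, in order seen."""
    hosts = []
    for match in URL_PATTERN.finditer(body):
        url = match.group(0).rstrip(".,;:!?)")  # drop trailing punctuation
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # skip malformed URLs rather than fail
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def mailbox_row(sender, subject, body):
    """Format one line of a mailbox list with the link hosts as an extra column."""
    hosts = url_hosts(body)
    hint = ", ".join(hosts) if hosts else "(no links)"
    return f"{sender} | {subject} | {hint}"


body = 'Click http://www.example.com/buy?id=1 or <a href="http://Pills.example.net/x">here</a>.'
print(mailbox_row("someone@example.org", "hi", body))
```

A message with the subject "hi" looks a lot less innocent when the column next to it shows where its links really go.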

John Graham-Cumming had the most interesting presentation. He is the author/maintainer of POPFile, a spam filter written in Perl. John's presentation was interesting because he described a wealth of techniques spammers use to confuse and get past filters. The most interesting technique he mentioned was using a table with a single letter in each table cell to simulate text in a monospaced font. I had these takeaways from John's presentation: There are sophisticated tricks that can confuse simple algorithmic classifiers. Therefore, rules are important. And use plain text to send business email, so that it has the best possible chance to reach its recipient.
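
To see why the one-letter-per-cell trick defeats a word-based classifier, and what a rule to undo it might look like, here is a small Python sketch using the standard library's `html.parser`. It is my own guess at a countermeasure, not anything John showed and not how POPFile works. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class CellTextExtractor(HTMLParser):
    """Collect the text of each table cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:          # tolerate a cell with no enclosing row
                self.rows.append([])
            self.rows[-1].append("")
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data.strip()


def visible_table_text(html):
    """Rebuild the text a reader would see, one string per table row.

    Empty cells are treated as spaces (an assumption about how the
    spammer lays out word gaps).
    """
    parser = CellTextExtractor()
    parser.feed(html)
    parser.close()
    return ["".join(cell or " " for cell in cells) for cells in parser.rows]


html = ("<table><tr><td>f</td><td>r</td><td>e</td><td>e</td><td></td>"
        "<td>n</td><td>o</td><td>w</td></tr></table>")
print(visible_table_text(html))
```

A tokenizer that splits on tags sees eight unrelated single letters here, while the reader sees two words. Only after the cells are joined does the classifier get anything it can score.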

Paul Graham's presentation was good. My takeaway: If every user has a unique filter, then it becomes much harder for spammers to get past the filter. Filters that learn from their users become unique to each individual. If you are a spammer, you have to get past thousands of unique filters, not just a single filter. That's good news for spam fighters. Also notable: Paul declared that any false positive is a bug.

The most entertaining -- and most depressing -- presentation was by Barry Shein, CEO of TheWorld. TheWorld is a large ISP in Boston and surrounding areas. TheWorld also claims to be the first dial-up, commercial ISP. Barry himself seems a bit depressed about how spammers are destroying the business climate for ISPs. The volume of mail they have to handle from spammers is very large. Their subscribers are mad. The ISP often takes the blame for spam problems. As Barry put it: "Spam is the rise of organized crime on the Internet."

Jean-David Ruvini had the most innovative solution. He has an add-on to Microsoft Outlook that learns how you file incoming messages into folders. When you visit messages in your inbox, his tool presents you with a choice of three folders into which you can move the message. The tool learns your preferences over time, based on information in the message.
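
I don't know how Jean-David's tool works internally, but the behavior he described (learn from where you file things, then offer three folders) can be sketched in Python. Everything below is an assumption for illustration: the `FolderSuggester` class, the word-count model, and the add-one smoothing are mine, not his.

```python
import math
import re
from collections import Counter, defaultdict


class FolderSuggester:
    """Rank folders for a message from the words of previously filed mail."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # folder -> word -> count
        self.filed = Counter()                   # folder -> messages filed

    @staticmethod
    def tokens(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def record_filing(self, folder, text):
        """Learn from the user moving a message into a folder."""
        self.word_counts[folder].update(self.tokens(text))
        self.filed[folder] += 1

    def suggest(self, text, k=3):
        """Return up to k folders, most likely first."""
        vocab = {w for counts in self.word_counts.values() for w in counts}
        total_filed = sum(self.filed.values())
        words = self.tokens(text)
        scores = {}
        for folder, counts in self.word_counts.items():
            n = sum(counts.values())
            # Prior: how often the user files into this folder at all.
            score = math.log(self.filed[folder] / total_filed)
            for w in words:
                score += math.log((counts[w] + 1) / (n + len(vocab)))
            scores[folder] = score
        return sorted(scores, key=scores.get, reverse=True)[:k]


suggester = FolderSuggester()
suggester.record_filing("Work", "quarterly budget review meeting")
suggester.record_filing("Family", "dinner with grandma on sunday")
suggester.record_filing("Lists", "patch submitted to the mailing list")
suggester.record_filing("Travel", "flight itinerary and hotel booking")
print(suggester.suggest("budget meeting tomorrow"))
```

Offering three choices instead of filing automatically means a wrong guess costs the user nothing, which is probably why the approach feels so comfortable.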

Eric Raymond talked about deployment issues. Technical people focus on technical solutions. But human-computer interface designers are also necessary to achieve a good spam solution. Is a filter too hard for ordinary people to use?

Joshua Goodman from Microsoft talked about spam filtering techniques developed at Microsoft Research and deployed in the MSN client and server. Microsoft was sued in 1999 by Blue Mountain over its spam filter. According to Joshua, Microsoft has been a bit shy about spam filters as a result. But that has changed now. Users really want spam filtering, and Microsoft will give it to them. Joshua mentioned that Hotmail uses Brightmail filters. He said that the moment Hotmail moved to Brightmail filters, the spammers reacted almost instantly and changed their techniques. If you are Microsoft, AOL, or some other major player in email, that happens.

We had lunch with a handful of people from AOL. They mentioned, too, that the moment their spam filters change, the spammers change their techniques in response.

Posted by Doug Sauder at January 20, 2003 11:01 AM