January 28, 2003

Can your cell phone impair your vision? Talking to someone on a phone while driving gives you "tunnel vision" and reduces response time by about 20 percent, according to a new study. [CNET News.com]

This finding has always made sense to me. I know how it works from listening to audio books while driving. I could follow the spoken words on the tape in "normal" traffic, but when I had to pay closer attention to driving -- because I needed to change lanes in heavy traffic, for example -- I would "miss" a portion of the tape. In fact, this happened frequently. I could listen to an audio book on tape, but I often had to rewind to catch the parts I had missed.

Talking on a cell phone, even with a hands-free device, is not like a normal conversation with someone else in the car. A person who is also in the car, especially someone in the front seat, has the same road awareness as the driver, so the conversation can adapt to the driving. In a cell phone conversation, one party is not present and has no road awareness. The conversation does not adapt, and the driver has to "leave" the conversation from time to time to pay attention to the road. But because the other party has no road awareness, the driver may be more reluctant to leave the conversation, and I think that is where the tunnel vision problem arises.

Posted by Doug Sauder at 05:42 AM | permalink

January 26, 2003

From the Economist: A radical rethink

The idea is to limit copyright terms to 14 years, as they originally were, and in exchange allow the content owners to use technological means (I assume enforced by the government) to prevent illegal copying.

Let's leave aside, for the moment, the fact that there is no good way to use technology to enforce copyright. (For that reason, this discussion is purely academic.) Such a deal would release a large amount of content into the public domain. It's hard for us to imagine what our world would be like with a large pool of digital content in the public domain. But I think it would be very different, and that public domain content would greatly benefit the public. For one thing, it would create significant competition for current content producers. It would be harder to make money producing junky content, so the result would be much better content. The competition would also drive down prices. And a wealth of public domain content would give rise to new derivative works.

Posted by Doug Sauder at 05:12 PM | permalink

January 25, 2003

AT&T spam filter loses valid e-mail. AT&T WorldNet has to defuse a risky spam-filtering technique introduced only a day ago after subscribers discover they are losing legitimate e-mail. [CNET News.com]

Posted by Doug Sauder at 10:26 AM | permalink

January 24, 2003

I had been thinking that only the moves of the big players -- AOL, MSN, Microsoft, and perhaps a few others -- really matter in the spam war. Now I'm having second thoughts. Spammers will do whatever it takes to get past any anti-spam technology set up by the big players; they feel they have a right to get past those roadblocks. However, a filter that an individual sets up to block spam is probably a sign that he isn't going to be persuaded to respond to spam. Will spammers fight as hard to get past individuals' filters? I guess that depends on their goal. If their goal is just to get past filters, then they will do whatever it takes. If their goal is to sell products or services, can they really succeed by harassing the people they are trying to sell to?

If we accept this argument -- that spammers won't try so hard to get past individuals' filters, but will try extremely hard to get past ISPs' filters -- then we ought to hope that MSN, AOL, and other ISPs never build overly sophisticated filters, because that would lead to extremely sophisticated spam. Yes, put all the anti-spam technology into client software, like Outlook Express. Make it anti-spam technology that learns from each individual's email usage.

Filters that "learn", and thus become unique to each individual, would be effective, even if implemented at an ISP's servers instead of at the subscriber's PC.

Posted by Doug Sauder at 05:40 PM | permalink

States Eye Online Services for Tax Bounty

Wow. What's most disturbing about this news story is that some states might be considering taxing downloadable products like software and other digital content. We have heard the argument that tax revenue is being lost because consumers buy from merchants on the web instead of brick-and-mortar merchants. That argument is a little easier to accept. But arguing for taxing online services and downloadable content is a BIG step. We all know that it's hard to sell digital content because of peer-to-peer networks. And it's hard to sell subscriptions to content because many consumers are used to getting content for free. So now the states will step in and make it even harder to succeed as an online business. Taxing is an easy way out, but it's not always the smart thing to do. If a tax policy kills economic growth, you end up with less tax revenue.

Then I wonder about the logistics of collecting the taxes. Sure, they want to simplify the procedure for the states. But what will we do when every small country in the world insists that we collect its taxes for it? And what about Internet-based businesses that move to some remote part of the world where they would be difficult to police? That would put their competitors who play by the states' rules at a disadvantage.

Posted by Doug Sauder at 05:14 PM | permalink

January 20, 2003

Here's my report on the spam conference. I attended all sessions except the last one. I wanted to hear the lawyer who recently won the case for AOL, but we couldn't stay around for it. I also missed the panel discussion at the end.

Bill Yerazunis talked about CRM114, a filtering language that uses hashes of word combinations with a Bayesian-like classifier. This technique is computationally expensive. My impression was that this is a brute-force attempt at solving the problem: you start with an algorithm that you can describe in a few paragraphs, then throw a lot of computational power behind it. See crm114.sourceforge.net.
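For readers wondering what "hashes of word combinations" means in practice: CRM114's approach is usually described as sparse binary polynomial hashing, which slides a short window over the text and, at each word, hashes every sub-phrase that ends with that word while keeping or skipping each earlier word in the window. The sketch below covers only that feature-generation step and is not CRM114's code; the five-word window, the `<skip>` placeholder, and the use of SHA-1 are assumptions.

```python
import hashlib
import re
from itertools import product

WINDOW = 5  # assumed window length; CRM114's own settings may differ


def tokenize(text):
    """Split text into lowercase word tokens (a simplistic assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def sbph_features(text, window=WINDOW):
    """Return hashed sub-phrase features for every word position.

    For each word, look back over the previous window-1 words and emit one
    feature per keep/skip pattern of those earlier words. The current word
    is always kept; skipped words become a '<skip>' gap so that position
    still matters.
    """
    words = tokenize(text)
    features = []
    for i, word in enumerate(words):
        history = words[max(0, i - window + 1):i]
        for mask in product((False, True), repeat=len(history)):
            parts = [w if keep else "<skip>" for w, keep in zip(history, mask)]
            phrase = " ".join(parts + [word])
            digest = hashlib.sha1(phrase.encode("utf-8")).hexdigest()[:8]
            features.append(digest)
    return features
```

With a five-word window, every word from the fifth onward produces 16 features, which gives a feel for where the computational cost comes from.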

During Jason Rennie's presentation, an idea popped into my mind: since spammers use very neutral subject lines, like "hi", what about a tool that extracts key information from the content itself and presents it in the user interface instead of -- or in addition to -- the subject line? Choosing a single phrase out of the content is a difficult problem and depends on training, which is probably too much to ask of a user. However, other information could be collected and shown in the mailbox list, such as the host part of any URL found in the message. Oh yeah, about Jason's presentation: it was his Ph.D. thesis work in progress. What more need I say?
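The mailbox-list idea is easy to prototype. This is a minimal sketch, assuming the message body is already decoded to plain text and that a simple regular expression is good enough to find http/https URLs; `summarize_hosts` and `mailbox_line` are hypothetical helper names.

```python
import re
from urllib.parse import urlsplit

# A deliberately simple URL pattern (an assumption); real spam also hides
# links in HTML attributes, encoded text, and so on.
URL_RE = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)


def summarize_hosts(body):
    """Return the distinct hostnames of URLs in body, in order of appearance."""
    hosts = []
    for match in URL_RE.finditer(body):
        host = urlsplit(match.group(0)).hostname  # lowercased, port stripped
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def mailbox_line(subject, body):
    """Format a mailbox-list row: the subject plus the hosts it links to."""
    hosts = summarize_hosts(body)
    return f"{subject}  [{', '.join(hosts) or 'no links'}]"
```

A message titled "hi" that links to two shopping sites would then show up as `hi  [host1, host2]`, which says far more than the subject line alone.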

John Graham-Cumming had the most interesting presentation. He is the author and maintainer of POPFile, a spam filter written in Perl. His presentation was interesting because he described a wealth of techniques spammers use to confuse filters and get past them. The most interesting one was using a table with a single letter in each cell to simulate text in a monospaced font. My takeaways from John's presentation: there are sophisticated tricks that can confuse simple algorithmic classifiers, so rules are important. And use plain text to send business email, so that it has the best possible chance of reaching its recipient.
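A rule aimed at the table trick can look for rows of cells that each hold a single character and read them back as text. This is a minimal sketch using Python's standard `html.parser`, assuming well-formed tables with one character per `<td>` and one hidden word per `<tr>`; the four-cell threshold is an arbitrary assumption, and this is not POPFile's implementation.

```python
from html.parser import HTMLParser


class SingleLetterTableFinder(HTMLParser):
    """Collect, per table row, the text of each <td> cell."""

    def __init__(self):
        super().__init__()
        self.rows = []     # list of rows, each a list of cell strings
        self._cell = None  # text of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._cell = ""

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            if self.rows:
                self.rows[-1].append(self._cell.strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell += data


def hidden_words(html, min_cells=4):
    """Return words spelled out one character per cell in a table row."""
    finder = SingleLetterTableFinder()
    finder.feed(html)
    finder.close()
    words = []
    for row in finder.rows:
        if len(row) >= min_cells and all(len(cell) == 1 for cell in row):
            words.append("".join(row))
    return words
```

The recovered words can then be handed to the ordinary filter, which is one way a hand-written rule can back up a classifier.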

Paul Graham's presentation was good. My takeaway: if every user has a unique filter, it becomes much harder for spammers to get past the filter. Filters that learn from their users become unique to each individual. If you are a spammer, you have to get past thousands of unique filters, not just a single filter. That's good news for spam fighters. Also notable: Paul declared that any false positive is a bug.
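To see why per-user training makes every filter different, here is a compact sketch of the token scoring Paul Graham described in "A Plan for Spam": each user's own counts of tokens in good mail and in spam produce that user's probabilities. The constants (doubling good counts, ignoring tokens seen fewer than five times, clamping to 0.01-0.99, 0.4 for unknown tokens, combining the 15 most interesting tokens) are the ones usually quoted from that essay and should be treated as assumptions, as should the simple tokenizer.

```python
import re
from collections import Counter
from math import prod


def tokens(text):
    """Very simple tokenizer (an assumption): runs of letters, digits, $ ' -."""
    return re.findall(r"[A-Za-z0-9$'-]+", text)


class UserFilter:
    """One user's Graham-style filter, trained only on that user's mail."""

    def __init__(self):
        self.good, self.bad = Counter(), Counter()
        self.ngood = self.nbad = 0

    def train(self, text, is_spam):
        counts = self.bad if is_spam else self.good
        counts.update(tokens(text))  # count every occurrence of each token
        if is_spam:
            self.nbad += 1
        else:
            self.ngood += 1

    def token_prob(self, tok):
        g = 2 * self.good[tok]  # bias toward good mail to reduce false positives
        b = self.bad[tok]
        if g + b < 5:
            return 0.4  # too rare to trust; treat as slightly innocent
        good_rate = min(1.0, g / max(self.ngood, 1))
        bad_rate = min(1.0, b / max(self.nbad, 1))
        return max(0.01, min(0.99, bad_rate / (good_rate + bad_rate)))

    def spam_prob(self, text, n=15):
        probs = [self.token_prob(t) for t in set(tokens(text))]
        # keep the n tokens whose probabilities are farthest from 0.5
        probs = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:n]
        if not probs:
            return 0.5
        spamminess = prod(probs)
        return spamminess / (spamminess + prod(1 - p for p in probs))
```

Train one instance on a user who gets spam about cheap pills and another on a user whose legitimate mail is full of cheap flights, and the same word ends up with opposite scores: exactly the uniqueness that makes spammers' lives harder.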

The most entertaining -- and most depressing -- presentation was by Barry Shein, CEO of TheWorld. TheWorld is a large ISP in Boston and surrounding areas. TheWorld also claims to be the first dial-up, commercial ISP. Barry himself seems a bit depressed about how spammers are destroying the business climate for ISPs. The volume of mail they have to handle from spammers is very large. Their subscribers are mad. The ISP often takes the blame for spam problems. As Barry put it: "Spam is the rise of organized crime on the Internet."

Jean-David Ruvini had the most innovative solution: an add-on to Microsoft Outlook that learns how you file incoming messages into folders. When you view a message in your inbox, his tool offers a choice of three folders into which you can move it. The tool learns your preferences over time, based on information in the message.
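The presentation described the add-on's behavior rather than its internals, so the following is not Ruvini's algorithm. It is a minimal sketch of one way to get similar behavior: a multinomial naive Bayes classifier with add-one smoothing over message words, updated each time the user files a message, that returns the three most likely folders. The class and method names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


class FolderSuggester:
    """Learn from the user's filing and suggest likely folders."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # folder -> word frequencies
        self.folder_docs = Counter()             # folder -> messages filed
        self.vocab = set()

    @staticmethod
    def words(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    def file(self, text, folder):
        """Record that the user moved a message into folder."""
        ws = self.words(text)
        self.word_counts[folder].update(ws)
        self.folder_docs[folder] += 1
        self.vocab.update(ws)

    def suggest(self, text, k=3):
        """Return up to k folders ranked by naive Bayes log-probability."""
        total_docs = sum(self.folder_docs.values())
        scores = {}
        for folder, counts in self.word_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.folder_docs[folder] / total_docs)
            for w in self.words(text):
                score += math.log((counts[w] + 1) / denom)  # add-one smoothing
            scores[folder] = score
        return sorted(scores, key=scores.get, reverse=True)[:k]
```

Because the model is just word counts per folder, each filing action updates it cheaply, which fits the "learns over time" behavior described above.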

Eric Raymond talked about deployment issues. Technical people focus on technical solutions. But human-computer interface designers are also necessary to achieve a good spam solution. Is a filter too hard for ordinary people to use?

John Goodman from Microsoft talked about spam filtering techniques developed at Microsoft Research and deployed in the MSN client and server. Microsoft was sued by Blue Mountain in 1999 over its spam filter, and according to John, Microsoft has been a bit shy about spam filters as a result. That has changed now: users really want spam filtering, and Microsoft will give it to them. John mentioned that Hotmail uses Brightmail filters. He said that the moment Hotmail moved to Brightmail filters, spammers changed their techniques almost instantly. If you are Microsoft, AOL, or some other major player in email, that's what happens.

We had lunch with a handful of people from AOL. They, too, mentioned that the moment their spam filters change, spammers change their techniques in response.

Posted by Doug Sauder at 11:01 AM | permalink

The webcasts can be viewed. Find the link at www.spamconference.org.

Posted by Doug Sauder at 10:20 AM | permalink

I was at the Spam Conference last Friday at MIT. There are a few reports on the conference around the web. For example, News.com has Building a Better Spam Trap and InfoWorld has Will New Filters Save Us From Spam?

Posted by Doug Sauder at 10:18 AM | permalink

January 19, 2003

Eric Kidd thinks that Bayesian-like filtering can be applied to create automatic white lists.

I agree. However, Eric uses only the email addresses and domains in the header fields as features, and those are too easily spoofed. If a spammer has a large collection of email addresses, it's a simple matter to set the originator's address to one from an organization the recipient already corresponds with. There's much more information that can be used, such as the trace header fields (Received) and the X-Mailer header field. The idea is to create a profile of each person in the whitelist. That profile should record the MTAs that usually add Received lines to that person's messages, especially the first MTA to touch the message, and the MUA that the person uses. Obviously, the profile can't rely on exact matching, since users may occasionally use a different MUA, or may send through a different MTA when away from the office.

Using the X-Mailer header field is interesting. If a user uses Outlook or Outlook Express, the X-Mailer header field may be useless in automatic whitelisting. However, if a user uses KMail, Mulberry, The Bat or some other less common MUA, then the X-Mailer header is a sure thing.
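A profile like this can be sketched with Python's standard `email` package. Everything about the matching rule here is an assumption: the "first MTA" is taken from the bottom-most Received header's `from` clause (which a determined spammer can forge; a sturdier variant would use the connecting host recorded by your own mail server), a sender's profile is just the sets of first hops and X-Mailer values seen in mail already sitting in the user's folders, and a new message counts as known when either feature matches. Outlook's X-Mailer values are deliberately given no weight, following the point about common MUAs.

```python
import re
from collections import defaultdict
from email import message_from_string
from email.utils import parseaddr

# Mailers too common to say anything about the sender (an assumed list).
COMMON_MAILERS = ("microsoft outlook",)

FIRST_HOP_RE = re.compile(r"from\s+(\S+)", re.IGNORECASE)


def features(raw):
    """Extract (sender address, first-hop host, X-Mailer) from a raw message."""
    msg = message_from_string(raw)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    received = msg.get_all("Received") or []
    first_hop = None
    if received:
        # Each MTA prepends its Received line, so the last one is the earliest.
        m = FIRST_HOP_RE.search(received[-1])
        first_hop = m.group(1).lower() if m else None
    mailer = (msg.get("X-Mailer") or "").strip().lower() or None
    return sender, first_hop, mailer


class SenderProfiles:
    """Per-sender sets of first hops and mailers, learned from filed mail."""

    def __init__(self):
        self.hops = defaultdict(set)
        self.mailers = defaultdict(set)

    def learn(self, raw):
        """Add a message already filed in one of the user's folders."""
        sender, hop, mailer = features(raw)
        if hop:
            self.hops[sender].add(hop)
        if mailer:
            self.mailers[sender].add(mailer)

    def looks_known(self, raw):
        """True if the route or a rare mailer matches the sender's profile."""
        sender, hop, mailer = features(raw)
        hops = self.hops.get(sender, set())
        mailers = self.mailers.get(sender, set())
        if hop and hop in hops:
            return True
        rare = mailer is not None and not mailer.startswith(COMMON_MAILERS)
        return rare and mailer in mailers
```

Requiring only one of the two features to match is a crude stand-in for fuzzy matching: a colleague who switches mail programs, or who sends from home with the same uncommon MUA, still passes.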

Posted by Doug Sauder at 09:54 AM | permalink

January 16, 2003

Spam's a problem. But why don't software vendors give users the tools they need to fight spam? I'm not talking about filters; I don't think too highly of filters because of the possibility of false positives. But what about plus-aliases? I have accounts with two different ISPs, and both of them allow plus-aliases. But to use plus-aliases effectively, you must be able to send email using different "From" addresses. From my cursory investigation, neither Outlook nor Outlook Express makes it easy to use one POP3 account with several "From" addresses. I used Eudora for a long time, and it is pretty good in this area. Now I'm using Mozilla, which has some features I like, but it doesn't support alternate "From" addresses as well as Eudora does. (Eudora calls them "personalities".)

One of my goals for this year is to begin a personal campaign to educate email users everywhere about plus-aliases. ISPs should support them. Mail client software should take advantage of them. Web developers should be aware of them. (Many web forms for "registration" reject email addresses that contain a "+" sign.) And users should recognize and understand them.
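For readers who haven't used them: with plus-aliases, a mailbox like user@example.com also receives mail sent to user+tag@example.com, so you can give each site its own tag and see who leaked or sold the address. This is a minimal sketch, assuming the ISP uses "+" as the separator (not every mail system does); the helper names and the simplified validation pattern are hypothetical, meant only to show that a registration form's check can easily accept "+" in the local part.

```python
import re

# Simplified address check for a registration form. It is far looser than
# the email RFCs, but unlike many forms it allows "+" in the local part.
ADDRESS_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


def plus_alias(address, tag):
    """Build user+tag@domain from user@domain (assumes '+' is the separator)."""
    local, domain = address.split("@", 1)
    return f"{local}+{tag}@{domain}"


def split_alias(address):
    """Return (base address, tag or None) for an incoming recipient address."""
    local, domain = address.split("@", 1)
    base, _, tag = local.partition("+")
    return f"{base}@{domain}", (tag or None)


def form_accepts(address):
    """True if the simplified registration check accepts address."""
    return ADDRESS_RE.match(address) is not None
```

If mail addressed to user+bookstore@example.com starts arriving from someone other than the bookstore, `split_alias` tells you exactly which registration leaked the address.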

Posted by Doug Sauder at 09:34 AM | permalink

January 15, 2003

Dan Gillmor: "Swipe a CD from a record store and you'll get arrested. But when Congress authorizes the entertainment industry to steal from you -- well, that's the American way." [Scripting News]

Posted by Doug Sauder at 11:58 PM | permalink

I wonder if we will ever have an XML vocabulary for email. The mark-up should be simple, and separation of structure and presentation should not be a consideration. Email doesn't have structure. But simple presentation styles can make communication more effective. The author should be able to choose font size, color, and style. He should be able to indicate flowed text and preformatted text. He should be able to create numbered and bulleted lists.

Posted by Doug Sauder at 11:49 PM | permalink

I was just looking at the draft for XHTML 2.0. The plans are for XHTML 2.0 to break backwards compatibility with XHTML 1.1. In fact, it seems as though the working group has no concern at all for compatibility, although familiarity seems to be a concern. (What I mean is, they apparently feel that XHTML 2.0 should be sufficiently similar to HTML that writers will find it familiar.)

Overall, I am pleased with the draft. Compatibility is not really so much of an issue if you have cascading style sheets.

The working group seems to be really striving for separation of structure and presentation. That's a good thing. But I hope not too much of a good thing. Take the <i> tag, for example. It can be replaced by <em>. But sometimes you really do need the <i> tag. The titles of books are italicized, but the <em> tag really doesn't apply, because you are not emphasizing the title, just following the convention of italicizing it. Of course you could do this with the <span> tag, but the <span> tag is really just a presentation tag. The most natural thing to do is to invent your own tag, say <btitle>, then use a style sheet rule to indicate that it should be italicized.

Posted by Doug Sauder at 11:43 PM | permalink

Hitting P2P Users Where It Hurts

If spoofing can effectively shut down P2P file trading, maybe copy protection won't be necessary. Then, you could rip a song from a CD and share it with a relative or friend, or easily move it from one device to another, or back it up. Sounds like a good compromise to me: give up large scale trading among strangers, but allow small scale trading among acquaintances.

Posted by Doug Sauder at 10:02 AM | permalink

Wired News: ...the entertainment industry conglomerates continue to feed an us vs. them, Robin Hood-type reaction among many consumers, especially young people...

Posted by Doug Sauder at 09:53 AM | permalink

Blogs refine enterprise focus

BUILDING ON THE success of Weblogs for personal Web publishing, enterprises are starting to tap into blogs to streamline specific business processes such as intelligence gathering or to augment traditional content- and knowledge-management technologies.

There is a market for collaborative tools. User interface and security are everything in this market: make it secure, easy, and natural to use.

Posted by Doug Sauder at 09:42 AM | permalink

January 12, 2003

Frankston: It's not about price, it's about who is in control -- the users vs. the transport providers.

Posted by Doug Sauder at 09:53 AM | permalink

One of my thoughts about VoIP: perhaps once this technology matures -- in the sense of wide deployment and everyday use -- we will finally break through the 3 kHz bandwidth that has stood since time immemorial in the telecom industry to something with higher fidelity. 44 kHz phone calls, anyone?
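Some back-of-the-envelope arithmetic shows the size of that jump. The figures are standard ones rather than anything from the post: classic digital telephony (G.711) samples at 8 kHz with 8 bits per sample, which by the Nyquist limit caps audio at 4 kHz, while CD audio samples at 44.1 kHz with 16 bits per sample. So "44 kHz" is really a sampling rate, good for roughly 22 kHz of audio bandwidth. The sketch ignores compression, which VoIP codecs would use in practice.

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def max_audio_freq(sample_rate_hz):
    """Nyquist limit: the highest frequency a sample rate can represent."""
    return sample_rate_hz / 2


telephone = pcm_bitrate(8_000, 8)   # G.711 narrowband voice
cd_mono = pcm_bitrate(44_100, 16)   # CD sample rate and depth, one channel
print(telephone, cd_mono, cd_mono / telephone)  # 64000 705600 11.025
print(max_audio_freq(8_000), max_audio_freq(44_100))  # 4000.0 22050.0
```

Uncompressed, a mono CD-quality call needs about eleven times the bit rate of a conventional digital phone channel.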

Posted by Doug Sauder at 09:43 AM | permalink

Clay Shirky: Customer-Owned Networks

Posted by Doug Sauder at 09:27 AM | permalink

January 11, 2003

Another approach to solving the spam problem: list all new messages in your inbox based on a relevance rating. The idea is this: when you start reading your new mail, you start with the messages that are from co-workers, relatives, and so on, and eventually you get to the messages that look like spam. You might review the first few spam messages -- the ones that may or may not be spam -- then automatically trash the rest of them. This is different from a binary classification system that decides a yes-or-no question "Is this spam?"
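The ranked inbox amounts to a sort plus a cutoff. This is a minimal sketch with made-up scoring rules (a known-sender bonus, a bonus for being addressed directly, a penalty for a mostly-capitals subject), all assumptions chosen only to illustrate ranking instead of a yes-or-no verdict; any classifier that outputs a score could take their place. The `Message` type and both functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    subject: str
    to_me: bool  # True if the user's address is in To or Cc


def relevance(msg, known_senders):
    """Hypothetical relevance score: higher means read it sooner."""
    score = 0.0
    if msg.sender.lower() in known_senders:
        score += 2.0
    if msg.to_me:
        score += 1.0
    letters = [c for c in msg.subject if c.isalpha()]
    if letters and sum(c.isupper() for c in letters) / len(letters) > 0.6:
        score -= 1.5  # subject is mostly capital letters
    return score


def triage(inbox, known_senders, review=2):
    """Rank the inbox and split it into (to read, to trash).

    Messages scoring above zero are read first; the first `review` messages
    at or below zero are kept for a quick look, and the rest go to the trash.
    """
    ranked = sorted(inbox, key=lambda m: relevance(m, known_senders), reverse=True)
    keep = [m for m in ranked if relevance(m, known_senders) > 0]
    low = [m for m in ranked if relevance(m, known_senders) <= 0]
    return keep + low[:review], low[review:]
```

The `review` parameter captures the "look at the first few doubtful ones, then trash the rest" habit; no single message ever has to be declared spam.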

Posted by Doug Sauder at 04:26 PM | permalink

The action that a spammer tries to achieve from a recipient is key to effective filtering.

What are these actions?

The simplest action is to follow a URL that takes the recipient to a web page. Just as simple is to reply to the message. Not quite as simple is to call a phone number. Calling the number typically connects you to an answering machine that asks you to leave a number where you can be reached, and the spammer calls you back. The least likely action is to respond by postal mail. Using the postal system is also unlikely for another reason: fraud conducted via U.S. mail is a serious offense.
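Because each of these actions needs a concrete handle in the message, a filter can look for the handles directly. This is a minimal sketch, assuming single-part plain-text bodies and North American phone formats; the regular expressions are simplifications, and `response_channels` is a hypothetical name.

```python
import re
from email import message_from_string
from email.utils import parseaddr

URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)
# Rough North American pattern, e.g. (555) 123-4567 or 555.123.4567.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]\d{4}\b")


def response_channels(raw):
    """List the ways a message invites the recipient to respond."""
    msg = message_from_string(raw)
    body = msg.get_payload()
    if not isinstance(body, str):
        body = ""  # multipart handling is left out of this sketch
    # A reply goes to Reply-To if present, otherwise to From.
    reply_to = parseaddr(msg.get("Reply-To") or msg.get("From", ""))[1]
    return {
        "urls": URL_RE.findall(body),
        "phones": PHONE_RE.findall(body),
        "reply_to": reply_to or None,
    }
```

Each entry is something a filter could check against a shared list, which is the point of focusing on the action rather than on the wording.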

Posted by Doug Sauder at 01:02 PM | permalink

One of the points that Jeremy Bowers tries to make is that once spammers learn how to get past Bayesian-like spam filters, there is nothing further we can do. I disagree.

If a spammer's goal is just to get past a spam filter, then I have no doubt that he will find a way to do that, no matter how good the filter is. But a spammer's goal is not just to get past a spam filter; it's to motivate some kind of action from the recipient. In most cases, the desired action is ultimately to get the recipient to buy a product or service. That's why a message like "Here's the link we talked about:", while it will get past every content-based spam filter, may not be a popular form of spam: the response rate may be too low to justify the cost. Therefore, if Bayesian-like spam filters become widely deployed, we will certainly have won a battle against spammers.

But there is still something we can do in the way of filtering. First of all, I think we should find a way to filter based on the IP addresses of the URLs in the message. An IP address is a very small piece of information -- only 4 bytes. A UDP-based server could probably handle a heavy load of storing IP addresses and answering queries. Filtering messages based on the IP addresses of the HTTP URLs they contain could be effective.
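The lookup side of this idea is small. This is a minimal sketch, assuming IPv4 and an in-memory set standing in for the UDP query server; each address is stored as its 4-byte packed form, and host names are resolved through a caller-supplied function so the example doesn't depend on live DNS. The helper names are hypothetical, and error handling for failed lookups is omitted.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def packed_ipv4(host, resolve=socket.gethostbyname):
    """Return the 4-byte form of host's IPv4 address, resolving names if needed."""
    try:
        addr = ipaddress.IPv4Address(host)  # host is already a literal IP
    except ipaddress.AddressValueError:
        addr = ipaddress.IPv4Address(resolve(host))
    return addr.packed


class UrlIpBlocklist:
    """Flag URLs whose host resolves to a listed IPv4 address."""

    def __init__(self, bad_ips):
        # Each entry is exactly 4 bytes, which keeps the table compact.
        self.bad = {ipaddress.IPv4Address(ip).packed for ip in bad_ips}

    def hits(self, urls, resolve=socket.gethostbyname):
        """Return the URLs whose host resolves to a listed address."""
        flagged = []
        for url in urls:
            host = urlsplit(url).hostname
            if host and packed_ipv4(host, resolve) in self.bad:
                flagged.append(url)
        return flagged
```

Keying on the resolved address rather than the host name means a spammer who registers fresh domain names for the same web server is still caught.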

Keep in mind that even if we don't completely filter every spam message, we can still have a major impact on spam if we just make it difficult to respond to the spam. A URL in a message is just too simple to respond to. If we start filtering based on the URLs in a message, then we can take away that option from spammers.

There are other actions that a spammer can try to motivate, such as replying to a message. There are ways to filter based on the reply-to address, too.

Posted by Doug Sauder at 10:37 AM | permalink

January 07, 2003

Here's an example spam message from Jeremy Bowers [jerf.org]:

Subject: Re: Re: the proposal

That's a nice point, but I think you should consider the information 
at http://www.somewebsite.com/info.html before going with that 
approach. I found that information to be really pertinent.

Yes, indeed, this message would get past even the best spam filters that try to interpret the message's content, such as Bayesian-like filters.

On the other hand, this spam email would not have a very good marketing message.

Posted by Doug Sauder at 10:49 PM | permalink

I think the idea of a Bayesian filter is a bit too simplistic. It's too automatic. Nothing in the real world is simple, and real intelligence requires more than just magically applying an algorithm that can be described in a few paragraphs.

For example, a good filter could really cut down on false positives on messages that I get from co-workers. But I wouldn't rely on just a simple Bayesian classifier. A good filter can look at the content of the From, Reply-To, Received, and X-Mailer header fields. It's possible that a spammer could guess a From address that is in my address book -- just pick an address from their massive list that has the same domain, which probably indicates someone in the same company. They may even be able to guess the content of the X-Mailer header field, since Outlook is so widely used in businesses. However, they probably couldn't guess the sending host in the Received header field. A very simple filter could intelligently inspect these carefully chosen fields to determine if an email originated from a person who already has an email in one of my mail folders. I expect that this would really cut down on false positives, and would be difficult for spammers to get past.

So, that means I would have a folder for known good incoming email.

But what I am really concerned about is false positives on email from people I have not heard from before.

Posted by Doug Sauder at 10:33 PM | permalink

There's lots of discussion on slashdot about spam filtering, especially Bayesian classifier-based filters.

Posted by Doug Sauder at 10:20 PM | permalink

I think one of the important parts of an "Internet education" must be instruction on how to send an email that gets past spam filters. A word of advice: don't put viagra in the subject line and expect it to get through to the intended recipient's inbox. That's too obvious. Slightly less obvious: don't write WITH THE CAPS LOCK ON. I hope we don't get to the point where creating a subject line like "Want to go out for lunch?" means your email ends up in the spam folder.

Posted by Doug Sauder at 10:11 PM | permalink

Spam Filtering's Last Stand: good stuff here.

Posted by Doug Sauder at 10:07 PM | permalink

January 06, 2003

Network Associates Buys Deersoft [InfoWorld]

Certainly spam is a problem. And where there is a problem, there is money to be made in providing a solution. However, somehow it just doesn't seem right to be making money in providing spam filtering. I feel that way because people shouldn't have to pay to get rid of spam. They are innocent victims.

Imagine this scenario: five years from now, spam filtering tools are widely used, and millions of email users pay a subscription for them. Then it's easy to measure the cost of spam: X email users pay Y dollars to fight spam. At that point, when the bean counters start reporting huge numbers as the cost of spam, the government steps in and passes laws that prohibit spam.

Well, it's not that simple. Laws in the U.S. can't affect spammers who work from overseas.

So, can you create a solid business based on anti-spam tools? Perhaps. I'm considering giving it a try. But there are risks. If spam is a really big problem -- and it is -- open source developers will devote much of their effort to producing anti-spam tools. Perhaps that's how it should be. Innocent email users shouldn't have to pay for getting rid of spam.

Posted by Doug Sauder at 09:26 PM | permalink