December 16, 2004

Global Internet Does Not Equal Internationalized Domain Names

Because domain names are spelled with Latin characters, the WWW is not truly internationalized, right? Casual thinking says yes. But I disagree.

It's not that I don't understand the need of China's billion-plus population to use familiar names for web sites in their own language. I do. But let me explain a few points that perhaps we ought to consider.

We managed to get by for many decades using phone numbers, which comprise only digits. Phone numbers are not friendly, but they work. The great thing about phone numbers is that it's easy to give one out to someone else over the phone. Let's call this the "phone test": Can you easily give this "name" or "address" out over the phone?

The Internet introduces new addresses that contain not just digits, but also Latin letters and some punctuation marks. As an added benefit, many Internet addresses are pronounceable. Thus, we can say ebay.com or yahoo.com, and not just spell them. Web addresses, in this regard, appear to be a big improvement over phone numbers.

Or are they?

If you are Amazon.com, or drugstore.com, or the owner of a "nice" domain name, you have good reason to be happy. However, there are millions who will never be able to get a "nice" name. This is especially true for email addresses or other online names such as AOL screen names. Is doug3390 a "nice" name? At some point we start to run out of "nice" names. And many Internet names -- think email addresses -- are not particularly good under the "phone test" criterion.

So, what about internationalized domain names? What about a domain name <Chinese word for "weather">.cn? The Chinese language is not English. In fact, the Chinese language is about as different from English as can possibly be. Note that you do not "spell" Chinese words: you write them. If two Chinese speakers talk on the phone, one cannot "spell" a name for the other one. The reason is simple: there are far too many Chinese characters to assign names to them all. It's this feature of the Chinese language that makes Chinese Internet names difficult. Because you cannot spell Chinese words, that limits useful names to combinations of words that the majority of educated Chinese citizens know how to write.

Who wants friendly WWW names in languages other than English? Marketers. Marketers in China want their web address to be <Chinese company name>.cn. This is especially true of the big companies with names that are household words in China. The average educated Chinese citizen knows how to write such a company's name, and a friendly web name makes it easy to get to the web site. For lesser-known names, the benefits of friendly Chinese names are more dubious. Consider email addresses: with many millions of Chinese having the name Yau, there can be only one lucky person who gets the address <chinese character for Yau>@hanmail.net. The other 999,999 Yaus might do just as well to have an email address that is all digits: at least it easily passes the "phone test."

My opinion is that we should leave the DNS as it currently is, with names that use Latin characters and Arabic numerals. These characters are universal. Yes, these characters are used daily in Chinese newspapers. Because these characters have names in every language, names constructed from these characters can always be spelled, and they can therefore pass the "phone test." Current Internet names will just work, the same way telephone numbers just work. The fact that some Internet names happen to be pronounceable in English and a few other Western languages is a bonus, but nothing more. If allowing Latin characters seems too Western, why not fall back to Arabic numerals, so that Internet addresses become like phone numbers?

There is a better solution to "friendly" names: directory services. RealNames would have been the perfect solution. The Chinese government could take some leadership here and create their own RealNames-like service for their people, just as the US government took the lead in creating the Internet. The directory service could be layered on top of the DNS. The result would be that ordinary Chinese citizens could easily get to popular web sites using friendly names, and the rest of the world could get to the same web sites by URIs that can be spelled in any language.

On the other hand, if we allow Internet names to use Unicode characters, we run the risk of the Internet becoming more provincial. Consider the name <some chinese word>.cn. There is very little chance that a person who does not know Chinese will ever be able to type that name. Keeping the DNS as it currently is keeps the number of characters to a minimum, and maintains a lowest common denominator for the entire world.


For a different viewpoint, news.com has this article: "Is the Internet truly global?" As a simple rebuttal, let me say that making the Internet and the WWW more friendly to everyday people is a worthy goal. But there is a better way to do it than an overhaul of the DNS. The question really comes down to this: How many levels of indirection make sense? Resolving a DNS name leads to an IP address. DNS names are perhaps friendlier than IP addresses, but DNS names also serve entirely different purposes, such as redundancy, stability, and load sharing. These purposes are at a lower level than "friendliness," and I believe that argues for yet another level of indirection. So, rather than overhaul the DNS, I believe it makes sense to have a friendly name that resolves to a DNS name that resolves to an IP address -- two levels of indirection. I get the feeling that the push for IDNs comes from marketing types who are more concerned with marketing issues than with technical ones.
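To make the two levels of indirection concrete, here is a minimal Python sketch. It is only an illustration, not a design for a real directory service: the DIRECTORY table, its entries, the host name weather.example.com, and all function names are hypothetical. The first step maps a friendly name (which may be in any script) to an ordinary DNS name; the second step hands that DNS name to the usual resolver to get IP addresses.

```python
import socket
from typing import Callable, Dict, List, Tuple

# Hypothetical directory: friendly name -> ordinary DNS name.
# Both the entries and the host name are made up for illustration.
DIRECTORY: Dict[str, str] = {
    "天气": "weather.example.com",      # Chinese word for "weather"
    "weather": "weather.example.com",
}


def lookup_dns_name(friendly_name: str,
                    directory: Dict[str, str] = DIRECTORY) -> str:
    """First level of indirection: friendly name -> DNS name."""
    key = friendly_name.strip()
    if key not in directory:
        raise LookupError(f"no directory entry for {friendly_name!r}")
    return directory[key]


def system_dns_lookup(dns_name: str) -> List[str]:
    """Second level of indirection: DNS name -> IP addresses."""
    infos = socket.getaddrinfo(dns_name, None)
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({info[4][0] for info in infos})


def resolve_friendly_name(
    friendly_name: str,
    directory: Dict[str, str] = DIRECTORY,
    dns_lookup: Callable[[str], List[str]] = system_dns_lookup,
) -> Tuple[str, List[str]]:
    """Chain both levels: friendly name -> DNS name -> IP addresses."""
    dns_name = lookup_dns_name(friendly_name, directory)
    return dns_name, dns_lookup(dns_name)
```

In this sketch the DNS itself is untouched: the directory can be changed, localized, or replaced without affecting the spellable DNS name underneath, which is the point of adding a layer rather than overhauling the existing one.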

If your native language is not English -- especially, if your native language is based on characters other than Latin characters -- I would be interested in hearing your viewpoint on this issue. Write me: doug /dot/ sauder /at/ ieee /dot/ org.

Posted by Doug Sauder at December 16, 2004 08:12 AM