Journal Articles

CVu Journal Vol 15, #5 - Oct 2003 + Internet Topics

Title: I Wish They'd Use the Standard

Author: Administrator

Date: Fri, 03 October 2003 13:16:00 +01:00

I have many Chinese friends, so I have firsthand experience of the poor quality of many Chinese email clients. Email is supposed to be international; it is supposed to enable you to communicate with other countries, not just your own. But many Chinese email programs (and I'm talking about modern, up-to-date versions of such programs, not ancient ones) completely ignore the relevant international standards, making it unnecessarily difficult for their users to communicate with anyone of any other nationality. Their users may like the user interfaces, but they are oblivious to the poor quality of the underlying protocol-handling code, and when problems occur, they don't understand why.

Take FoxMail, for example. It formats dates incorrectly. I downloaded an evaluation copy and looked at the binary with a hex editor. The format string used to format minutes and seconds is %d:%d. What is wrong with that? It means that if a minute or second is less than 10, only one digit will be printed - 3:05 would be printed as 3:5. This could easily have been avoided by using %02d in the format string.
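To make the difference concrete, here is a minimal Python sketch. It is not FoxMail's actual code (all I saw was the format string in the binary), and the function names are made up for illustration. The last function assumes that Python's standard email.utils.format_datetime is an acceptable way to produce a correctly padded date header; the timestamp passed to it is an invented example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def min_sec_unpadded(minute: int, second: int) -> str:
    # Same format string as the one found in the binary: no zero padding.
    return "%d:%d" % (minute, second)


def min_sec_padded(minute: int, second: int) -> str:
    # %02d pads each field to two digits with a leading zero.
    return "%02d:%02d" % (minute, second)


def standard_date_header(when: datetime) -> str:
    # Hypothetical helper: let the standard library build the whole
    # date string instead of formatting the fields by hand.
    return format_datetime(when)


if __name__ == "__main__":
    print(min_sec_unpadded(3, 5))  # the broken form
    print(min_sec_padded(3, 5))    # the padded form
    one_hour_ahead = timezone(timedelta(hours=1))
    print(standard_date_header(
        datetime(2003, 10, 3, 13, 3, 5, tzinfo=one_hour_ahead)))
```

Letting a library routine build the whole header, rather than hand-formatting each field, avoids this class of mistake altogether.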

Big deal? Yes, because many spam filters (such as SpamAssassin) use incorrectly formatted dates as a clue that the message is spam. The number of times genuine Chinese messages have ended up in my spam folder is too high for comfort (and I don't like having to read through the hundreds of spam messages I get just to find them). Of course, I've adapted - I altered the filtering rules to be kinder to FoxMail, and I use BogoFilter in conjunction with SpamAssassin. BogoFilter is based on Bayesian probabilities and you can train it on your own samples of email, so I get SpamAssassin just to tag each email with its test results, and BogoFilter works out the real probability of the mail being spam based on my own sample. But I am a computer scientist. How many people out there are losing important emails because of problems like this?

(If anyone knows how to report a bug to FoxMail's developers, please do so. I can't find their contact info.)

But it gets worse. Recently I was trying to help a Chinese professor with the details of becoming a visiting scholar at the University of Cambridge, and she couldn't read the email that told her what the rent was. There was nothing wrong with the email in question: it was written in the ISO-8859-1 (aka Latin-1) character set, and it used the UK pound sign (code 163), which was MIME-encoded (using "quoted-printable"), and the headers clearly stated which character set the message was in. But the Chinese professor's email client (which didn't even identify itself, but is apparently used by Tsinghua University, widely considered to be China's top university) did the following: it decoded the MIME quoted-printable (so we know it's modern enough to understand MIME), but it completely ignored the character set declared in the headers and assumed that all incoming messages are in the Chinese GB2312 encoding. It then tried to interpret the pound sign and the following byte as a single Chinese character; this failed, so it replaced the two bytes with a question mark. As a result, the first digit of the price was lost. She tried to forward the email for me to read, but the damage had already been done: the client provided no means of forwarding an email without re-interpreting its characters first. I had to contact the original sender for the information.
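The failure can be reproduced in a few lines of Python. This is only a sketch under stated assumptions: the message text and the rent figure are invented, and decode_assuming_gb2312 is my guess at the kind of logic that would produce the symptom described, not the client's real code. The first function does what the standard asks: undo the quoted-printable encoding, then decode the bytes using the character set named in the Content-Type header.

```python
from email import message_from_bytes

# Hypothetical message; the rent figure is invented for illustration.
RAW = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=ISO-8859-1\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"The rent is =A3450 per month.\r\n"
)


def decode_respecting_header(raw: bytes) -> str:
    msg = message_from_bytes(raw)
    payload = msg.get_payload(decode=True)  # undoes quoted-printable
    charset = msg.get_content_charset() or "us-ascii"
    return payload.decode(charset)


def decode_assuming_gb2312(raw: bytes) -> str:
    # Guess at the broken behaviour: ignore the declared charset and
    # treat every byte >= 0x80 as the first half of a two-byte GB2312
    # character, writing "?" when the pair is not valid.
    msg = message_from_bytes(raw)
    payload = msg.get_payload(decode=True)
    out = []
    i = 0
    while i < len(payload):
        if payload[i] < 0x80:
            out.append(chr(payload[i]))  # plain ASCII passes through
            i += 1
        else:
            try:
                out.append(payload[i:i + 2].decode("gb2312"))
            except UnicodeDecodeError:
                out.append("?")
            i += 2  # both bytes are consumed even on failure
    return "".join(out)


if __name__ == "__main__":
    print(decode_respecting_header(RAW).strip())
    print(decode_assuming_gb2312(RAW).strip())
```

In the second function the pound sign (byte 0xA3) and the digit after it are swallowed together, which is how the first digit of the price disappears.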

Everyone concerned thought it was their fault, but it wasn't. It was due to a poorly implemented email client. Perhaps the programmers had so little time that they couldn't properly learn and implement the standard, so they implemented just enough of it to make it work in their test cases. After all, Cambridge University's "WebMail" system is hard-coded to use the ISO-8859-1 character set (because the authors didn't have the resources to support other character sets and still make sure the program is secure against cross-site scripting attacks and so forth); at least this limitation is well documented and the character set is identified in the headers of outgoing mail. But there is definitely room for improvement, especially if you are an establishment that has a policy of encouraging exchange with a certain foreign country and your staff members can't read emails that are sent from that country.

More generally, there is obviously scope for a greater promotion of good programming practice. The free software movement tends to get it right, because problems like these are fixed by the public before the program becomes popular. But many establishments prefer to write or buy proprietary software with insufficient support for the standards. That may be their loss, but it's everyone's loss if it makes it more difficult to communicate with people just because they happen to be in an establishment that uses broken software. More people need to be aware of this, especially if they're interested in promoting international exchange. The general standard of programming practice still has much room for improvement.
