Date: Tue May 01 2001 - 11:23:11 EDT


We need to get more e-mail software upgraded to support UTF-8. It works well between users whose mailers support UTF-8, but a lot of software out there does not support it yet.

The problem with UTF-8 support is that many vendors have tried to add it to their existing code-page-based products. Thus, for example, if I use a Japanese mail reader with UTF-8, it will convert the UTF-8 into Japanese code pages. This is OK unless you get a character that does not translate. These products also use non-Unicode fonts, so the character sets overlap. It is like using UTF-8 with a Netscape 4.x browser. It just doesn't work well. Vendors have to do what Netscape did, namely rewrite for Unicode support, or, like MS, write it for Unicode to begin with. It will take a while for all these mail providers to convert to Unicode.
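
As a rough illustration of the failure, here is a minimal Python sketch of what such a retrofitted reader effectively does. The function names (to_code_page, survives) are made up for this example, and Shift_JIS is only an assumed stand-in for "Japanese code pages"; a real product may use a different code page and a different substitution character.

```python
def to_code_page(utf8_bytes: bytes, code_page: str = "shift_jis") -> str:
    """Mimic a code-page-based reader: decode UTF-8, then squeeze it into a code page."""
    text = utf8_bytes.decode("utf-8")
    # Characters the code page cannot represent are replaced with "?".
    legacy = text.encode(code_page, errors="replace")
    # What the user finally sees after the lossy trip through the code page.
    return legacy.decode(code_page)


def survives(text: str, code_page: str) -> bool:
    """Return True if every character of text exists in the given code page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print(to_code_page("日本語".encode("utf-8")))     # Japanese text comes through intact
    print(to_code_page("日本語 ก".encode("utf-8")))   # the Thai letter is lost
```

Japanese text passes through unchanged, but a character outside the code page, such as a Thai letter, is silently replaced, and the reader has no way to get it back.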

The second problem is script detection and font selection. No single font can cover all Unicode characters, so the application must be able to detect which font to use for each character. This is not a simple job. You also need many fonts installed on the system.
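
To give an idea of what per-character font selection involves, here is a simplified Python sketch that splits text into runs and picks a font for each run. Everything in it is an assumption for illustration: the FONTS table and its font names are placeholders, and the script is guessed from the first word of the Unicode character name, which is only a crude approximation of real script detection.

```python
import unicodedata

# Hypothetical script-to-font table; the font names are placeholders.
FONTS = {
    "LATIN": "Latin Font",
    "CYRILLIC": "Latin Font",
    "HIRAGANA": "Japanese Font",
    "KATAKANA": "Japanese Font",
    "CJK": "Japanese Font",
}
FALLBACK = "Fallback Font"


def script_of(ch):
    """Guess a script label for a letter; return None for spaces, digits, punctuation."""
    if not unicodedata.category(ch).startswith("L"):
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else None


def font_runs(text):
    """Split text into (font, substring) runs, one font per run."""
    runs = []
    current = None  # script of the previous character
    for ch in text:
        # Non-letters inherit the script of the text they follow.
        script = script_of(ch) or current
        font = FONTS.get(script, FALLBACK)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
        current = script
    return runs


if __name__ == "__main__":
    for font, run in font_runs("Hi 日本語"):
        print(font, repr(run))
```

Even this toy version needs a rule for spaces and punctuation and a fallback for scripts it has no font for. A real implementation also has to deal with combining marks, characters shared between scripts, and fonts that cover only part of a script.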


Mike Ayers
Sent: Monday, April 30, 2001 1:21 PM
Subject: UTF-8 on this list

        Long after upgrading to Win2K, setting up all my fonts, and testing
everything, I've come to a conclusion: there are darn few Unicode text
messages on the Unicode mail list (i.e. characters are referred to by
codepoint, but the character itself is never included). In fact, I think
I've seen more HTML messages than Unicode messages. Also, I've never seen
the issue raised. Is it considered wrong to send Unicode messages (instead
of using U+xxxx notation), or do few people have the proper setup (or are we
just being considerate to those who don't)? Have we even thought about
this? I would think that this list should be one of the first places to see
regular usage, but that doesn't seem to be the case.

        Enlightenment, anyone?

