From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Thu Nov 14 2002 - 12:18:16 EST
Carl W. Brown wrote:
> Some Unix systems adapted faster because the later Unicode adopters used
> 32 bit Unicode characters making the job 100 times easier. Other companies
> like Microsoft took a very big gamble and implemented the code for
> surrogate support into Windows 2000 based on early drafts of the Unicode
> standard. If they had not done it this way or had guessed wrong they might
> not even have support in Windows XP.
Hi Carl, I am not going to argue with you on what you say about ICU :-) but I am not sure about your
Unix comments.
First, AIX 5 uses 32-bit wchar_t, which is UTF-32 except for the zh_TW locale, as far as I know.
(AIX 5 zh_TW uses a different wchar_t encoding.)
Again as far as I know, Unix/Linux systems chose to use 32-bit wchar_t not because of great
strategic plans or compelling performance analysis, but because the existing C stdlib functions for
wchar_t string handling assume that the single-code-point type is the same as the string base unit.
This one design point requires 32-bit wchar_t not just for Unicode but also for the character sets
of EUC-TW and GB18030.
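To make that design point concrete, here is a quick sketch of mine (just an illustration, and it assumes a platform where wchar_t is 32 bits, e.g. glibc): wcschr() takes the character to search for as a single wchar_t, so that one type must be able to hold any code point, including supplementary ones like U+20000.

    /* Sketch only: assumes 32-bit wchar_t. With 16-bit wchar_t, U+20000
       needs a surrogate pair and cannot be passed as a single character. */
    #include <wchar.h>
    #include <stdio.h>

    int main(void) {
        const wchar_t *text = L"abc\U00020000def";          /* string of wchar_t units */
        const wchar_t *found = wcschr(text, L'\U00020000'); /* one wchar_t code point */

        if (found != NULL) {
            printf("found at index %td\n", found - text);
        }
        return 0;
    }

The "single character" argument and the string unit share the same type, which is exactly why wchar_t has to be wide enough for the whole repertoire.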
You seem to suggest that there is a problem with 16-bit Unicode. It does take some effort to adapt
UCS-2-designed functions for UTF-16, but it's not "rocket science" and works very well thanks to the
Unicode allocation practice (common characters in the BMP). Making UTF-8/32 functions work with
supplementary code points when they had assumed BMP-only operation probably took some work too.
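To show what that adaptation amounts to, here is a sketch of the general technique (my illustration, not actual ICU code): a UCS-2 function reads one 16-bit unit per character, while the UTF-16 version adds a check for a lead/trail surrogate pair and combines the pair into one code point.

    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <inttypes.h>

    /* Read the code point starting at s[*i] and advance *i past it.
       Unpaired surrogates are passed through unchanged, for simplicity. */
    uint32_t next_code_point(const uint16_t *s, size_t len, size_t *i) {
        uint32_t c = s[(*i)++];
        if (c >= 0xD800 && c <= 0xDBFF && *i < len) {      /* lead surrogate? */
            uint32_t trail = s[*i];
            if (trail >= 0xDC00 && trail <= 0xDFFF) {      /* trail surrogate */
                (*i)++;
                c = 0x10000 + ((c - 0xD800) << 10) + (trail - 0xDC00);
            }
        }
        return c;
    }

    int main(void) {
        /* "A", U+20000 (surrogate pair D840 DC00), "B" */
        const uint16_t text[] = { 0x0041, 0xD840, 0xDC00, 0x0042 };
        size_t len = sizeof text / sizeof text[0];
        for (size_t i = 0; i < len; ) {
            printf("U+%04" PRIX32 "\n", next_code_point(text, len, &i));
        }
        return 0;
    }

Because the common characters are in the BMP, the surrogate branch is rarely taken, which is why the adaptation costs so little in practice.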
In fact, on Unix/Linux systems you find not only UTF-32 via wchar_t, but also UTF-8 (low-level tools
and gnome) and UTF-16 (ICU, KDE/Qt, and many applications like Mozilla and OpenOffice).
Best regards,
markus
--
Opinions expressed here may not reflect my company's positions unless otherwise noted.