From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Thu Nov 14 2002 - 12:18:16 EST
Carl W. Brown wrote:
> Some Unix systems adapted faster because the later Unicode adopters used
> 32 bit Unicode characters making the job 100 times easier. Other companies
> like Microsoft took a very big gamble and implemented the code for
> surrogate support into Windows 2000 based on early drafts of the Unicode
> standard. If they had not done it this way or had guessed wrong they might
> not even have support in Windows XP.
Hi Carl, I am not going to argue with you on what you say about ICU :-) but I am not sure about your
Unix comments.
First, AIX 5 uses 32-bit wchar_t, which is UTF-32 except for the zh_TW locale, as far as I know.
(AIX 5 zh_TW uses a different wchar_t encoding.)
Again as far as I know, Unix/Linux systems chose to use 32-bit wchar_t not because of great
strategic plans or compelling performance analysis, but because the existing C stdlib functions for
wchar_t string handling assume that the single-code-point type is the same as the string base unit.
This one design point requires 32-bit wchar_t not just for Unicode but also for the character sets
of EUC-TW and GB18030.
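To make that design point concrete, here is a quick sketch of mine (just an illustration, and it assumes a platform where wchar_t is 32 bits, e.g. glibc): wcschr() takes the character to search for as a single wchar_t, so that one type must be able to hold any code point, including supplementary ones like U+20000.

    /* Sketch only: assumes 32-bit wchar_t. With 16-bit wchar_t, U+20000
       needs a surrogate pair and cannot be passed as a single character. */
    #include <wchar.h>
    #include <stdio.h>

    int main(void) {
        const wchar_t *text = L"abc\U00020000def";          /* string of wchar_t units */
        const wchar_t *found = wcschr(text, L'\U00020000'); /* one wchar_t code point */

        if (found != NULL) {
            printf("found at index %td\n", found - text);
        }
        return 0;
    }

The "single character" argument and the string unit share the same type, which is exactly why wchar_t has to be wide enough for the whole repertoire.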
You seem to suggest that there is a problem with 16-bit Unicode. It does take some effort to adapt
UCS-2-designed functions for UTF-16, but it's not "rocket science" and works very well thanks to the
Unicode allocation practice (common characters in the BMP). Making UTF-8/32 functions work with
supplementary code points when they had assumed BMP-only operation probably took some work too.
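To show what that adaptation amounts to, here is a sketch of the general technique (my illustration, not actual ICU code): a UCS-2 function reads one 16-bit unit per character, while the UTF-16 version adds a check for a lead/trail surrogate pair and combines the pair into one code point.

    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <inttypes.h>

    /* Read the code point starting at s[*i] and advance *i past it.
       Unpaired surrogates are passed through unchanged, for simplicity. */
    uint32_t next_code_point(const uint16_t *s, size_t len, size_t *i) {
        uint32_t c = s[(*i)++];
        if (c >= 0xD800 && c <= 0xDBFF && *i < len) {      /* lead surrogate? */
            uint32_t trail = s[*i];
            if (trail >= 0xDC00 && trail <= 0xDFFF) {      /* trail surrogate */
                (*i)++;
                c = 0x10000 + ((c - 0xD800) << 10) + (trail - 0xDC00);
            }
        }
        return c;
    }

    int main(void) {
        /* "A", U+20000 (surrogate pair D840 DC00), "B" */
        const uint16_t text[] = { 0x0041, 0xD840, 0xDC00, 0x0042 };
        size_t len = sizeof text / sizeof text[0];
        for (size_t i = 0; i < len; ) {
            printf("U+%04" PRIX32 "\n", next_code_point(text, len, &i));
        }
        return 0;
    }

Because the common characters are in the BMP, the surrogate branch is rarely taken, which is why the adaptation costs so little in practice.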
In fact, on Unix/Linux systems you find not only UTF-32 via wchar_t, but also UTF-8 (low-level tools
and gnome) and UTF-16 (ICU, KDE/Qt, and many applications like Mozilla and OpenOffice).
Best regards,
markus
--
Opinions expressed here may not reflect my company's positions unless otherwise noted.