Re: C# character model

From: Markus Scherer (
Date: Tue Jun 27 2000 - 18:31:30 EDT

John O'Conner wrote:
> It appears that this new product is not adopting UTF-32...and is
> sticking with UTF-16 (or more appropriately UCS-2?). APIs use and return
> single 16-bit values. This certainly doesn't make surrogate-pair values
> easy to use. What influence, if any, does this have on the adoption of
> UTF-32 or even UTF-16 using surrogate pairs?

Many other APIs and libraries from Microsoft, like Uniscribe, do support UTF-16, though. For example, IE 5.1 (exact version number?) displays a surrogate pair as one single box instead of as two.
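Displaying a surrogate pair as one box means treating the two 16-bit code units as one code point. Here is a minimal sketch of that step. The function name is hypothetical. Passing unpaired surrogates through unchanged is an assumption; real libraries may instead report an error or substitute U+FFFD.

```python
from typing import Iterator, Sequence


def code_points_from_utf16(units: Sequence[int]) -> Iterator[int]:
    """Yield code points from a sequence of 16-bit UTF-16 code units.

    A high surrogate (D800..DBFF) immediately followed by a low surrogate
    (DC00..DFFF) is combined into one supplementary code point. Any other
    unit, including an unpaired surrogate, is yielded as-is (assumption).
    """
    i = 0
    n = len(units)
    while i < n:
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Each surrogate carries 10 bits; add back the 0x10000 offset.
            yield 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            yield unit
            i += 1
```

For example, `list(code_points_from_utf16([0x0041, 0xD840, 0xDC00]))` gives two code points, U+0041 and U+20000, from three code units.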

Libraries and APIs that use 16-bit Unicode will need to handle full UTF-16, including surrogate pairs, to support the additional characters, especially those in Plane 2. Without those characters, full support for Japanese, Hong Kong, and other legacy code pages and their characters is not there.
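A Plane 2 character such as U+20000 does not fit in one 16-bit unit, so UTF-16 splits it into a high and a low surrogate. The following minimal sketch shows that arithmetic and cross-checks it against Python's built-in `utf-16-be` codec. The function names are hypothetical.

```python
def to_utf16_units(cp: int) -> list[int]:
    """Encode one Unicode scalar value as a list of UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x10000:
        return [cp]  # BMP: one 16-bit unit
    v = cp - 0x10000  # 20 bits remain
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]  # high, low surrogate


def units_via_codec(text: str) -> list[int]:
    """Get UTF-16 code units using Python's own big-endian UTF-16 codec."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
```

Both approaches turn U+20000 into the pair D840 DC00, while a BMP character like U+0041 stays a single unit.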

UTF-32 is interesting only when fixed-width processing is absolutely necessary. The design of the C standard library assumes that wchar_t strings are fixed-width, so its implementations are migrating to UTF-32 despite the wasted space.
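The fixed-width trade-off can be sketched as follows. In UTF-32 the n-th code point is always at byte offset 4*n. In UTF-16 you have to scan from the start, because each code point takes 2 or 4 bytes. The function names are hypothetical, and the UTF-16 walker assumes well-formed little-endian input.

```python
def utf32_code_point_at(data: bytes, index: int) -> int:
    """O(1) lookup in UTF-32LE: every code point is exactly 4 bytes."""
    if index < 0 or 4 * index + 4 > len(data):
        raise IndexError("code point index out of range")
    return int.from_bytes(data[4 * index:4 * index + 4], "little")


def utf16_code_point_at(data: bytes, index: int) -> int:
    """O(n) lookup in UTF-16LE: must walk, since widths are 2 or 4 bytes.

    Assumes well-formed input (every high surrogate is followed by a low one).
    """
    if index < 0:
        raise IndexError("code point index out of range")
    pos = 0
    count = 0
    while pos < len(data):
        unit = int.from_bytes(data[pos:pos + 2], "little")
        width = 4 if 0xD800 <= unit <= 0xDBFF else 2  # surrogate pair or single unit
        if count == index:
            if width == 4:
                low = int.from_bytes(data[pos + 2:pos + 4], "little")
                return 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
            return unit
        pos += width
        count += 1
    raise IndexError("code point index out of range")
```

For `"abc\U00020000"`, UTF-32LE takes 16 bytes and UTF-16LE takes 10. That is the space cost paid in exchange for constant-time indexing.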


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:04 EDT