Markus Scherer wrote:
> John O'Conner wrote:
> > It appears that this new product is not adopting UTF-32...and is
> > sticking with UTF-16 (or more appropriately UCS-2?).
Not very surprising given MS's commitment to 16-bit Unicode.
> > APIs use and return single 16-bit values.
Ah, that may be a problem (what is the ToUpper return value of ß?)
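
To make the ß question concrete, here is a small Python sketch. It is only
an illustration: Python's str.upper() stands in for a full Unicode case
mapping and is not the API under discussion.

```python
# Illustration only: str.upper() applies Unicode's full case mapping.
sharp_s = "\u00df"  # ß, LATIN SMALL LETTER SHARP S

upper = sharp_s.upper()

# The full uppercase mapping is two characters long, so it cannot be
# returned as a single 16-bit value.
print(upper, len(upper))
```

An API that returns one 16-bit value per input character has to either
leave ß unchanged or pick some other single-character answer.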
> > This certainly doesn't make surrogate-pair values easy to use.
> > What influence, if any, does this have on the adoption of
> > UTF-32 or even UTF-16 using surrogate pairs?
I believe as much as Java...
> many other apis and libraries from ms - like uniscribe - support utf-16,
> though. ie 5.1 (exact number?) displays a surrogate pair as one single
> box instead of as two, for example.
Sorry, I believe you're mixing things up: that is not an IE feature; as far
as I know, it depends on the platform.
I just ran the test with IE 5.01 on a Windows 98 box, and there were distinctly
two empty boxes for each (>= U+10000) character! On the other hand,
Windows 2000 is known to have (embryonic, but quite sufficient for now)
support for surrogate pairs.
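
The two boxes follow from how UTF-16 represents characters at or above
U+10000. The sketch below shows the standard arithmetic; the helper name
to_surrogates is hypothetical and chosen only for this illustration.

```python
def to_surrogates(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the 20-bit offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the 20-bit offset
    return high, low

high, low = to_surrogates(0x10000)
print(f"{high:04X} {low:04X}")

# Cross-check against Python's own UTF-16 encoder (big-endian, no BOM).
encoded = chr(0x10000).encode("utf-16-be")
```

Software that treats each 16-bit unit as a separate character sees two
unassigned values here, hence two boxes instead of one.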
> utf-32 is interesting only when fixed-width processing is absolutely necessary.
It is also interesting on some (non-Intel) platforms, where performance
is better with 32-bit units than with 16-bit ones.
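
The fixed-width point quoted above can be sketched in Python. The sample
string is arbitrary; the sketch only shows that indexing by code unit
matches indexing by character in UTF-32 but not in UTF-16.

```python
text = "a\U00010000b"  # 3 code points, one of them above U+FFFF

utf16 = text.encode("utf-16-le")
utf32 = text.encode("utf-32-le")

units16 = len(utf16) // 2  # number of 16-bit code units
units32 = len(utf32) // 4  # number of 32-bit code units

# UTF-32: the third code point sits at a fixed byte offset (2 * 4).
third = int.from_bytes(utf32[8:12], "little")

# UTF-16: code unit index 2 is the low surrogate, not the third character.
unit2 = int.from_bytes(utf16[4:6], "little")

print(units16, units32, hex(third), hex(unit2))
```

The UTF-32 form spends 12 bytes where UTF-16 spends 8, which is the space
cost mentioned in the quoted text.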
> the design of the c stdlib assumes that wchar_t strings are fixed-width,
Yes (although I believe use of UTF-16 rather than UCS-2 might be conformant).
> therefore they are migrating to utf-32 regardless of wasting space.
Huh? What leads to that conclusion?
One can perfectly well make a conforming C stdlib with an 8-bit wchar_t.
And of course, nothing prevents the use of a non-Unicode 16-bit wchar_t (in
particular for East Asian encodings), as wchar_t was set up for precisely
this use in the first place.
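
Since the width of wchar_t is implementation-defined, it can differ from
one platform to the next. A quick way to check it on a given machine,
assuming a Python interpreter with the standard ctypes module, is:

```python
import ctypes

# Width of the C wchar_t type on the platform running this interpreter.
wchar_bytes = ctypes.sizeof(ctypes.c_wchar)
print(f"wchar_t is {wchar_bytes * 8} bits wide here")
```

Typically this reports 16 bits on Windows and 32 bits on most Unix-like
systems, but neither value is required by the C standard.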