From: Doug Ewell (email@example.com)
Date: Thu Nov 14 2002 - 23:26:20 EST
Carl W. Brown <cbrown at xnetinc dot com> wrote:
> Converting from UCS-2 to UTF-16 is just like converting from SBCS to
> DBCS. For folks who think DBCS, it is no problem. Those who went from
> DBCS to Unicode to simplify their lives, I am sure, are not happy.
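For concreteness, the arithmetic behind that analogy is small: under UCS-2 every 16-bit unit is a character, while under UTF-16 a lead/trail surrogate pair combines into one supplementary code point. A minimal sketch (the function name is mine, not from the thread; the formula is the one in the Unicode standard):

```c
#include <stdint.h>

/* Combine a UTF-16 surrogate pair into the supplementary code point it
   represents.  Assumes the caller has already validated that `lead` is in
   0xD800..0xDBFF and `trail` is in 0xDC00..0xDFFF. */
static uint32_t utf16_pair_to_cp(uint16_t lead, uint16_t trail)
{
    return 0x10000u + (((uint32_t)(lead - 0xD800) << 10) | (trail - 0xDC00));
}
```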
Ken made me laugh last March by referring to this as
"... a bait and switch tactic, whereby implementers were lulled
into thinking they had a simple, fixed-width 16-bit system, only
to discover belatedly that they had bought into yet another
mixed-width character encoding after all."
At least with surrogate pairs, we don't have to deal with overlapping
ranges for lead bytes and trail bytes, or for trail bytes and
single-byte characters, and we don't have to go through crazy gymnastics
to "find the last lead byte" if we ever get lost in the middle of a
string.
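That resynchronization property can be shown in a few lines. Because lead surrogates (0xD800..0xDBFF), trail surrogates (0xDC00..0xDFFF), and ordinary BMP units never overlap, an arbitrary index into a UTF-16 buffer can be aligned to a code point boundary with a single backward check — no unbounded backward scan as with DBCS. A sketch (the helper name is mine):

```c
#include <stddef.h>
#include <stdint.h>

/* Given an arbitrary index into a UTF-16 buffer, step back at most one
   unit so the index points at the start of a code point.  One check
   suffices because the lead and trail surrogate ranges are disjoint. */
static size_t utf16_align_to_start(const uint16_t *s, size_t i)
{
    if (i > 0
        && s[i]   >= 0xDC00 && s[i]   <= 0xDFFF   /* trail surrogate...   */
        && s[i-1] >= 0xD800 && s[i-1] <= 0xDBFF)  /* ...preceded by lead  */
        return i - 1;
    return i;
}
```

With a DBCS, the corresponding operation has to walk backward until it finds a byte that cannot be a lead byte, precisely because the ranges overlap.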
> I think the worst problem is that many systems still sort in binary
> order, not code point order. Then you get Oracle and the like wanting
> to set up a UTF-8 variant that encodes each surrogate rather than the
> supplementary character.
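The binary-versus-code-point-order problem is narrow: the two orders disagree only when a surrogate unit (0xD800..0xDFFF, i.e. a supplementary character) is compared against a BMP unit in 0xE000..0xFFFF. A well-known remapping trick fixes the comparison without decoding full pairs; a sketch under that assumption (function names are mine):

```c
#include <stdint.h>

/* Remap a UTF-16 code unit so that unsigned comparison of the keys
   yields code point order: upper-BMP units (0xE000..0xFFFF) must sort
   below surrogates, which stand for supplementary characters. */
static uint16_t cp_order_key(uint16_t u)
{
    if (u >= 0xE000) return u - 0x800;   /* 0xE000..0xFFFF -> 0xD800..0xF7FF */
    if (u >= 0xD800) return u + 0x2000;  /* surrogates     -> 0xF800..0xFFFF */
    return u;
}

/* Compare two NUL-terminated UTF-16 strings in code point order. */
static int utf16_cmp_codepoint_order(const uint16_t *a, const uint16_t *b)
{
    while (*a && *a == *b) { a++; b++; }
    return (int)cp_order_key(*a) - (int)cp_order_key(*b);
}
```

For example, binary order would put U+10000 (units D800 DC00) before U+FFFD, while code point order correctly puts U+FFFD first.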
As Michka noted, the mechanism for surrogates has existed for almost a
decade now. Individuals and companies that ignored surrogates because
"there aren't any characters there anyway, and when they do add some
they'll be extremely rare," and are now behind in supporting UTF-16,
really have nobody to blame but themselves.
> However, 16 bit characters were a hard enough sell in the good old
> days. If we had started out withug 2bit characters we would still be
> dreaming about Unicode.
I think Carl meant "with 32-bit characters." I don't know what kind of
word "withug" is (Old English?), but I like it.
This archive was generated by hypermail 2.1.5 : Fri Nov 15 2002 - 00:14:28 EST