RE: UTF-16 problems

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Mon Jun 11 2001 - 18:43:42 EDT


Michka,

At first I thought the same thing, but I have changed my mind. There are
problems, but the problems are with UTF-16, not UTF-8. I don't think that I
am the only one who thinks that UTF-8s will create more problems than it
fixes.

Worse yet, they will also have to "fix" UTF-32.

The point of this message is to fix UTF-16, which is the source of the
problem. These changes are no more of a stretch than UTF-32s. The UTF-32s
proposal that I heard involves replicating the same code points to get these
code points to sort high like UTF-16.

What this does is legitimize the code point shift for UTF-16, UTF-8,
and UTF-32 so that the transforms all work, all sort the same, and
the binary sort and Unicode sort orders are the same.
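
For anyone who has not seen the sort mismatch in practice, here is a minimal
sketch of it in Python. It is only an illustration of the existing behavior,
not of the proposed transform; the two sample code points and the helper name
sorts_before are my own choices. It compares U+FFFD (plane 0, above the
surrogate range) with U+10000 (the first supplementary code point) under each
encoding form, using big-endian bytes so that a byte compare matches a code
unit compare.

```python
# Illustrative only: how two code points order under each encoding form.
low = "\uFFFD"        # plane 0, above the surrogate range D800..DFFF
high = "\U00010000"   # first supplementary code point (UTF-16: D800 DC00)

def sorts_before(a, b, encoding):
    """Return True if a's encoded bytes compare less than b's (binary sort)."""
    return a.encode(encoding) < b.encode(encoding)

code_point_order = ord(low) < ord(high)              # code point order
utf8_order = sorts_before(low, high, "utf-8")        # EF BF BD vs F0 90 80 80
utf16_order = sorts_before(low, high, "utf-16-be")   # FF FD vs D8 00 DC 00
utf32_order = sorts_before(low, high, "utf-32-be")   # 0000FFFD vs 00010000

print(code_point_order, utf8_order, utf16_order, utf32_order)
```

Only the UTF-16 comparison disagrees with code point order, because the
surrogate code units D800..DFFF sit below E000..FFFF.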

It does involve a minor normalization transform, but you have to do that for
UTF-32s anyway, and UTF-32s is required if you allow support of UTF-8s. The
big difference is that you don't change any UTF protocols or develop two
mutually exclusive transforms that are so similar that they might be
confused. Besides, this transform keeps UTF-8 to 4 bytes, not 6, and will work
with the existing UTF-8 software.
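
To make the "4 bytes not 6" point concrete, here is a rough Python sketch. It
assumes, as I understand the UTF-8s proposal, that each UTF-16 surrogate is
encoded separately as a 3-byte sequence; the function name utf8s_style is
hypothetical and this is not a reference implementation of anything.

```python
def utf8s_style(text):
    """Encode each UTF-16 code unit separately as UTF-8 (assumed UTF-8s behavior).

    Supplementary characters become two 3-byte sequences (6 bytes) instead of
    the single 4-byte sequence that standard UTF-8 uses.
    """
    out = bytearray()
    raw = text.encode("utf-16-be")
    for i in range(0, len(raw), 2):
        unit = int.from_bytes(raw[i:i + 2], "big")
        # "surrogatepass" lets Python emit the 3-byte form of a lone surrogate.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)

supplementary = "\U00010000"
standard_bytes = supplementary.encode("utf-8")   # F0 90 80 80
utf8s_bytes = utf8s_style(supplementary)         # ED A0 80 ED B0 80

print(len(standard_bytes), len(utf8s_bytes))
# The 6-byte form sorts below U+FFFD, mirroring the UTF-16 binary order.
print(utf8s_bytes < utf8s_style("\uFFFD"))
```

The two forms look alike byte for byte until you hit a supplementary
character, which is why they are so easy to confuse.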

The beauty of this proposal is that UCS-2 (plane 0 only) codes will sort in
the same order as the post transformed UTF-16 codes.

Carl

-----Original Message-----
From: Michael (michka) Kaplan [mailto:michka@trigeminal.com]
Sent: Monday, June 11, 2001 1:22 PM
To: Carl W. Brown; unicode
Subject: Re: UTF-16 problems

From: "Carl W. Brown" <cbrown@xnetinc.com>

> I think that UTF-16x would be a better approach than UTF-8s. I am sure that
> I have missed some issues feel free to comment. In any case UTF-16s would
> naturally be in Unicode code point order. It would be easy to transform to
> UCS-2 for applications that do not support UTF-16.

Carl, you are missing the central point of the UTF-8S movement -- they do
not want to change anything. Hell, they do not even want to change the
*name* they are so disinterested in changing anything! They want the Unicode
standard to embrace their format and support their bug, and not change a
bleeding thing.

They are distorting the truth (companies who only care about the whole mess
for the sake of compatibility with Oracle are being quoted as being
"intensely supportive of UTF-8S", and I'm sorry but distortion is the only
word for it). Revisionist history and revisionist present/future at its
finest, all you need is suspension of disbelief and you can vote for UTF-8S
knowing that you are saving the standard from oblivion!

Where are all these conspiracy buffs when you need them? They can have a
field day with this little adventure we have been having.

MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/


