RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

From: B (11@onna.com)
Date: Tue May 29 2001 - 20:26:42 EDT


You can just say Screw the number 8, let's use 21-bit bytes.

★じゅういっちゃん★

EKYWY TXLY NPZ P MPVD XPHYV LPWWQY
NKT ZPN XT WYPZTX PE PMM ET HPWWD
"EYX EKTSZPXV'Z HTWY GSX
P XSHOYW EKPX TXY
PXV LTHHQEHYXE, ET HY, QZ RSQEY ZLPWD"

--- Original Message ---
From: "Carl W. Brown" <cbrown@xnetinc.com>
To: unicode@unicode.org
Cc:
Date: 01/05/30 0:46
Subject: RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

>Ken,
>
>I suspect that Oracle is specifically pushing for this standard because of
>its unique database design. In a sense Oracle almost picks itself up by
>its own bootstraps. It has always tried to minimize actual code. Therefore
>it was a natural choice to implement Unicode with UTF-8, because it is easy
>to reuse the multibyte support with minor changes to handle a different
>character length algorithm. This has been one of the reasons that Oracle
>has been successful. Its Tinkertoy-like design has enabled them to quickly
>adapt and add new features. Now, however, they should take the time to "do
>it right". Its UTF-8 storage creates problems for database designers
>because they cannot predict field sizes. This is a problem with MBCS code
>pages, but UTF-8S will make it worse. There will be lots of wasted storage
>when characters can vary in size from 1 to 6 bytes.
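To put numbers on the field-size point, here is a rough Python sketch. It assumes UTF-8S encodes each UTF-16 surrogate as its own three-byte sequence, which is how the proposal is described in this thread; the function names are made up for illustration, and Python's "surrogatepass" error handler stands in for a real UTF-8S encoder.

```python
def utf16_units(ch):
    """Return the UTF-16 code units of a string as a list of integers."""
    data = ch.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def utf8_len(ch):
    """Bytes needed in standard UTF-8."""
    return len(ch.encode("utf-8"))


def utf8s_len(ch):
    """Bytes needed if every UTF-16 code unit is encoded on its own.

    Assumption: this mimics UTF-8S by pushing each surrogate through the
    ordinary three-byte UTF-8 pattern ("surrogatepass" permits that).
    """
    return sum(len(chr(u).encode("utf-8", "surrogatepass")) for u in utf16_units(ch))


def utf16_len(ch):
    """Bytes needed in UTF-16."""
    return 2 * len(utf16_units(ch))


if __name__ == "__main__":
    for ch in ("A", "\u00e9", "\u65e5", "\U00010400"):
        print(f"U+{ord(ch):04X}  UTF-8={utf8_len(ch)}  "
              f"UTF-8S={utf8s_len(ch)}  UTF-16={utf16_len(ch)}")
```

Under that assumption a supplementary character such as U+10400 takes 6 bytes in UTF-8S against 4 in both UTF-8 and UTF-16, so a column sized for N characters needs up to 6N bytes.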
>
>Most other database systems require specific code to support Unicode. As a
>consequence most have implemented it using UCS-2. Their obvious migration
>path is UTF-16. UTF-8S buys them nothing but headaches.
>
>Carl
>
>-----Original Message-----
>From: Kenneth Whistler [mailto:kenw@sybase.com]
>Sent: Tuesday, May 29, 2001 3:47 PM
>To: cbrown@xnetinc.com
>Cc: unicode@unicode.org; kenw@sybase.com
>Subject: RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and
>email)
>
>
>Carl,
>
>> Ken,
>>
>> UTF-8S is essentially a way to ignore surrogate processing. It allows a
>> company to encode UTF-16 with UCS-2 logic.
>>
>> The problem is that by not implementing surrogate support you can introduce
>> subtle errors. For example, it is common to break buffers apart into
>> segments. These segments may be reconcatenated, but they may also be
>> processed individually.
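The buffer-splitting hazard can be sketched in a few lines of Python. This is only an illustration under my own assumptions: buffers are modelled as lists of UTF-16 code units, and the helper names are invented here, not taken from any product discussed in this thread.

```python
def to_units(text):
    """Encode a string as a list of UTF-16 code units."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_valid_utf16(units):
    """True if the code units decode cleanly, i.e. contain no lone surrogate."""
    raw = b"".join(u.to_bytes(2, "little") for u in units)
    try:
        raw.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False


def split_naive(units, n):
    """UCS-2 style split: cut at code unit n without looking at surrogates."""
    return units[:n], units[n:]


def split_safe(units, n):
    """Surrogate-aware split: back up one unit if the cut would divide a pair."""
    if 0 < n < len(units):
        high = 0xD800 <= units[n - 1] <= 0xDBFF
        low = 0xDC00 <= units[n] <= 0xDFFF
        if high and low:
            n -= 1
    return units[:n], units[n:]


if __name__ == "__main__":
    units = to_units("a\U00010400b")  # U+10400 occupies two code units
    for name, split in (("naive", split_naive), ("safe", split_safe)):
        left, right = split(units, 2)
        print(name, is_valid_utf16(left), is_valid_utf16(right))
```

The naive split leaves a lone surrogate at the end of one segment and the start of the next, so neither is valid when processed individually, even though concatenating them again restores the original text.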
>
>You are preaching to the choir here. I didn't state that *I* was in
>favor of UTF-8S -- only that we have to be careful not to assume that
>UTC will obviously not support it. The proponents of UTF-8S are
>vigorously and actively campaigning for their proposal. In
>standardization committees, proposals that have committed, active
>proponents who can aim for the long haul often have a way of getting
>adopted in one form or another, unless there are equally committed
>and active opponents of the proposal. It is just the nature of
>consensus politicking in these committees, whether corporate-based
>or national-body-based.
>
>Also, I consider the stated position of "near-universal agreement
>among the database vendors" to be largely a rhetorical device by
>the proponents. Oracle is clearly pushing the proposal. NCR has
>stated it is not in favor of the proposal. The other big enterprise
>database vendors are hedging their positions somewhat -- in
>particular, the standards people in those companies may not be
>entirely in agreement with some of their database engine developers, for
>example. And the small database vendors are either not playing
>in this space or are part of desktop systems that will just follow
>the behavior of the platforms.
>
>--Ken

This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:18:17 EDT