Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

From: Simon Law (simon.law@oracle.com)
Date: Wed May 30 2001 - 14:01:44 EDT


Hi Folks,

Over the last few days, this email thread has generated many interesting
discussions of the UTF-8s proposal. It has also generated some speculation
about why Oracle is asking for this encoding form. I hope to clear up some of
the misinformation in this email.

In Oracle9i, our next database release, shipping this summer, we have
introduced support for two new Unicode character sets. One is 'AL16UTF16',
which supports the UTF-16 encoding; the other is 'AL32UTF8', which is the
fully compliant UTF-8 character set. Both conform to the Unicode Standard, and
supplementary characters (surrogate pairs in UTF-16) are stored strictly in
4 bytes. For more information on Unicode support in Oracle9i, please check out
the whitepaper "The power of Globalization Technology" at
http://otn.oracle.com/products/oracle9i/content.html
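
As a rough illustration of the 4-byte point, here is a small Python sketch
(not Oracle code). The helper name utf8s_encode is hypothetical, and its
6-byte output reflects the UTF-8s proposal as it has been described in this
thread, namely each UTF-16 surrogate encoded as its own 3-byte sequence:

```python
def utf8s_encode(text: str) -> bytes:
    """Hypothetical helper: encode every UTF-16 code unit as if it were a
    BMP character, so a surrogate pair becomes two 3-byte sequences."""
    utf16 = text.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        # "surrogatepass" lets Python emit the 3-byte form of a lone surrogate.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


supplementary = "\U00010000"                       # first character beyond the BMP
as_utf16 = supplementary.encode("utf-16-be")       # one surrogate pair
as_utf8 = supplementary.encode("utf-8")            # standard UTF-8
as_utf8s = utf8s_encode(supplementary)             # proposed UTF-8s form

print(len(as_utf16), len(as_utf8), len(as_utf8s))
```

Under these assumptions the same character takes 4 bytes in UTF-16 and in
standard UTF-8, and 6 bytes in the proposed form.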

The requests for UTF-8s came from many of our Packaged Applications customers
(such as PeopleSoft and SAP). The ordering of the binary sort is an important
requirement for these Oracle customers. We are supporting them, and we hope to
turn this into a TR so that other vendors can reference UTF-8s when they need
compatible binary order for UTF-16 and UTF-8 across different platforms.
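
To make the binary-order issue concrete, here is a minimal Python sketch (not
Oracle code). The helper utf8s_encode is hypothetical and assumes that UTF-8s
encodes each UTF-16 surrogate as its own 3-byte sequence; the two sample
characters are chosen only for illustration:

```python
def utf8s_encode(text: str) -> bytes:
    """Hypothetical helper: encode every UTF-16 code unit as if it were a
    BMP character, so a surrogate pair becomes two 3-byte sequences."""
    utf16 = text.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


bmp = "\uFF5E"        # a BMP character above the surrogate range
supp = "\U00010000"   # a supplementary character (surrogate pair in UTF-16)
pair = [bmp, supp]

# Sort the same two strings by the raw bytes of each encoding.
by_utf16 = sorted(pair, key=lambda s: s.encode("utf-16-be"))
by_utf8 = sorted(pair, key=lambda s: s.encode("utf-8"))
by_utf8s = sorted(pair, key=utf8s_encode)

print(by_utf16 == by_utf8)    # do UTF-16 and standard UTF-8 agree?
print(by_utf16 == by_utf8s)   # do UTF-16 and the proposed form agree?
```

In this sketch the UTF-16 byte order puts the supplementary character first
(0xD8 sorts below 0xFF), while standard UTF-8 puts it last (0xF0 sorts above
0xEF). The proposed form reproduces the UTF-16 order, which is the
compatibility these customers are asking for.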

The speculation that we are pushing for UTF-8s because we are trying to
minimize our code changes for supporting surrogates, or because of our unique
database design, is totally false. Oracle has a fully internationalized,
extensible architecture and has introduced surrogate support in Oracle9i. In
fact, we are probably the first database vendor to support both the UTF-16 and
UTF-8 encoding forms. We will continue to support them and to conform to
future enhancements to the Unicode Standard.

Regards

Simon

"Carl W. Brown" wrote:

> Ken,
>
> I suspect that Oracle is specifically pushing for this standard because of
> its unique database design. In a sense Oracle almost picks itself up by
> its own bootstraps. It has always tried to minimize actual code. Therefore
> it was a natural choice to implement Unicode with UTF-8 because it is easy
> to reuse the multibyte support with minor changes to handle a different
> character length algorithm. This has been one of the reasons that Oracle
> has been successful. Its Tinkertoy-like design has enabled them to quickly
> adapt and add new features. Now, however, they should take the time to "do
> it right". Its UTF-8 storage creates problems for database designers
> because they cannot predict field sizes. This is a problem with MBCS code
> pages but UTF-8s will make it worse. There will be lots of wasted storage
> when characters can vary in size from 1 to 6 bytes.
>
> Most other database systems require specific code to support Unicode. As a
> consequence most have implemented it using UCS-2. Their obvious migration
> path is to UTF-16. UTF-8s buys them nothing but headaches.
>
> Carl
>
> -----Original Message-----
> From: Kenneth Whistler [mailto:kenw@sybase.com]
> Sent: Tuesday, May 29, 2001 3:47 PM
> To: cbrown@xnetinc.com
> Cc: unicode@unicode.org; kenw@sybase.com
> Subject: RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and
> email)
>
> Carl,
>
> > Ken,
> >
> > UTF-8s is essentially a way to ignore surrogate processing. It allows a
> > company to encode UTF-16 with UCS-2 logic.
> >
> > The problem is that by not implementing surrogate support you can introduce
> > subtle errors. For example it is common to break buffers apart into
> > segments. These segments may be reconcatenated but they may be processed
> > individually.
>
> You are preaching to the choir here. I didn't state that *I* was in
> favor of UTF-8S -- only that we have to be careful not to assume that
> UTC will obviously not support it. The proponents of UTF-8S are
> vigorously and actively campaigning for their proposal. In
> standardization committees, proposals that have committed, active
> proponents who can aim for the long haul, often have a way of getting
> adopted in one form or another, unless there are equally committed
> and active opponents of the proposal. It is just the nature of
> consensus politicking in these committees, whether corporate based
> or national body based.
>
> Also, I consider the stated position of "near-universal agreement
> among the database vendors" to be largely a rhetorical device by
> the proponents. Oracle is clearly pushing the proposal. NCR has
> stated it is not in favor of the proposal. The other big enterprise
> database vendors are hedging their positions somewhat -- in
> particular, the standards people in those companies may not be
> entirely in agreement with some of their database engine developers, for
> example. And the small database vendors are either not playing
> in this space or are part of desktop systems that will just follow
> the behavior of the platforms.
>
> --Ken




