RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Tue May 29 2001 - 18:02:28 EDT

Next message: Jungshik Shin: "RE: Term Asian is not used properly on Computers and NET"
Previous message: Jungshik Shin: "RE: Term Asian is not used properly on Computers and NET"
In reply to: Kenneth Whistler: "Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)"
Next in thread: Kenneth Whistler: "RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)"

Ken,

UTF-8s is essentially a way to ignore surrogate processing. It allows a
company to encode UTF-16 with UCS-2 logic.
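As a rough illustration of what "UTF-16 with UCS-2 logic" means, here is a minimal Python sketch. It assumes UTF-8s simply applies the ordinary 1- to 3-byte UTF-8 bit patterns to each 16-bit code unit, surrogates included. The function name encode_utf8s is made up for this example and is not taken from the proposal.

```python
def encode_utf8s(text: str) -> bytes:
    """Encode text one UTF-16 code unit at a time (assumed UTF-8s behaviour)."""
    utf16 = text.encode("utf-16-be")  # supplementary characters become surrogate pairs
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = (utf16[i] << 8) | utf16[i + 1]
        if unit < 0x80:
            # 1-byte form, identical to ASCII
            out.append(unit)
        elif unit < 0x800:
            # 2-byte form
            out.append(0xC0 | (unit >> 6))
            out.append(0x80 | (unit & 0x3F))
        else:
            # 3-byte form; surrogate code units are NOT combined first
            out.append(0xE0 | (unit >> 12))
            out.append(0x80 | ((unit >> 6) & 0x3F))
            out.append(0x80 | (unit & 0x3F))
    return bytes(out)


# U+10000 takes 4 bytes in standard UTF-8 but 3 + 3 bytes in this scheme.
standard = "\U00010000".encode("utf-8")
utf8s = encode_utf8s("\U00010000")
print(standard.hex(" "), "vs", utf8s.hex(" "))
```

For plane 0 text the two encodings agree byte for byte, which is why the difference only shows up once supplementary characters appear in the data.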

The problem is that by not implementing surrogate support you can introduce
subtle errors. For example, it is common to break buffers apart into
segments. These segments may be reconcatenated, but they may also be
processed individually.

For example, suppose you break a buffer apart and translate each segment to a
code page. What happens when a non-plane 0 character is encoded as two UTF-8s
surrogates and the segment boundary falls between the two surrogates?
Your buffer will contain an incomplete character.
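The failure can be reproduced with a small Python sketch. It assumes each surrogate is carried as its own 3-byte sequence, and it uses Python's "surrogatepass" error handler as a stand-in for a UCS-2 style decoder. The helper name decode_utf8s_segment is hypothetical.

```python
def decode_utf8s_segment(segment: bytes) -> str:
    """Decode one segment, pairing surrogates the way a UTF-16 aware step would."""
    # "surrogatepass" accepts 3-byte encoded surrogates that strict UTF-8 rejects.
    units = segment.decode("utf-8", errors="surrogatepass")
    # Round-trip through UTF-16 so high/low surrogates are joined into one
    # character. A lone surrogate makes the strict UTF-16 decode raise.
    return units.encode("utf-16-be", errors="surrogatepass").decode("utf-16-be")


# "A", U+10000 as two encoded surrogates, "B"
buffer = b"A\xed\xa0\x80\xed\xb0\x80B"

# The whole buffer decodes without trouble.
whole = decode_utf8s_segment(buffer)

# Splitting after byte 4 is a legal boundary by UCS-2 logic, but it lands
# between the two surrogates, so each half holds an incomplete character.
for segment in (buffer[:4], buffer[4:]):
    try:
        decode_utf8s_segment(segment)
    except UnicodeDecodeError as err:
        print("segment", segment, "failed:", err.reason)
```

Whether the error shows up depends on where the segment boundary happens to fall, which is what makes it intermittent.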

The applications will see intermittent translation failures that may be
very hard to isolate.

The code to ensure that the two UTF-8s surrogate characters stay together
could be more than the code it takes to process UTF-16. It is certainly
harder to Q.A. such software.
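To give a feel for the extra logic, here is a sketch in Python of a split-point adjuster. It is only an illustration under the assumption that encoded high surrogates start with ED A0..AF and encoded low surrogates with ED B0..BF. The name safe_split_point is invented for this example.

```python
def safe_split_point(buf: bytes, pos: int) -> int:
    """Move a proposed split position back so no character is cut in half."""
    pos = max(0, min(pos, len(buf)))

    # Rule 1 (plain UTF-8 as well): never split inside a multi-byte sequence.
    # Continuation bytes have the bit pattern 10xxxxxx.
    while 0 < pos < len(buf) and (buf[pos] & 0xC0) == 0x80:
        pos -= 1

    # Rule 2 (needed only for UTF-8s): never split between an encoded high
    # surrogate and the encoded low surrogate that follows it.
    starts_with_low = (
        pos + 1 < len(buf) and buf[pos] == 0xED and 0xB0 <= buf[pos + 1] <= 0xBF
    )
    ends_with_high = (
        pos >= 3 and buf[pos - 3] == 0xED and 0xA0 <= buf[pos - 2] <= 0xAF
    )
    if starts_with_low and ends_with_high:
        pos -= 3  # back up over the whole high surrogate

    return pos


# "A", U+10000 as two encoded surrogates, "B"
buffer = b"A\xed\xa0\x80\xed\xb0\x80B"
print([safe_split_point(buffer, p) for p in range(len(buffer) + 1)])
```

Rule 2 is the part that standard UTF-8 never needs, and every place that segments a buffer has to apply it and be tested for it.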

Carl

-----Original Message-----
From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]On
Behalf Of Kenneth Whistler
Sent: Tuesday, May 29, 2001 11:18 AM
To: DougEwell2@cs.com
Cc: unicode@unicode.org; kenw@sybase.com
Subject: Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and
email)

Doug wrote:

> UTF-8 and UTF-32 should absolutely not be similarly hacked to maintain some
> sort of bizarre "compatibility" with the binary sorting order of UTF-16.

> UTC should not, and almost certainly will not, endorse such a proposal on the
> part of the database vendors.

I would be loath to take such a sanguine attitude, however.

The UTC first took up this issue formally on April 29, 2000
(document L2/00-139R) when the UTF in question was referred to
as "UTF-8-16". At that time there was some discussion, a great
deal of it opposed to the introduction of another UTF. At that
time, the Peoplesoft representative was tasked to go off and
"summarize the database issues" that underlay the proposal.

After much delay, the issue resurfaced in this last UTC meeting,
as UTF-8S, with some of the concerns addressed and more background
presented about the database performance issues that have been
driving the proposal.

*This* time the reception was not as hostile as a year ago, with
something like a 50/50 split in the committee, and with claims
forwarded in committee that "there is near-universal agreement
among the database vendors", with the noted exception of NCR.
There was a consensus to take no action now, and the Oracle and
Peoplesoft representatives were tasked to make further revisions
and perhaps bring in database specialists to discuss the implementation
issues.

The point is that while the UTC did not endorse this proposal as
of May 23, 2001, the pressure to create a UTF-8S is rising, and there
is no guarantee that the UTC will not sway to such support in
the future, despite the logic of the arguments presented against
UTF-8S.

--Ken Whistler

>
> -Doug Ewell
> Fullerton, California
>
>


This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:18:17 EDT