Re: UTF-8 reg tags...

From: Glenn Adams (glenn@spyglass.com)
Date: Mon Sep 16 1996 - 09:22:59 EDT


At 11:05 PM 9/11/96, Francois Yergeau wrote:
>At 09:44 11-09-96 -0400, Glenn Adams wrote:
>>I believe that moving ahead with a bare "UTF-8" is an extremely
>>bad idea. To stay consistent with previous tags, you should
>>have specified:
>>
>> UNICODE-1-1-UTF-8
>>
>> or
>>
>> ISO-10646-UTF-8 (this is ambiguous regarding repertoire)
>>
>>As the author of the I-D, you can revoke it prior to it being published
>>as an RFC. I would request that you do so immediately before it goes
>>out.
>
>Before I do that, I would like to see a little more argumentation, beyond
>the naked statement that it is "an extremely bad idea". The consistency
>argument seems pretty weak to me, in the face of a consensus for "UTF-8"
>alone reached after a quite lengthy discussion on the ISO10646 list,
>involving a number of knowledgeable people.

(1) the designation "UTF-8" does not designate a coded character set (i.e.,
a repertoire and its code set); rather it designates only a transformation
method that could potentially be used with arbitrary CCSs. For the purposes
of MIME it is essential to designate both the character encoding and the
coded character set so encoded.
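
A minimal sketch of point (1), assuming the 1- to 6-octet form of UTF-8 as it
was defined at the time. The function name utf8_transform is hypothetical. It
takes a bare integer and never consults a character table, which is the sense
in which the transformation is independent of the coded character set:

```python
def utf8_transform(value: int) -> bytes:
    """Apply the UTF-8 bit layout to a bare integer code value.

    Hypothetical helper for illustration: nothing here knows which
    coded character set the integer comes from.
    """
    if value < 0 or value > 0x7FFFFFFF:
        raise ValueError("value out of range for the 1- to 6-octet form")
    if value < 0x80:
        return bytes([value])  # single octet, high bit clear

    # (exclusive upper bound, number of octets, lead-octet marker)
    for limit, length, lead in (
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    ):
        if value < limit:
            break

    octets = []
    for _ in range(length - 1):
        octets.append(0x80 | (value & 0x3F))  # continuation octet: 10xxxxxx
        value >>= 6
    octets.append(lead | value)  # lead octet carries the remaining high bits
    return bytes(reversed(octets))
```

The octets produced for a given integer are the same whichever repertoire
assigns a character to that integer, so the octets alone do not say which
repertoire was meant.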

(2) the designation "UTF-8" is inconsistent with currently specified
designations. See my previous message.

>Has the UTC considered the question for more than a few minutes, or has it
>simply found it expedient to follow the pattern?

We discussed it in depth, not cursorily.

>Has a case been made for
>the need of a version number in the MIME tag(s)?

If it weren't for the incompatible changes between UC 1.0 and UC 1.1 (removal
of Tibetan and reassignment of other chars to obtain the merger with 10646)
and the changes between UC 1.1 and UC 2.0 (reassignment of Korean Hangul), then
version designation would not be a significant issue. The UTC will *strongly*
oppose any further incompatible changes; however, we have to deal with history
as it stands.

>Is it based on the change
>to the Korean encoding? If so, does the UTC consider this incompatible
>change to be a serious problem in practice?

Yes, the UTC considers UC 2.0 to be an incompatible change to UC 1.1; thus
the need for a version designation.

Another issue which you haven't addressed with a simple "UTF-8" designation
is the distinction between Unicode and 10646. Unicode entails (and requires)
certain semantics that 10646 does not. Thus it is essential to distinguish
between a UTF-8 encoding of Unicode and one of 10646.
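
As a rough illustration of what the longer labels carry, the sketch below
splits a label into coded character set, version, and transformation. The
names CharsetLabel and parse_label are hypothetical, and the label grammar is
only an assumption inferred from the two examples quoted earlier in this
message; it is not taken from any registry.

```python
from typing import NamedTuple, Optional


class CharsetLabel(NamedTuple):
    ccs: Optional[str]      # coded character set, or None if not stated
    version: Optional[str]  # version of that set, or None if not stated
    transform: str          # transformation method


def parse_label(label: str) -> CharsetLabel:
    """Split a label such as UNICODE-1-1-UTF-8 into its parts.

    Assumed grammar (illustration only): [CCS[-version]-]UTF-8
    """
    upper = label.upper()
    if not upper.endswith("UTF-8"):
        raise ValueError("only UTF-8 labels are handled in this sketch")

    prefix = upper[: -len("UTF-8")].rstrip("-")
    if not prefix:
        # Bare "UTF-8": neither the set nor its version is stated.
        return CharsetLabel(None, None, "UTF-8")

    parts = prefix.split("-")
    if parts[0] == "UNICODE":
        return CharsetLabel("UNICODE", ".".join(parts[1:]) or None, "UTF-8")
    if parts[:2] == ["ISO", "10646"]:
        return CharsetLabel("ISO-10646", ".".join(parts[2:]) or None, "UTF-8")
    raise ValueError("unrecognized coded character set: " + prefix)
```

Under these assumptions, parse_label("UNICODE-1-1-UTF-8") yields the set, the
version, and the transformation, while parse_label("UTF-8") yields only the
transformation.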

Glenn


