Re: Communicator Unicode

From: Martin J. Dürst (mduerst@ifi.unizh.ch)
Date: Fri Sep 19 1997 - 05:25:17 EDT


On Tue, 16 Sep 1997, John Gardiner Myers wrote:

> > It will be very helpful to have different tags for the character set itself
> > and the transformation format. Otherwise you end up in a product
> > (ISO-10646 version) * (transformation-format) of different character sets
> > to be registered -- clearly an uneconomical approach.
>
> MIME quite deliberately ignored the concept of a "coded character set"
> absent a "character encoding scheme" as not being useful for
> interchange.

Yes indeed. If you look at the combinations of "coded character set"
and "character encoding scheme" that actually exist, you will
see that they form a rather sparsely populated matrix, with lots of
special cases. Therefore, it is much better to label each
individual combination than to label the two concepts separately.
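
To illustrate the point, here is a minimal sketch of what "one label per
combination" looks like in practice. The CHARSETS table and the decode()
helper are hypothetical names made up for this example, and the table is
only a small, informal sample, not a registry. The sketch relies on
Python's built-in codecs, which happen to be named the same way.

```python
# Hypothetical sample table: each label names ONE existing combination of
# coded character set and character encoding scheme.
CHARSETS = {
    "UTF-8": ("ISO 10646", "UTF-8"),
    "UTF-7": ("ISO 10646", "UTF-7"),
    "ISO-8859-1": ("ISO 8859-1", "one octet per character"),
    "ISO-2022-JP": ("ASCII, JIS X 0201 Roman, JIS X 0208",
                    "ISO 2022 escape sequences"),
}


def decode(label: str, data: bytes) -> str:
    """Decode data using a single label; unknown labels are rejected."""
    if label.upper() not in CHARSETS:
        raise LookupError("unrecognized charset label: " + label)
    # Python's codec lookup accepts these labels case-insensitively.
    return data.decode(label)


if __name__ == "__main__":
    text = "\ud55c"  # one Hangul syllable, U+D55C
    for name in ("UTF-8", "UTF-7"):
        octets = text.encode(name)
        print(name, octets, decode(name, octets) == text)
```

A reader only has to recognize one label to know whether it can handle
the data, which is the property the sparse matrix argues for.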

> Current thinking is that labeling the ISO 10646 version is harmful.
> Including it greatly increases the chance that something will be
> unreadable because the reader does not recognize the label. If Unicode
> and ISO 10646 do their jobs properly, then the version information is
> not useful.

Agreed. Unicode/ISO 10646 once made a big mistake when they relocated
the Korean Hangul syllables. That's why it can make sense, on very rare
occasions, to have a label such as Unicode-1-1-UTF-7. UTF-7 and
UTF-8 should always refer to the latest version. Because a lot
of people have learned a lot from the Korean mess (as I call it),
and Unicode 2.0/ISO 10646 including Amendment 5 is now really in wide
use, future versions will only add characters and will not change
existing ones.
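
As a rough illustration of why the relocation hurt, the sketch below asks
a current Unicode database what two code points mean. It assumes that the
Unicode 1.1 Hangul block began at U+3400 and that the relocated block
begins at U+AC00; describe() is a hypothetical helper name, and the exact
names printed depend on the Unicode version shipped with your Python.

```python
import unicodedata


def describe(code_point: int) -> str:
    """Return the character name in the local Unicode database."""
    return unicodedata.name(chr(code_point), "<unassigned>")


if __name__ == "__main__":
    # Start of the relocated Hangul syllables block (Unicode 2.0 and later).
    print("U+AC00:", describe(0xAC00))
    # Assumed start of the old Unicode 1.1 Hangul block; this position
    # has since been reused for other characters.
    print("U+3400:", describe(0x3400))
```

Data labeled only "UTF-7" or "UTF-8" but produced against the old
assignments would be misread at such positions, which is the rare case
where a version-specific label helps.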

Regards, Martin.


