Re: Last Call: UTF-16

From: Frank da Cruz
Date: Tue Aug 17 1999 - 13:57:31 EDT

> On Sun, Aug 15, 1999 at 01:02:24PM -0700, Frank da Cruz wrote:
> > > I believe they are on the Unicode CD. All those entities should register
> > > their coded character sets in the ISO-IR, though.
> >
> > Why? We don't need hundreds of redundant character sets in the IR. There
> > is no point in having character set standards if all character sets
> > automatically qualify. In any case, most of these sets won't qualify
> > since they use their C1 areas for graphics.
> Please note that there are now two ISO International Registries on
> character sets. One is the ISO/IEC 2375 registry, the other is the
> ISO/IEC 15897 registry (the cultural registry found at
> -also known as the CEN ENV 12005 registry)
> The cultural registry is actually more appropriate for Internet use,
> as it has been designed for it, and both the IAB recommendations
> and RFC 2130 recommend, as a policy, using the
> ISO/IEC 15897 registry. This ISO cultural registry contains charset
> descriptions that are aligned with IANA registrations and in
> a standardized format (ISO POSIX charmap) with mapping to ISO/IEC
> 10646. It also contains vendor charsets like the PC codepages,
> Mac codepages and EBCDIC encodings.

This still does not mean the Internet should endorse or promote the use
of private character sets, or even allow it.

Let us not be so quick to dismiss the International Register. It confers
numerous benefits:

 1. It maintains a certain consistency. Character sets are required
    to have a certain structure (or, more precisely, to fit into one
    of the predefined formats, or else, like the UCS, into a catch-all
    "other" format).

 2. Character sets registered in the IR come either from the ISO or
    from the standards body of a member nation, and therefore do not
    (in general) reflect the interests of a particular company or
    the quirks of a particular architecture or operating system.
    Instead, they reflect (presumably) a consensus among parties with
    diverse and possibly conflicting interests, just as the Internet
    itself must do.

 3. A unique registration number is assigned, which allows the character
    set to be identified in a concise, unambiguous, and language-neutral
    way.

 4. An escape sequence is assigned to designate the character set in the
    ISO 2022 environment. This is an essential component of any character
    set to those of us concerned with terminal emulation.

 5. The character table is printed in the register so we may see the glyphs.

 6. The code assignments of each character are given.

 7. An official name is given to each character so we may identify it
    sufficiently to correlate it with instances in other character sets
    for mapping purposes.

 8. The registration is on line at:

(except the ISO 10646 code tables are not online or distributed in printed
format to register subscribers as far as I can tell).
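
To make point 4 concrete, here is a minimal Python sketch of how a
registration turns into a designating escape sequence. The three entries
(ISO-IR 6, 100, and 87) and their final bytes are copied by hand from the
register as I understand it. The names REGISTRY, G_INTERMEDIATE, and
designate are hypothetical, made up for illustration, and belong to no
standard or library.

```python
ESC = b"\x1b"

# Hand-copied excerpt (an assumption, not an official table):
# registration number -> (kind of set, final byte of the escape sequence)
REGISTRY = {
    6:   ("94", b"B"),            # ISO 646 US version (ASCII)
    100: ("96", b"A"),            # right-hand part of ISO 8859-1
    87:  ("94-multibyte", b"B"),  # JIS X 0208-1983
}

# Intermediate bytes that select G0, G1, G2, G3 for each kind of set.
G_INTERMEDIATE = {
    "94": (b"(", b")", b"*", b"+"),
    "96": (None, b"-", b".", b"/"),  # 96-sets cannot be designated to G0
}


def designate(ir_number, g=0):
    """Return the ISO 2022 escape sequence designating a set to G0..G3."""
    kind, final = REGISTRY[ir_number]
    if kind == "94-multibyte":
        # Multi-byte sets add "$". For G0, the historical short form
        # (no "(") is kept for the final bytes "@", "A", and "B".
        if g == 0 and final in (b"@", b"A", b"B"):
            return ESC + b"$" + final
        return ESC + b"$" + G_INTERMEDIATE["94"][g] + final
    intermediate = G_INTERMEDIATE[kind][g]
    if intermediate is None:
        raise ValueError("a 96-character set cannot be designated to G0")
    return ESC + intermediate + final


if __name__ == "__main__":
    print(designate(6, 0))    # ASCII to G0
    print(designate(100, 1))  # Latin-1 right half to G1
    print(designate(87, 0))   # JIS X 0208 to G0
    # Python's own ISO-2022-JP codec emits the same designations around
    # Japanese text, then switches back to ASCII at the end.
    print("\u65e5\u672c\u8a9e".encode("iso2022_jp"))
```

A terminal emulator does the reverse: on seeing ESC, the intermediate
bytes, and a final byte, it looks up which registered set to load into
which G-set. A character set with no registration has no final byte, so
there is nothing to look up.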

- Frank
