Re: Last Call: UTF-16, an encoding of ISO 10646 to Informational

From: Eric Brunner (
Date: Fri Aug 13 1999 - 19:29:14 EDT

Frank offers a brief comment...

>Internet standards have to do with what goes on the wire. Where character
>sets are concerned, Internet standards should recognize only international
>standard character sets, namely those registered in the ISO International
>Register of Coded Character Sets, as is UTF-16. So far so good.

I disagree, though not with the wire-format part of his observation.

In the first instance, I view our work in this context as supporting the use
of writing systems, hence code sets, not providing a wire format for arbitrary

In the second instance, it is really quite recent that the character sets
used by a large plurality of native languages in use in Canada have been
actively promoted by that state in the international standards system (now
at pDAM if memory serves). The same or worse exists w.r.t. the character
sets of romanized writing systems in the US and Canada. Generally, it is a
reasonable observation that the ISO system is out of reach of indigenous
polities, hence unlikely that their linguistic concerns have actually been
met within the code set space.

If anyone knows what the ISO registration is for Old High Blackfeet, romanized
or syllabic, or for diacritically simplified modern Siksika, let me know.
I'll be appropriately contrite. Meanwhile, can we please assume that the
universalist pretensions of the ISO system are still culturally limited and
likely to remain so for some time and that this applies to registration of
minority language character sets, even if the best of intentions are actually

>But there is no mention of byte order in the ISO registrations for UTF-16;
>instead it is registered according to Level (1, 2, or 3). Internal
>representation on machines of different architectures is, and should be,
>irrelevant to character-set standardization.
>I would suggest, therefore, that the IETF follow the categorizations in
>the ISO Register, where we have a ready-made catalog of coded character
>sets, along with a unique and unambiguous way in which to refer to them:
>the ISO registration number.
>(Of course I made the same suggestion many times in the past, and yet
>Internet standards are chock full of bizarre nonstandard and proprietary
>character sets and encodings that have no business in standard
>vendor-neutral protocols, which gives rise to applications that feel it is
>perfectly ok to (say) send e-mail in (say) Code Page 1251 and expect the
>recipient to be able to read it, as long as the "charset" is announced.
>But I digress.)

Vendor neutrality is sort of good, but SJIS is not going to go away any
time soon. If (humor me) one of apple/windoz/unix really supported any
endangered language (and arguably cases can be made for two of three), a
desire for "vendor neutrality" which resulted in erosion of an endangered
language would be a very high price to pay. Some markets are hostOS niches.

Generous in what you accept ... OK, there exist dorky apps that pretend to
be email (outlook comes to mind, but I digress), and ship the odd code page.

>The IETF is in a position to legislate what flies around on the Internet
>wires, and should exercise its power in this case to mandate UTF-16 in one
>and only one form rather than all possible forms including "guess". Network
>protocols work as intended when the agents at each end of a connection
>convert between their own local format and the well-defined standard one on
>the wire. Let's take this opportunity to avoid yet another imponderable.
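
To make the byte-order point concrete, here is a minimal sketch in Python. It
assumes big-endian is the single form chosen for the wire; that choice, and the
names WIRE_ENCODING, to_wire and from_wire, are illustrative only and are not
taken from the draft or from any registration.

```python
# Assumption: big-endian is the one agreed wire form (illustrative choice).
WIRE_ENCODING = "utf-16-be"


def to_wire(text: str) -> bytes:
    """Convert local text to the single agreed wire form."""
    return text.encode(WIRE_ENCODING)


def from_wire(data: bytes) -> str:
    """Convert bytes in the agreed wire form back to local text."""
    return data.decode(WIRE_ENCODING)


sample = "Siksika"

# The same characters serialize to different bytes in each byte order.
big = sample.encode("utf-16-be")
little = sample.encode("utf-16-le")

# A receiver that guesses the wrong order gets different text, silently.
misread = little.decode("utf-16-be")

# With one form fixed on the wire, the round trip needs no guessing.
round_trip = from_wire(to_wire(sample))

# The generic "utf-16" codec prepends a byte order mark instead,
# leaving the order to be discovered by the receiver.
bom = sample.encode("utf-16")[:2]
```

Here misread decodes without error but is not the original string, which is
the "guess" failure mode the quoted text warns about.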

I hardly disagree with the desire for UTF-16, but I do disagree with the
desire to dump a hard problem on a body that may not care to solve it. ISO
has a registry, so do we. Let's keep it that way.

Kitakitamatsinopowaw ("I'll see you again", diacritically simplified modern
                          Siksika, romanized.)
