Re: Last Call: UTF-16, an encoding of ISO 10646 to Informational

From: Frank da Cruz (fdc@watsun.cc.columbia.edu)
Date: Sun Aug 15 1999 - 11:06:35 EDT


Eric Brunner <brunner@maine.rr.com> wrote:

> Frank offers a brief comment...
>
> >Internet standards have to do with what goes on the wire. Where character
> >sets are concerned, Internet standards should recognize only international
> >standard character sets, namely those registered in the ISO International
> >Register of Coded Character Sets, as is UTF-16. So far so good.
>
> I disagree. Not with the wire format part of his observation.
>
That note was dashed off in haste. The idea I was trying to get across is
that, when a particular writing system may be represented by more than one
character set, and one of them is an international standard, and the others
are not, there is no reason or justification for the IETF to recognize the
nonstandard ones, nor does it serve any useful purpose. For example, if
Latin-2 is registered for use on the Internet, there is no reason to also
register PC Code Page 852 and/or 1250.

If Old High Blackfoot has not been registered by the ISO, then it should be,
assuming there is consensus on its form and content. In the meantime, if
there is (say) a Canadian or Blackfoot national standard for it, then it can
be used in the interim. The broader the scope of a registration authority,
the more it is to be preferred since, presumably, it represents a broader
consensus. This view ignores political questions and so is not entirely
satisfactory, but politics are everywhere. (It also raises the question of
multiple registration authorities, which in turn suggests a need for a
registration authority for registration authorities.)

But this is a tangent. My point was: the Internet should not be registering
and blessing every corporate character set (e.g. PC code page) that comes
along, and in this case (UTF-16), it should not be registering every
corporate variation (e.g. Intel vs Sun byte order), because this serves no
useful purpose. Specify one and only one form for UTF-16 on the wire and be
done with it, instead of "some people do it this way but others like to do
it that way." That's not a standard.
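To make the byte-order point concrete, here is a small illustrative sketch
(mine, not from the original message) showing that the very same character
produces different wire bytes in the two UTF-16 byte orders:

```python
# The same text serialized as UTF-16 in the two byte orders.
# This is exactly the "Intel vs Sun" wire-format ambiguity at issue.
text = "A"  # U+0041

be = text.encode("utf-16-be")  # big-endian ("Sun" order)
le = text.encode("utf-16-le")  # little-endian ("Intel" order)

print(be)  # b'\x00A'
print(le)  # b'A\x00'
```

A receiver that does not know which form the sender chose cannot reliably
interpret the stream, which is the argument for mandating a single form.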

There are two ends of a network connection. If the software at each end had
to understand every wacky character set that was ever invented, and every
conceivable byte order for multibyte character sets, that software would be
unnecessarily complex and ungainly, and nobody would bother to support the
character sets (or byte orders) they didn't care about, so I think this
approach inhibits rather than fosters open communication. (A registration
authority that registers every character set is like a patent office that
approves every patent application -- there is also supposed to be a search
for prior art to prevent duplication.)
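As a sketch of the guessing a receiver is forced into when both byte orders
are allowed, here is hypothetical decoding logic of my own (the fallback to
big-endian is an assumption, following the convention in the UTF-16 draft
under discussion, not something stated in this message):

```python
def decode_utf16_guessing(data: bytes) -> str:
    """Guess the byte order of incoming UTF-16 data from a BOM, if any."""
    if data.startswith(b"\xfe\xff"):
        # Big-endian byte-order mark: strip it and decode as big-endian.
        return data[2:].decode("utf-16-be")
    if data.startswith(b"\xff\xfe"):
        # Little-endian byte-order mark: strip it and decode accordingly.
        return data[2:].decode("utf-16-le")
    # No BOM: the receiver must simply assume a byte order.
    # Big-endian is assumed here, per the draft's convention.
    return data.decode("utf-16-be")

print(decode_utf16_guessing(b"\xfe\xff\x00A"))  # A
print(decode_utf16_guessing(b"\xff\xfeA\x00"))  # A
```

With a single mandated wire form, none of this per-stream guesswork would
be needed.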

If, on the other hand, only a small set of standard character sets is
allowed on the wire (sufficient, obviously, to represent all desired writing
systems), then each application on each end system only needs to know the
standard ones in addition to its own local ones.

- Frank



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:51 EDT