> >Right, but then we need new and better standards like Unicode. We do NOT
> >need to send PC code pages through the Internet, because not everybody has
> >a PC, and for that matter, not every PC uses the same code pages.
> Look, I can't send anything but 8-bit code pages through the internet
> because I use Mac OS 8.5 which has no Unicode support.
That doesn't mean the IETF should start registering Macintosh code pages as
standard charsets. It means that whoever creates a private code page should
provide an official mapping to a well-defined international standard
character set that contains the same repertoire.
It is the responsibility of any software that communicates over a network
to translate between its own local formats and codes and the standard ones.
If the standard ones are inadequate, the standards need fixing. And that's
what we've been doing here with Unicode these past years.
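To make the boundary rule concrete: here is a minimal sketch (in Python, purely illustrative) that decodes data from a private/local code page -- Mac Roman is assumed here as the example -- into Unicode and re-encodes it in a standard interchange form (UTF-8) before it goes on the wire.

```python
# Boundary translation sketch: local code page -> Unicode -> standard form.
# Mac Roman bytes for: e-acute, em dash, "fl" ligature (0x8E, 0xD1, 0xDF).
local_bytes = b"caf\x8e \xd1 \xdf"

text = local_bytes.decode("mac_roman")   # local code page -> Unicode
wire_bytes = text.encode("utf-8")        # Unicode -> standard interchange form

# The round trip is lossless because the local repertoire maps into Unicode.
assert text == "caf\u00e9 \u2014 \ufb02"
assert wire_bytes.decode("utf-8") == text
```

The point is that the private encoding never leaves the machine; only the standardized form is transmitted.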
In the meantime, of course every company tries to get a leg up on the others
by getting us hooked on items that are not standardized, like em-dashes and
"fl" ligatures. That does not mean we abandon the standards process and turn
the Internet into the Wild West (although I must say to a large extent that's
exactly what has happened).
Instead, it means we (a) do our best to accelerate the standards process,
(b) offer the greatest possible incentives for vendors to follow standards,
and (c) create disincentives for them to ignore standards. If vendors don't
like the standards, they should participate in the process, as they did in
the development of Unicode.
> So, Frank, I need to use MIME, and I need to use filters
> that can convert from one character set to another.
That's the end user talking, not the standards guy :-)
Certainly tagging is better than not tagging. But if you are referencing
nonstandard encodings in your tags (e.g. charsets not in the IR), then you
(as a standards activist) will have the good sense to make sure that your
communication partners support these tags. But that's you, not the average
mass-market consumer who just got a free PC for signing up with an ISP.
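For reference, the tag in question is just MIME's charset parameter on the Content-Type header. A small sketch using Python's standard email package (illustrative; the charset name shown is an example): the tag names the encoding, but nothing forces the receiver to understand it.

```python
from email.message import EmailMessage

# Build a tagged message: the charset parameter in Content-Type is the
# MIME "tag" -- a label, not a guarantee the receiver supports it.
msg = EmailMessage()
msg["Subject"] = "charset tagging example"
msg.set_content("h\u00e9llo", charset="iso-8859-1")

print(msg.get_content_type())     # text/plain
print(msg.get_content_charset())  # iso-8859-1
```

A receiver that does not know the named charset can do nothing useful with the tagged body, which is exactly the problem described below.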
MIME offers no assurance whatever that the receiver can understand what
is sent to it. This is the OPPOSITE of good network design.
Furthermore, since MIME is used primarily in email, there is no mechanism for
negotiation -- it's take it or leave it. By the time you get my mail, I'm
gone. If you can't read it, tough.
> If all data got
> converted into UTF-8 to go to the net that would be fine, but it will take
> years and years for that to happen. We're going to need tagged mail for
> ages to come. Software developers of internet applications need to make
> sure that the flexibility is there for the end user to add new codings if
> necessary.
But that leaves us in a pretty bad spot. It's fine for "consenting adults",
but the effect is chilling for anybody who is not an expert and who wants
freedom of choice in computer platforms and applications.
Anyway, it's moot since it already happened. However, let's do our best
not to perpetuate this approach. Now that we have something to say about
how UTF-16 is to be formatted and announced on the Internet, let's see if we
can achieve a sensible result. It's not a big deal: if your UTF-16 is LE,
then please swap the bytes before putting them on the wire.
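The byte swap itself is trivial; a sketch in Python (illustrative -- in practice you would simply encode straight to the big-endian form):

```python
def utf16le_to_be(data: bytes) -> bytes:
    """Swap each 16-bit code unit from little-endian to big-endian order."""
    if len(data) % 2:
        raise ValueError("odd-length UTF-16 data")
    swapped = bytearray(data)
    swapped[0::2], swapped[1::2] = data[1::2], data[0::2]
    return bytes(swapped)

le = "\u00dcnicode".encode("utf-16-le")       # "Ünicode" in UTF-16LE
be = utf16le_to_be(le)                        # big-endian, network order
assert be == "\u00dcnicode".encode("utf-16-be")
```

Characters outside the BMP need no special handling here, since surrogate pairs are still a sequence of 16-bit code units and each unit is swapped independently.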
> >... PC code pages, NeXTSTEP, Data General, Hewlett-
> >Packard, EBCDIC, and every other conceivable encoding. Where would I even
> >find the specifications?
> I believe they are on the Unicode CD. All those entities should register
> their coded character sets in the ISO-IR, though.
Why? We don't need hundreds of redundant character sets in the IR. There is
no point in having character set standards if all character sets automatically
qualify. In any case, most of these sets won't qualify, since they use their
C1 areas for graphics.
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:51 EDT