Re: Last Call: UTF-16

From: Michael Everson
Date: Wed Aug 18 1999 - 10:29:26 EDT

At 06:47 -0700 1999-08-18, Frank da Cruz wrote:

>At the end of the tunnel we have Unicode / ISO 10646. So far there seems to
>have been surprisingly little politics involved in getting characters used
>in writing any language at all into it and this is a remarkable achievement.

The biggest problems have been ideological arguments over unification, I guess.

>But until the UCS has replaced everything else, we still need interchange
>standards based on existing character sets, and which body in the world
>should catalog and register these sets? The IANA? How will they choose
>among competing proposals? How will they exercise quality control? How
>will they document each character set? How will they resist pressure from
>giant corporations to bend the Internet this way or that?

I guess anyone who has a coded character set mappable to the UCS should be
able to have the mapping table recorded somewhere. Say there are three Mac
and two PC code tables ("fonts") for Cherokee. Now we also have Amendment 11.
I'd like a one-stop shop. Maybe this is something we'll feel an urgent need
for in future more than today; a transformation of the ISO-IR into
something else. I don't know.
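
To make "mapping table" concrete, here is a minimal Python sketch of what a
recorded table for one such 8-bit code set might look like, and how it would
be used to convert data to and from the UCS. The byte assignments and the
names HYPOTHETICAL_TABLE, decode, encode and is_one_to_one are invented for
illustration; they do not describe any actual Mac or PC Cherokee font. Only
the UCS code points are real.

```python
from typing import Dict

# Hypothetical 8-bit code table: bytes below 0x80 are taken to be ASCII,
# and a few high bytes are assigned to Cherokee letters. The byte values
# are assumptions made up for this sketch.
HYPOTHETICAL_TABLE: Dict[int, int] = {
    0xA0: 0x13A0,  # CHEROKEE LETTER A
    0xA1: 0x13A1,  # CHEROKEE LETTER E
    0xA2: 0x13A2,  # CHEROKEE LETTER I
}


def decode(data: bytes, table: Dict[int, int]) -> str:
    """Convert bytes in the 8-bit code set to a UCS string."""
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))          # ASCII range maps to itself
        elif b in table:
            out.append(chr(table[b]))   # look up the recorded mapping
        else:
            raise ValueError(f"byte 0x{b:02X} has no mapping to the UCS")
    return "".join(out)


def encode(text: str, table: Dict[int, int]) -> bytes:
    """Convert a UCS string back to the 8-bit code set."""
    reverse = {ucs: byte for byte, ucs in table.items()}
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)
        elif cp in reverse:
            out.append(reverse[cp])
        else:
            raise ValueError(f"U+{cp:04X} is not in this code set")
    return bytes(out)


def is_one_to_one(table: Dict[int, int]) -> bool:
    """True if no two bytes map to the same UCS code point."""
    return len(set(table.values())) == len(table)
```

A table that passes is_one_to_one can be converted in both directions
without loss, which is roughly what one would want a registry entry to
guarantee.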

>Eric speaks of alphabets needed for native American languages. What is the
>ideal situation? Should new single-byte character sets be created for them,
>which eventually will find their way into the UCS?

What a silly question. SIL has just created one for New Tai Lue. I have
done so for Inuktitut, Armenian, Georgian, Turkic Latin, Turkic Cyrillic,
Barents Cyrillic, Celtic Roman, Gaelic Roman, and Esperanto/Maltese, and I
do not intend to stop if there is a need on a particular platform for a
particular solution. I _do_ try to create my code sets responsibly, and
mappably to the UCS. I _don't_ sit around thinking up new code sets to
invent, however.

>Should we exert pressure
>on our national bodies to create national standards for these new sets?

This is troublesome and unnecessary. It certainly would exclude some people
from getting any reasonable support.

>as Ken advocates, should we "keep resisting the addition of more 8-bit
>character encodings that add to the legacy problem and that add to the
>registry messes." In that case, the problem involves only existing sets,
>and then I think the IR serves a useful function.

Ken doesn't have to type Kildin Sámi or Esperanto or Tatar on the Mac. Ken
works for industry, which admittedly has a lot of trouble implementing
Unicode reliably, because it is a huge and wonderful and complex standard.
But the rest of the world CANNOT sit by and wait for the gods to give us
Unicode. We have to type and print and share data. We have to write
languages other than English, French, German, and Japanese.

I think everybody knows this. Ken knows he wants to minimize legacy
problems. People like me know we have to minimize migration problems. There
are people making code sets who don't give a damn about either, and those
form the body of crap code sets that are the problem.

Michael Everson * Everson Gunn Teoranta *
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Phone: +353 1 478 2597 ** Fax: +353 1 478 2597 (by arrangement)
27 Páirc an Fhéithlinn;  Baile an Bhóthair;  Co. Átha Cliath; Éire
