Re: Last Call: UTF-16

From: Michael Everson
Date: Wed Aug 18 1999 - 08:40:11 EDT

At 22:26 -0700 1999-08-17, Eric Brunner wrote:

>Formal access to the ISO system is limited to polities recognized as
>states and a limited range of other territorial categories. I'm not
>aware of a single instance of an indigenous nation in the Americas
>obtaining this status within the UN system, and I'm not offering a
>casual observation here.

Only partially true. Formal _membership_ may be so limited. (States have to
pay dues to ISO to belong, so lots of states don't belong.) But _access_ to
the system is not limited. Access is easiest via the internet.

>With the recent exception of the Unified Canadian Aboriginal Syllabary,
>the standards bodies of the member states in the Americas have shown
>no interest in the technical requirements for the indigenous languages
>of the Americas.

This is untrue.

* Amendment 11 (Unified Canadian Aboriginal Syllabics) was published on
1998-07-15. The draft was developed primarily by Canada, Ireland, and the
United Kingdom.

* Amendment 12 (Cherokee) was published on
1998-09-01. The draft was developed primarily by Ireland and the United
States. I prepared the initial proposal, and Lisa Moore of IBM worked with
members of the Cherokee Nation to gain their approval.

* Amendment 30 (Additional Latin and other characters) is in its FPDAM
ballot. It contains U+0222 LATIN CAPITAL LETTER OU and U+0223 LATIN SMALL
LETTER OU, which are the omicron-upsilon ligatures used in Algonquin and
other languages. Canada and Ireland sponsored these characters.
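
For readers who want to check the two code points named above, here is a
minimal sketch using Python's standard unicodedata module. It is an
illustration only, not part of the amendment text; the constant and function
names are hypothetical, and the output depends on the Unicode data shipped
with the Python build (any version covering Unicode 3.0 or later includes
these characters).

```python
import unicodedata

# The two characters named in the Amendment 30 item above.
OU_CAPITAL = "\u0222"
OU_SMALL = "\u0223"

def describe(ch):
    """Return a 'U+XXXX NAME' string for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for ch in (OU_CAPITAL, OU_SMALL):
    print(describe(ch))

# The two letters form an ordinary upper/lower case pair.
print(OU_CAPITAL.lower() == OU_SMALL)
```

Because the letters are a case pair, ordinary case conversion works on text
that uses them, which is not true of the ASCII stand-in "8".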

>A quarter million Navajos and no iso3:1 code!

Sorry, Eric, I can't speak to this because I'm not sure what you mean by
"iso3:1 code". Navajo is representable in the UCS using combining
characters. If a handful of precomposed characters were added it could be
represented without them. Actually, I have on several occasions discussed
Navajo encoding with Navajo users.
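
To make the combining-character point concrete, here is a small sketch in
Python using the standard unicodedata module. The sample letter (a
high-tone nasal "a", written with an ogonek and an acute accent) is an
assumption chosen for illustration, and the names in the code are
hypothetical. The UCS has a precomposed a-with-ogonek but no precomposed
a-with-ogonek-and-acute, so the acute stays a combining character even in
composed form.

```python
import unicodedata

# Illustrative Navajo vowel: base "a" + COMBINING OGONEK (nasal)
# + COMBINING ACUTE ACCENT (high tone).
A_NASAL_HIGH = "a\u0328\u0301"

def code_points(text):
    """List each code point in text as (U+XXXX, character name)."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# Fully decomposed form: three code points.
nfd = unicodedata.normalize("NFD", A_NASAL_HIGH)

# Composed form: "a" + ogonek merge into U+0105, but the acute accent
# has nothing precomposed to merge into, so two code points remain.
nfc = unicodedata.normalize("NFC", A_NASAL_HIGH)

for label, form in (("NFD", nfd), ("NFC", nfc)):
    print(label, code_points(form))

# Typing the marks in the other order yields the same normalized text.
swapped = unicodedata.normalize("NFC", "a\u0301\u0328")
print(swapped == nfc)
```

The last check matters for interchange: two users who enter the marks in
different orders still produce text that compares equal once normalized.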

>There are lots of other "stateless" linguistic groups, the issue isn't
>an Indian monopoly by any means, and it is only partially mitigated by
>the well intentioned, often pro bono acts of a few individuals.

I can't deny that a lot of my own support for the UCS is pro bono.

>States are not too keen on minority languages ... which is why there
>are UN resolutions on the subject ... to improve a situation which
>requires systemic improvement.

True. However, in defence of the standardizers, here is a short list of
scripts used by "stateless" linguistic groups for which work is actively
proceeding to encode them in the UCS:

Dehong Dai
Kayah Li
Lanna Tai
New Tai Lue
Ol Cemet'
Pahawh Hmong
Sorang Sompeng
Varang Kshiti
Viêt Thái

>> and therefore do not
>> (in general) reflect the interests of a particular company or
>> the quirks of a particular architecture or operating system.
>Here also "objective neutrality" chafes unexpectedly. Work outside of
>the small paid (and highly networked) i18n community frequently is at
>the resource level where working code but being OS dependent is OK,
>it beats the alternative, no code other than ASCII.

Certainly it is true that immediate solutions to typing and printing are
more important than text interchange. What we're concerned about is making
sure the marginalized users will not be marginalized when the UCS is widely
adopted.

>I'm not about to
>tell the few Abenaki/Penobscot/Passamaquoddy/Malicite/Micmac l10n guys
>to bag it and go standard ... even though the best work is on windoze.
>Had I gone to last week's Indigenous Educators meeting in Hawaii I'd
>have more implementation data to share.

If they want to interchange data in an international context they should
make their requirements known to us. Perhaps they are reinventing the
wheel. Perhaps we have overlooked characters. But we want (I want) the
Abenaki, Penobscot, Passamaquoddy, Malicite, Micmac localization experts to
have their needs met. They have access to the system via you to me to ISO,
if you want to put it that way. Please put me in touch with them.

>There is more to this racket than playing at standards, we're discussing
>making language preservation potentially more difficult, not less so.

As someone working in the lesser-used language industry (such as it is) I
come to quite the opposite opinion.

>Given the history of lexicography in the Americas in the 19th century, I
>really doubt that our situation is substantially different than that of
>the Asian languages. We (the modern language users and teachers) elide
>the problem by adopting diacritically simplified systems ... I can write
>Siksika in ASCII, and Abenaki also (using "8" for the omicron upsilon
>ligature), with tolerable composition of vowel and diacritics. However,
>there is an archival requirement for the enormous, redundant, contradictory
>mess of 19th century characters.

This is certainly true, and the experts in WG2 and UTC invite you to assist
us by informing us of the requirements.

Best regards,

Michael Everson * Everson Gunn Teoranta *
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Guthán: +353 1 478 2597 ** Facsa: +353 1 478 2597 (by arrangement)
27 Páirc an Fhéithlinn;  Baile an Bhóthair;  Co. Átha Cliath; Éire
