Re: Last Call: UTF-16

From: Eric Brunner (brunner@maine.rr.com)
Date: Wed Aug 18 1999 - 01:28:16 EDT

Next message: Markus Kuhn: "Re: UTF-8 versus UTF-16 bandwidth"
Previous message: Asmus Freytag: "Re: Bidirectional algorithm question (2.0)"
Maybe in reply to: Michael Everson: "Re: Last Call: UTF-16"
Next in thread: Martin J. Duerst: "Re: Last Call: UTF-16"

>This still does not mean the Internet should endorse or promote the use
>of private character sets, or even allow it.

Like Ken, I've found something in this thread that bothers me: the IR and
the rhetoric surrounding it. Not UTF-16, not by a very long shot.

>Let us not be so quick to dismiss the International Register. It confers
>numerous benefits:
>
> 1. ...
>
> 2. Character sets registered in the IR come either from the ISO or
> from the standards body of a member nation,

Formal access to the ISO system is limited to polities recognized as
states and a limited range of other territorial categories. I'm not
aware of a single instance of an indigenous nation in the Americas
obtaining this status within the UN system, and I'm not offering a
casual observation here.

With the recent exception of the Unified Canadian Aboriginal Syllabary,
the standards bodies of the member states in the Americas have shown
no interest in the technical requirements for the indigenous languages
of the Americas. A quarter million Navajos and no iso3:1 code!

There are lots of other "stateless" linguistic groups; the issue isn't
an Indian monopoly by any means, and it is only partially mitigated by
the well-intentioned, often pro bono acts of a few individuals. States
are not too keen on minority languages ... which is why there are UN
resolutions on the subject ... to improve a situation which requires
systemic improvement.

> and therefore do not
> (in general) reflect the interests of a particular company or
> the quirks of a particular architecture or operating system.

Here also "objective neutrality" chafes unexpectedly. Work outside of
the small paid (and highly networked) i18n community is frequently at
a resource level where working but OS-dependent code is OK; it beats
the alternative, which is no code other than ASCII. I'm not about to
tell the few Abenaki/Penobscot/Passamaquoddy/Malicite/Micmac l10n guys
to bag it and go standard ... even though the best work is on windoze.
Had I gone to last week's Indigenous Educators meeting in Hawaii I'd
have more implementation data to share.

There is more to this racket than playing at standards; we're discussing
making language preservation potentially more difficult, not less so.

> 3. A unique registration number is assigned which allows the character
> set to be identified in a concise, unambiguous, and language-neutral
> manner.

This is a reasonably desirable property, but a single registry is not a
precondition for it.

> 4. ...
> 5. ...
> 6. ...
> 7. An official name is given to each character so we may identify it
> sufficiently to correlate it with instances in other character sets
> for mapping purposes.

Given the history of lexicography in the Americas in the 19th century, I
really doubt that our situation is substantially different from that of
the Asian languages. We (the modern language users and teachers) elide
the problem by adopting diacritically simplified systems ... I can write
Siksika in ASCII, and Abenaki also (using "8" for the omicron upsilon
ligature), with tolerable composition of vowel and diacritics. However,
there is an archival requirement for the enormous, redundant, contradictory
mess of 19th century characters.
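
To make the "8" convention concrete, here is a minimal sketch of a round
trip between the ASCII spelling and the ligature. It assumes the ligature is
represented by the Unicode code points U+0222/U+0223 (LATIN CAPITAL/SMALL
LETTER OU), that every "8" in the input stands for the ligature rather than a
numeral, and that the sample word is only an illustrative string. The
function names are hypothetical.

```python
import unicodedata

# Assumption: the ligature maps to these Unicode code points.
OU_SMALL = "\u0223"    # LATIN SMALL LETTER OU
OU_CAPITAL = "\u0222"  # LATIN CAPITAL LETTER OU


def ascii_to_ou(text: str) -> str:
    """Replace every ASCII "8" with the small OU ligature.

    Assumes the text contains no genuine numerals.
    """
    return text.replace("8", OU_SMALL)


def ou_to_ascii(text: str) -> str:
    """Fold both cases of the OU ligature back to ASCII "8"."""
    return text.replace(OU_SMALL, "8").replace(OU_CAPITAL, "8")


if __name__ == "__main__":
    sample = "W8banaki"  # illustrative ASCII spelling
    converted = ascii_to_ou(sample)
    print(converted, unicodedata.name(OU_SMALL))
    print(ou_to_ascii(converted) == sample)
```

The mapping is lossy in one direction: folding back to ASCII discards the
case of the ligature, which is part of why the simplified spelling does not
meet the archival requirement.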

> 8. ...

I for one now intend to see what Paul and Francois make of the mass of
mail. An AD can wake me up if necessary.

Adio (shorter in Abenaki, neh?)
Eric

