Re: Amerindian Characters

Date: Wed Jun 16 1999 - 14:09:10 EDT


I personally appreciate the concern shown for minority languages of N. America.
I'd be surprised if a thorough study of all Amerindian languages didn't produce
a list that was quite a bit longer.

In fact, all of the characters you mentioned are already covered by the
standard. They are not covered, however, using separate allocations for the
various base + diacritic combinations. Instead, characters like Latin letter I
with cedilla are encoded as a sequence of characters: LATIN SMALL LETTER I or
LATIN CAPITAL LETTER I, followed by U+0327 COMBINING CEDILLA. The standard allows
characters to be combined productively, so what is already there also covers
combinations involving multiple diacritics.
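
To make the idea of a combining sequence concrete, here is a minimal sketch in
Python, assuming its standard unicodedata module. The variable names and the
describe() helper are mine, for illustration only. It builds i with cedilla from
two code points and then stacks a second diacritic on the same base letter.

```python
import unicodedata

# Base letter followed by a combining mark: two code points, one displayed letter.
i_cedilla = "i" + "\u0327"  # LATIN SMALL LETTER I + COMBINING CEDILLA
names = [unicodedata.name(ch) for ch in i_cedilla]

# Marks combine productively: add an acute accent to the same base letter.
i_cedilla_acute = i_cedilla + "\u0301"  # + COMBINING ACUTE ACCENT


def describe(text):
    """Return (code point, character name, combining class) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
        for ch in text
    ]


for row in describe(i_cedilla_acute):
    print(row)
```

A combining class of 0 marks a base character; the non-zero values on the two
marks are what tell software that they attach to the preceding letter.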

On occasion, a contributor on this list might raise the question as to whether a
certain base + diacritic combination shouldn't have its own allocation. A
typical reason might be that the combination is understood as a distinct and
complete letter in the orthography of some important language, and that the
absence of the pre-composed combination creates additional work for those who
work with the given language. In fact, in some respects, adding new pre-composed
combinations creates additional work. The reason is that the single,
pre-composed character (e.g. LATIN SMALL LETTER I WITH CEDILLA) must be treated
as being in every way identical to the corresponding decomposed sequence (LATIN
SMALL LETTER I followed by COMBINING CEDILLA). Software must do this in order
to be Unicode conformant.
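
The extra work can be sketched as follows, again in Python with the standard
unicodedata module. Since I WITH CEDILLA has no pre-composed form in the
standard, the sketch substitutes c with cedilla (U+00E7), which does. The
canonically_equal() helper is a hypothetical name of my own, not part of any
library.

```python
import unicodedata

precomposed = "\u00e7"  # LATIN SMALL LETTER C WITH CEDILLA
decomposed = "c\u0327"  # LATIN SMALL LETTER C + COMBINING CEDILLA


def canonically_equal(a, b):
    """Compare two strings after normalizing both to the decomposed form (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# A plain code point comparison sees two different strings...
raw_equal = precomposed == decomposed

# ...so software has to normalize first to treat them as the same text.
equivalent = canonically_equal(precomposed, decomposed)

print(raw_equal, equivalent)
```

Every pre-composed character added to the standard is one more pair that
searching, sorting and comparison code has to reconcile in this way.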

There really is no reason why pre-composed combination characters are needed,
and pretty good reasons need to be provided before the Unicode and ISO
committees will seriously consider adding new pre-composed characters. People
will sometimes appeal to the fact that other pre-composed combination characters
have already been added to the standard. In most such cases, however, a very
strong reason was given: the pre-composed character already existed in an
established international encoding standard (e.g. ISO 8859-1). In order to provide
round-trip convertibility, what was before must live on. Had it not been for
pre-existing standards, all of these characters might not have needed to be
added.
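
As a rough illustration of round-trip convertibility, the following Python
sketch uses the built-in "latin-1" codec for ISO 8859-1. The fits_latin1()
helper is a hypothetical name of my own. It shows that the pre-composed
c with cedilla survives a trip to Unicode and back unchanged, while the
decomposed sequence has no ISO 8859-1 representation at all.

```python
legacy_bytes = b"\xe7"  # c with cedilla as a single byte in ISO 8859-1

# Legacy encoding -> Unicode -> legacy encoding.
text = legacy_bytes.decode("latin-1")
round_tripped = text.encode("latin-1")


def fits_latin1(s):
    """Return True if every character of s can be encoded in ISO 8859-1."""
    try:
        s.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


print(round_tripped == legacy_bytes)  # the pre-composed form round-trips
print(fits_latin1("c\u0327"))         # the decomposed sequence does not fit
```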

Now, if there are characters which simply can't be constructed out of any of the
pieces existing in the standard, *that* is very worthwhile pointing out. I am
aware of some such instances for languages spoken in Papua New Guinea, and plan
to request some additions to the standard for this purpose once I have a chance
to get enough info assembled.

I hope this helps.


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:46 EDT