Re: Hexadecimal in many scripts

From: Peter_Constable@sil.org
Date: Sat Jun 05 1999 - 09:40:00 EDT


In pointing out the issue of what the first six letters of the Cyrillic alphabet
are, Doug Ewell has brought to light what I think is a fundamental problem with
the non-Latin aspect of this proposal: the meaning of "first six letters of the
alphabet" is not determined per script, but is at least language dependent, and
often dependent upon more than that. For example, some Spanish speakers (I
think, in spite of recent discussions on this topic) would consider the first
six letters of their alphabet to be "a, b, c, ch, d, e".

We could respond to this example by saying that, for Latin, we will simply
assume people will use A-F (which seems reasonably obvious). The point is,
though, that just as what "the first six letters of the alphabet" means for
Latin changes from language to language, the same is true for other scripts.
When we say, "the first six letters of the Cyrillic alphabet", do we mean "the
Russian alphabet"? Would speakers of other languages that use Cyrillic get
confused if the collation sequence of their writing system is different? What if
the keyboard layout for their language simply did not include one of the 6
designated letters? Has anyone determined that there are any 6 letters that are
common to every language that is written with Cyrillic script? These same
concerns also apply to Ethiopic, and any other script that is used for multiple
languages.
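
To make the Cyrillic question concrete, here is a minimal sketch in Python. It is
only an illustration, not part of the proposal under discussion: the choice of
Russian, Ukrainian and Serbian, and the names ALPHABET_STARTS, first_six and
common_first_six, are assumptions introduced for the example. Each list holds just
the opening letters of the standard modern alphabet order for that language.

```python
# Illustrative only: the three languages and all names here are assumptions
# chosen for the example, not something specified by the proposal.
# Each list holds the opening letters of the language's standard alphabet order.
ALPHABET_STARTS = {
    "Russian":   ["а", "б", "в", "г", "д", "е", "ё", "ж"],
    "Ukrainian": ["а", "б", "в", "г", "ґ", "д", "е", "є"],
    "Serbian":   ["а", "б", "в", "г", "д", "ђ", "е", "ж"],
}


def first_six(language):
    """Return the first six letters of the given language's alphabet."""
    return ALPHABET_STARTS[language][:6]


def common_first_six(languages):
    """Return the letters that appear in the first six of every language."""
    letter_sets = [set(first_six(lang)) for lang in languages]
    return set.intersection(*letter_sets)


if __name__ == "__main__":
    for lang in ALPHABET_STARTS:
        print(lang, " ".join(first_six(lang)))
    shared = common_first_six(ALPHABET_STARTS)
    print("shared by all:", " ".join(sorted(shared)))
```

Even among these three languages, the three "first six" sets differ, and only five
letters appear in all of them, so "the first six letters" does not pick out a
single set of six hexadecimal digits for the script as a whole.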

It's also unclear to me how this idea is supposed to be applied in Far Eastern
contexts. What if your writing system is Simplified Chinese?

I certainly favour the idea that there should be a consistent way that any user
can enter any Unicode character, but I'm concerned that not enough thought has
gone into the idea of (in effect) localising the hexadecimal digits. It may be
that all of these concerns can be surmounted: for example, there may be 6 letters
common to all writing systems based upon Ethiopic, and users could simply learn to
use these regardless of the collation sequences of their particular writing systems. (The
fact that some languages of that region have yet to have their orthographies
established adds an element of unpredictability here.) I'm just concerned that
these issues be considered. It's not clear to me that they have.

Peter


