RE: Hexadecimal in many scripts

From: Addison Phillips (AddisonP@simultrans.com)
Date: Sat Jun 05 1999 - 11:38:26 EDT


An additional illustration.

A friend of mine kept having problems with her QA department's testing. Seems that bugs kept cropping up with regard to hotkeys that could not be accessed and other keyboard-related trivia for a product not localized into a variety of languages. So she ordered one of every keyboard for her platform for the QA department.

58 keyboards arrived and had to be labeled by language(s).

If people need 58 keyboards in the world, then there are at least 58 different combinations of characters. Not to mention the collation orders of those symbols (which are locale-related, as Peter points out).

Addison

-----Original Message-----
From: Peter_Constable@sil.org [mailto:Peter_Constable@sil.org]
Sent: Saturday, June 05, 1999 8:01 AM
To: Unicode List
Subject: Re: Hexadecimal in many scripts




In pointing out the issue of what the first six letters of the Cyrillic alphabet
are, Doug Ewell has brought to light what I think is a fundamental problem with
the non-Latin aspect of this proposal: the meaning of "first six letters of the
alphabet" is not determined per script, but is at least language dependent, and
often dependent upon more than that. For example, some Spanish speakers (I
think, in spite of recent discussions on this topic) would consider the first
six letters of their alphabet to be "a, b, c, ch, d, e".
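
A minimal sketch of the Spanish case, assuming the traditional alphabet order
in which "ch" and "ll" count as single letters. The names TRADITIONAL,
tokenize and traditional_key are illustrative only, and the sketch handles
only lowercase, unaccented words.

```python
# Traditional Spanish alphabet order (assumption: "ch" and "ll" are letters).
TRADITIONAL = [
    "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
    "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
]
RANK = {letter: position for position, letter in enumerate(TRADITIONAL)}


def tokenize(word):
    """Split a word into letters, treating "ch" and "ll" as one letter each."""
    letters = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in ("ch", "ll"):
            letters.append(pair)
            i += 2
        else:
            letters.append(word[i])
            i += 1
    return letters


def traditional_key(word):
    """Sort key: the alphabet position of each letter in the word."""
    return [RANK[letter] for letter in tokenize(word)]


words = ["cuna", "chico", "dama", "cosa"]
print(TRADITIONAL[:6])                      # the "first six letters"
print(sorted(words))                        # plain code point order
print(sorted(words, key=traditional_key))   # traditional order
```

Under this ordering "chico" sorts after "cuna", because "ch" follows every
word that begins with a plain "c".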

We could respond to this example by saying that, for Latin, we will simply
assume people will use A-F (which seems reasonably obvious). The point is,
though, that just as what "the first six letters of the alphabet" means for
Latin changes from language to language, the same is true for other scripts.
When we say, "the first six letters of the Cyrillic alphabet", do we mean "the
Russian alphabet"? Would speakers of other languages that use Cyrillic get
confused if the collation sequence of their writing system is different? What if
the keyboard layout for their language simply did not include one of the 6
designated letters? Has anyone determined that there are any 6 letters that are
common to every language that is written with Cyrillic script? These same
concerns also apply to Ethiopic, and any other script that is used for multiple
languages.
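
A rough sketch of the Cyrillic concern, assuming the commonly listed modern
alphabet orders for four languages. The names FIRST_SIX and common_letters
are illustrative only. It checks only the first six letters by position, so
it does not settle whether some other six letters are shared by every
Cyrillic-based writing system.

```python
# First six letters of each alphabet (assumption: standard modern orders).
FIRST_SIX = {
    "Russian": "абвгде",
    "Ukrainian": "абвгґд",
    "Serbian": "абвгдђ",
    "Macedonian": "абвгдѓ",
}


def common_letters(first_six):
    """Return the letters that appear in every language's first six."""
    letter_sets = [set(letters) for letters in first_six.values()]
    return set.intersection(*letter_sets)


shared = common_letters(FIRST_SIX)
print(sorted(shared))
print(len(shared))
```

For these four languages only five letters are shared, so "the first six
letters" does not name the same set even within this small sample.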

It's also unclear to me how this idea is supposed to be applied in Far Eastern
contexts. What if your writing system is Simplified Chinese?

I certainly favour the idea that there should be a consistent way that any user
can enter any Unicode character, but I'm concerned that not enough thought has
gone into the idea of (in effect) localising the hexadecimal digits. It may be
that all of these concerns can be surmounted; e.g. if there are 6 letters common
to all writing systems based upon Ethiopic and users simply learn to use these
regardless of the collation sequences of their particular writing systems. (The
fact that some languages of that region have yet to have their orthographies
established adds an element of unpredictability here.) I'm just concerned that
these issues be considered. It's not clear to me that they have.

Peter



