Re: Mixed up priorities

From: peter_constable@sil.org
Date: Fri Oct 22 1999 - 11:08:14 EDT


>>Are there separate CH, Ch and ch keys on Slovak keyboards?

>Of course not. Keyboards were designed in America. Besides,
       keyboards are glyph-oriented, not character oriented. I am not
       aware of any operating system that can display two glyphs for a
       single character (not yet, anyway). Are we here to accept the
       status quo, or to internationalize computing?

>>Many languages, including English, make use of digraphs and
       trigraphs to represent sounds which are represented in other
       orthographies by single characters.

>Oh, yeah, it's all about English. The rest of us are idiots.
       English does not consider digraphs separate characters, and
       English is right. The rest of us should just assimilate.
       Resistance is futile.

>Well, fine. Then let's declare Unicode the English way of
       transcribing languages, and not call it an international
       standard of character encoding.

       Hey, Adam, you're not giving the rest of us much credit for
       being concerned about I18N and the needs of non-English users.
       We really are concerned. It's just that what you're asking for
       *won't make any difference* to Slovak users other than in their
       (your) perception.

       If you don't like English examples to prove a point, use
       Spanish. "Ch" is considered a separate character in Spanish,
       and Spanish users can do *all* they want using the presently
       available encoding.
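
       A minimal sketch of what that looks like in practice, written
       in Python purely for illustration. The alphabet table, the
       sort_key helper and the word list are my own assumptions, not
       part of any standard, and accented vowels are ignored. "ch" is
       stored as plain C + H and still collates as one letter after
       "c":

```python
import re

# Assumed traditional Spanish alphabet order, with "ch" and "ll"
# treated as single letters sorting after "c" and "l" respectively.
ALPHABET = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k",
            "l", "ll", "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u",
            "v", "w", "x", "y", "z"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}

# Try the digraphs first so "ch" is not split into "c" + "h".
TOKEN = re.compile(r"ch|ll|.", re.DOTALL)


def sort_key(word):
    """Map a word to a list of letter ranks, counting digraphs as one letter."""
    # Anything outside the table (accents, punctuation) sorts last.
    return [RANK.get(tok, len(ALPHABET)) for tok in TOKEN.findall(word.lower())]


words = ["cuna", "chico", "dama", "coche", "cama"]
print(sorted(words))                # plain code point order
print(sorted(words, key=sort_key))  # "ch" treated as its own letter
```

       The text itself never changes; only the comparison rule does.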

       By the way, keyboards are *not* glyph oriented. Ask any speaker
       of Chinese.

>>In some non-Slavic language adaptations of the Cyrillic
       script, up to four letters may be combined to represent a
       single sound, and these 'quadragraphs' are often listed as
       single letters of the alphabet and have specific sorting and
       hyphenation rules. Are you suggesting that each of these
       sequences _needs_ to be encoded as a precomposed character?

>I am not talking about transliteration. I am talking about
       native use. If some language natively considers a quadragraph a
       character in its own right, then yes, we need to encode it. Or
       we need to stop referring to Unicode as CHARACTER ENCODING.
       Either solution is acceptable.

       Nobody's talking about transliteration here. In Lanna script, I
       know of a sequence of 5 symbols (discontiguous, by the way)
       that make up a single entity. When we get to discussing
       proposals for Lanna, I will *not* be recommending that this be
       encoded as a separate entity because it simply isn't necessary,
       no matter how native users perceive it.

>>>The fact that it can be constructed from two glyphs, C and
       H, is irrelevant, many other characters can be so constructed
       (e.g. N with caron can be constructed from an N and a caron, yet
       it is a separate character).
>
>>There are plenty of people on this list who would argue that
       it should not be.

>But the fact is, it is. And as long as Unicode is to be
       thought of as character encoding, it should be.

       Wrong definition of character. (See Socratic dialogue.)
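
       For the N-with-caron case specifically, here is a small sketch
       using Python's standard unicodedata module (the language and
       the variable names are my choice for illustration only). The
       precomposed code point and the N + combining caron sequence
       are different strings that normalize to one another:

```python
import unicodedata

precomposed = "\u0147"   # LATIN CAPITAL LETTER N WITH CARON
decomposed = "N\u030C"   # LATIN CAPITAL LETTER N + COMBINING CARON

# The two spellings are different code point sequences...
print(precomposed == decomposed)

# ...but canonical normalization converts each into the other.
print(unicodedata.normalize("NFD", precomposed) == decomposed)
print(unicodedata.normalize("NFC", decomposed) == precomposed)

# "CH" has no precomposed form, so normalization leaves it alone.
print(unicodedata.normalize("NFC", "CH") == "CH")
```

       Software that compares normalized text therefore does not care
       which of the two spellings was stored.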

>>What have you actually gained?

>Consistency. There is a DZ, for example.

       Sorry, but consistency simply is not an acceptable
       justification in a standard that has been forced to make
       compromises for legacy standards while still wanting to
       maintain some ideals wherever possible.
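
       As a rough illustration of how the existing DZ characters sit
       in the standard, the sketch below (Python's unicodedata module;
       compat_expand is a hypothetical helper name of mine) shows that
       each one carries a compatibility mapping to an ordinary
       two-letter sequence:

```python
import unicodedata


def compat_expand(text):
    """Apply compatibility decomposition (NFKD) to a string."""
    return unicodedata.normalize("NFKD", text)


# The three case forms of the DZ digraph character.
for cp in (0x01F1, 0x01F2, 0x01F3):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)} -> {compat_expand(ch)!r}")
```

       Canonical normalization (NFD) leaves these characters as they
       are; only the compatibility forms expand them.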

>>Remember that Unicode is a standard for encoding _plain
       text_.

>No, it is a standard for encoding _characters_. It states so
       quite explicitly.

       Again, you're working with the wrong definition of character.

>Yes, it is possible to encode the CH as the C followed by the
       H, and the N caron by the N followed by some connection code
       followed by a caron. And it is perfectly possible for software
       to handle it. But that would not be CHARACTER encoding. Unicode
       clearly states its goal to be the encoding of characters of all
       languages, existing and defunct. CH is a character in
       Slovak.

       Yes, it is character encoding, just not the definition of
       character you're assuming.

       Peter


