Re: Mixed up priorities

From: John Hudson
Date: Thu Oct 21 1999 - 20:14:17 EDT

At 04:49 PM 21-10-99 -0700, G. Adam Stanislav wrote:

>Respectfully, I disagree. I cannot speak for Welsh and Spanish, but in
>Slovak and Czech, CH has all characteristics of a character: It denotes a
>specific sound which cannot be expressed in any other way. Nor can it be
>separated into two sounds.

Are there separate CH, Ch and ch keys on Slovak keyboards?

>Many other alphabets have a separate character for this sound, e.g. the chi
>in Greek, or the Cyrillic character that looks like the Roman X.

Many languages, including English, make use of digraphs and trigraphs to
represent sounds which are represented in other orthographies by single
characters. In some languages these digraphs are considered to be
individual letters, with specific sorting and hyphenation rules associated
with them, but is it true that these sorting and hyphenation rules
_require_ encoding of these digraphs as precomposed characters?

In some non-Slavic language adaptations of the Cyrillic script, up to four
letters may be combined to represent a single sound, and these
'quadragraphs' are often listed as single letters of the alphabet and have
specific sorting and hyphenation rules. Are you suggesting that each of
these sequences _needs_ to be encoded as a precomposed character?

>The fact that it can be constructed from two glyphs, C and H, is
>irrelevant, many other characters can be so constructed (e.g. N with caron
>can be constructed from an N and a caron, yet it is a separate character).

There are plenty of people on this list who would argue that it should not be.

>It is not simply a string of characters because it cannot be separated. You
>cannot, for example, divide a word at the end of a line by following the C
>with a - and starting the next line with an H. It is *not* C-H, C-h, and
>c-h. It is CH, Ch, and ch.

Again, is it _necessary_ for this behaviour to be controlled by encoding
these letters as individual, precomposed characters? If there are no CH, Ch
and ch keys on Slovak keyboards -- as I suspect -- you would still require
secondary text processing which would recognise the keying of c followed by
h as ch. What have you actually gained?
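
To illustrate how the no-break behaviour could be enforced over plain c + h sequences, here is a minimal Python sketch. The function name safe_break_points is hypothetical, and the code does not implement real Slovak or Czech hyphenation rules; it only assumes that every c followed by h is the digraph and filters out the one break position that would split it.

```python
def safe_break_points(word):
    """Return the indices at which `word` may be split without
    separating the digraph 'ch' (in any letter case).

    Assumption: every 'c' immediately followed by 'h' is the digraph.
    Real hyphenation rules are far more restrictive; this shows only
    the single constraint discussed above.
    """
    lower = word.lower()
    return [
        i
        for i in range(1, len(word))
        # A break at index i falls between lower[i-1] and lower[i].
        if lower[i - 1 : i + 1] != "ch"
    ]


print(safe_break_points("mucha"))  # the break between 'c' and 'h' is absent
```

The text itself stays encoded as ordinary c and h; the constraint lives entirely in the processing step.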

Remember that Unicode is a standard for encoding _plain text_. Unicode does
not contain sorting rules for individual languages, nor does it contain
hyphenation rules for individual languages. Unicode provides a standard for
encoding text which can then be properly handled by secondary text
processing software, including dictionaries, language specific hyphenation
algorithms, etc. The kind of thing you are demanding belongs at this
secondary level, not at the plain text level.
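
As a rough illustration of sorting handled at that secondary level, the following Python sketch orders words so that ch counts as one letter placed after h, while the text remains plain c + h. The names ALPHABET, letters and sort_key are hypothetical, and the alphabet is deliberately simplified (accented letters are left out), so this is an assumption-laden sketch rather than a real Slovak or Czech collation.

```python
import re

# Simplified alphabet order with 'ch' as a single letter after 'h'.
# Assumption: accented letters are omitted to keep the sketch short.
ALPHABET = [
    "a", "b", "c", "d", "e", "f", "g", "h", "ch", "i", "j", "k", "l", "m",
    "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
]
RANK = {letter: position for position, letter in enumerate(ALPHABET)}

# Try the digraph first, then fall back to any single character.
TOKEN = re.compile(r"ch|.", re.DOTALL)


def letters(word):
    """Split a word into alphabet letters, keeping 'ch' together."""
    return TOKEN.findall(word.lower())


def sort_key(word):
    """Map each letter to its rank; unknown characters sort last."""
    return [RANK.get(letter, len(ALPHABET)) for letter in letters(word)]


words = ["chata", "cena", "hrad", "ihla", "dom"]
print(sorted(words, key=sort_key))
# → ['cena', 'dom', 'hrad', 'chata', 'ihla']
```

A plain code-point sort would put "chata" between "cena" and "dom"; the language-specific order comes from the key function, not from the encoding.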

John Hudson

Tiro Typeworks
Vancouver, BC
