Re: Mixed up priorities

From: G. Adam Stanislav (adam@whizkidtech.net)
Date: Thu Oct 21 1999 - 23:12:38 EDT


At 17:19 21-10-1999 -0700, Ashley Yakeley wrote:
>>Respectfully, I disagree. I cannot speak for Welsh and Spanish, but in
>>Slovak and Czech, CH has all characteristics of a character: It denotes a
>>specific sound
>
>...just as it does in English, though doubtless a different sound....

So? Unicode is not about English alone. English speakers do not think of CH
as a separate character. Slovak speakers do. That we happen to write it as
a sequence of two glyphs does not change the fact that to us it is a
character. Just ask any Slovak who cannot write whether CH is a C followed
by an H (well, you probably cannot find a Slovak who cannot write, but,
hopefully, you get the point).

>The English 'ch' can be separated into 't-sh', though 'sh' and 'th'
>cannot be.

Again, so? Slovak 'ch' cannot be separated. It is a character in Slovak
linguistics. I cannot speak for English linguistics. If English linguists
thought "th" and "sh" were characters, I would not argue with them just
because in Slovak they are separate.

>>Many other alphabets have a separate character for this sound, e.g. the chi
>>in Greek, or the Cyrillic character that looks like the Roman X.
>
>And many do not, e.g. the 'ch' in the Scottish 'loch' or German 'Bach'.

So? Some languages do not have the 'q'. That does not mean it should not be
encoded, does it?

It is completely irrelevant how other languages treat the "ch". It is a
character in at least two languages. If that's not good enough, why not
remove the thorn from Unicode? Or the slashed L (only used by one
language)? Or the scharfes S? Or the Hungarian umlaut? (No, I'm not
suggesting any of that, and yes, I know they all are written as a single
glyph, but Unicode encodes characters, not glyphs.)

>>It is not simply a string of characters because it cannot be separated. You
>>cannot, for example, divide a word at the end of a line by following the C
>>with a - and starting the next line with an H.
>
>Neither can you in English.

Irrelevant. The English do not think of it as a character. We do.

>>Also, ask any Slovak to tell you what the alphabet is, he will inevitably
>>list a H CH I within the sequence.
>
>That's a sorting thing, isn't it?

No, it isn't. If we said the alphabet was "A,E,I,O,U,B,C,D,etc," that would
be sorting. If we think of the CH as a separate character, then it is a
character, at least to us, no matter what glyphs are used to produce that
character. I would not expect to see "CH" on a typewriter, for example,
because the "C" and "H" glyphs can be used to type it. Nor would I expect
to see it in fonts, because fonts are about glyphs. But Unicode is a
character standard, not a glyph standard. And it is an international
standard. That other languages use the glyph sequence of C followed by H
has nothing to do with what Slovak does. Nor does the fact that I'm the
only Slovak here.
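
To make the "a H CH I" point concrete, here is a minimal Python sketch of
treating CH as one unit when splitting and ordering words. It is only an
illustration under stated assumptions: the alphabet is simplified (letters
with diacritics are left out), and the names SLOVAK_ORDER, RANK, letters
and sort_key are hypothetical, not part of any standard or library.

```python
# Simplified Slovak letter order; letters with diacritics are omitted
# for brevity (an assumption made for this sketch only).
SLOVAK_ORDER = ["a", "b", "c", "d", "e", "f", "g", "h", "ch", "i", "j", "k",
                "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w",
                "x", "y", "z"]
RANK = {letter: i for i, letter in enumerate(SLOVAK_ORDER)}


def letters(word):
    """Split a word into letters, treating 'ch' as a single unit."""
    word = word.lower()
    result = []
    i = 0
    while i < len(word):
        if word.startswith("ch", i):
            result.append("ch")  # one letter, two glyphs
            i += 2
        else:
            result.append(word[i])
            i += 1
    return result


def sort_key(word):
    """Rank each letter by its position in SLOVAK_ORDER (unknown ones last)."""
    return [RANK.get(letter, len(SLOVAK_ORDER)) for letter in letters(word)]


words = ["chata", "cena", "hora", "ihla"]
print(letters("chata"))
print(sorted(words, key=sort_key))
```

With this key, "chata" sorts after "hora" and before "ihla", whereas a plain
code-point sort would put it right after "cena". The greedy match is a
simplification: it cannot tell a true CH from a C and an H that merely meet
at a morpheme boundary.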

By the way, we do have other combinations that we never separate, such as
"ia", "ie", "iu". They are diphthongs. But we do not think of them as
separate characters. I would not dream of proposing that they get a
separate encoding (of course, I would not object if they were considered
characters in some other language).

The funny thing is that had whoever ported the Roman alphabet to the Slovak
language decided that particular sound should be written as slashed H, or
whatever, no one would hesitate to encode it in Unicode.

Adam


