Re: Mixed up priorities

From: Michael Everson (everson@indigo.ie)
Date: Sat Oct 23 1999 - 07:16:02 EDT


Adam said:

> Yes, we can type "ch" using the GLYPHS "c" and "h", but
> Unicode prides itself in being a character encoding, not a
> glyph encoding. To us, "ch" is a character. Period. In our
> dictionaries the "ch" follows the "h" and precedes the "i". We
> would never dream of looking for "ch" after "cg" and before
> "ci".

"Character" is a technical term that refers specifically to these coded entities.
Dictionaries are not arranged in chapters according to _characters_;
dictionaries are arranged in chapters according to _letters_. So, Welsh:

A
B
C
CH
D
DD
E
F
FF ...

and Slovak

A
B
C
D
E
F
G
H
CH
I
J ...

These are letters, not characters.
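To illustrate the distinction, here is a minimal Python sketch that sorts words by letters rather than by characters. It is a simplified illustration, not ISO/IEC 14651 itself. Each language supplies its own letter list, multi-character letters such as CH are matched by greedy longest match, and words are compared by letter rank. The letter lists are only the partial Welsh and Slovak lists above. The function names `letters` and `sort_key` are hypothetical. Greedy matching is also a simplification: a real tailoring must handle words where a digraph's characters belong to separate letters.

```python
# Partial letter inventories, copied from the Welsh and Slovak lists above
# (illustrative only; these are not complete alphabets).
WELSH = ["a", "b", "c", "ch", "d", "dd", "e", "f", "ff"]
SLOVAK = ["a", "b", "c", "d", "e", "f", "g", "h", "ch", "i", "j"]

def letters(word, alphabet):
    """Split a word into letters, preferring the longest letter that matches."""
    longest = max(len(letter) for letter in alphabet)
    word = word.lower()  # case variants (CH, Ch, ch) become the same letter
    out, i = [], 0
    while i < len(word):
        for n in range(min(longest, len(word) - i), 0, -1):
            if word[i:i + n] in alphabet:
                out.append(word[i:i + n])
                i += n
                break
        else:
            # No letter of this alphabet starts at position i.
            raise ValueError(f"{word[i]!r} is not a letter of this alphabet")
    return out

def sort_key(word, alphabet):
    """Compare words letter by letter, using each letter's position in the alphabet."""
    rank = {letter: pos for pos, letter in enumerate(alphabet)}
    return [rank[letter] for letter in letters(word, alphabet)]

print(sorted(["cib", "chad", "hij", "cd"], key=lambda w: sort_key(w, SLOVAK)))
print(sorted(["dda", "de", "chaf", "dab", "cad"], key=lambda w: sort_key(w, WELSH)))
```

The same character sequence "ch" lands in a different place depending on which letter list is used. With the Slovak list, "chad" sorts after "hij". A plain code-point sort puts it between "cd" and "cib". With the Welsh list, "dda" sorts after "de".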

Consider Breton, which has not only CH but also C'H as distinct letters of its
alphabet. Currently, there are several ways these letters, with their case
variants, can be represented in Unicode:

0043+0048, 0043+0068, 0063+0068 CH Ch ch
0043+0027+0048, 0043+0027+0068, 0063+0027+0068 C'H C'h c'h
0043+02BC+0048, 0043+02BC+0068, 0063+02BC+0068 C*H C*h c*h
    where * indicates U+02BC MODIFIER LETTER APOSTROPHE
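A short Python sketch can check that each of the nine spellings in the table is a sequence of code points already in the UCS. It prints each spelling in the NNNN+NNNN notation used above, along with its character names. The helper names `breton_forms` and `code_points` are illustrative only.

```python
import unicodedata

APOSTROPHE = "\u0027"           # U+0027 APOSTROPHE
MODIFIER_APOSTROPHE = "\u02BC"  # U+02BC MODIFIER LETTER APOSTROPHE

def breton_forms():
    """Return the nine CH / C'H spellings (upper, title, lower case) as plain strings."""
    forms = {}
    for label, middle in [("ch", ""), ("c'h", APOSTROPHE), ("c*h", MODIFIER_APOSTROPHE)]:
        forms[label] = ["C" + middle + "H", "C" + middle + "h", "c" + middle + "h"]
    return forms

def code_points(s):
    """Format a string as hyphen-free NNNN+NNNN code-point notation."""
    return "+".join(f"{ord(ch):04X}" for ch in s)

# Every spelling uses only existing, named characters; no new code points are needed.
for label, spellings in breton_forms().items():
    for s in spellings:
        names = ", ".join(unicodedata.name(ch) for ch in s)
        print(f"{code_points(s):<16} {s:<4} {names}")
```

These are ordinary character sequences, so standard case operations already apply to them. For example, `str.casefold()` maps all three case variants of C'H to the same lowercase spelling "c'h".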

This is plenty for the Bretons to manage in their ordering. Adding _nine_
more "characters" which can already be represented in the UCS, which can
be sorted correctly using ISO/IEC 14651, and which can even be input with
a single keystroke if the keyboard driver is written to do so, would be of
advantage to no one.
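As a rough stand-in for a tailored ISO/IEC 14651 ordering (this is not that standard's algorithm), a Python sketch can treat CH and C'H as single letters. It also folds the two apostrophe spellings together, so every form in the table above sorts as the same letter. Two assumptions apply. The letter order in `BRETON` is assumed to follow the usual Breton alphabet. Characters outside the list simply sort after Z. The names `BRETON` and `breton_key` are hypothetical.

```python
import re

# Assumed Breton letter order (illustrative; CH and C'H are single letters).
BRETON = ["a", "b", "ch", "c'h", "d", "e", "f", "g", "h", "i", "j", "k",
          "l", "m", "n", "o", "p", "r", "s", "t", "u", "v", "w", "y", "z"]
RANK = {letter: i for i, letter in enumerate(BRETON)}

# Alternatives are tried left to right: C'H (either apostrophe) before CH,
# and CH before any single character.
TOKEN = re.compile("c['\u02bc]h|ch|.")

def breton_key(word):
    """Sort key that ranks words letter by letter in the assumed Breton order."""
    tokens = TOKEN.findall(word.casefold())
    # Fold U+02BC to U+0027 so both spellings of C'H rank as the same letter;
    # anything not in BRETON sorts after the last letter.
    return [RANK.get(t.replace("\u02bc", "'"), len(BRETON)) for t in tokens]

words = ["hent", "c'hoari", "dour", "chom", "bara"]
print(sorted(words, key=breton_key))
```

A plain code-point sort puts "c'hoari" before "chom", because U+0027 is lower than U+0068. With `breton_key`, CH comes before C'H. The all-caps spelling written with U+02BC gets exactly the same key as "c'hoari".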



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:54 EDT