Re: Default Collation

From: Jim Allan
Date: Thu Jan 09 2003 - 13:38:54 EST


    Åke Persson posted:

    > The Default Unicode Collation Element Table
    > specifies that [Modifier Letter Turned Comma and Modifier Letter
    > Reversed Comma]
    > collate between H and I. Can anyone explain why?

    Unicode often collates a derived letter immediately after the letter
    from which it is derived.


    Instead of a letter for the "H" sound, the Greeks used a rough breathing
    sign, ʻ. The rough breathing is written over the vowel at the
    beginning of a word. When the vowel is a capital, the rough breathing
    precedes it, ʻΑ. If a word begins with a vowel that has no "H" sound,
    the vowel carries the smooth breathing sign, ʼ.

    The letter Η, eta, originally stood for the "H" sound. However, when the
    Ionic alphabet was adopted, Η came to stand for the vowel eta. To create
    a symbol for the "H" sound, the Greeks broke the Η in two, using one
    half for the rough breathing and the other for the smooth breathing.

    Accordingly, Latin Letter H, Modifier Letter Turned Comma and Modifier
    Letter Reversed Comma all originate from the same Greek/Phoenician
    letter if you go back far enough. Strictly speaking, Modifier Letter
    Turned Comma and Modifier Letter Reversed Comma each originate from
    half of that letter.

    I presume that is the reason for the collation in Unicode, though one
    might then expect to also find some other apostrophe and half circle and
    glottal stop phonetic symbols at this position in the collation series
    instead of following the "normal" Latin letters.
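
    To make the ordering question concrete, here is a minimal Python sketch.
    It assumes the two characters under discussion are U+02BB and U+02BD,
    which are the code points that carry the names Modifier Letter Turned
    Comma and Modifier Letter Reversed Comma. The TOY_PRIMARY table is
    hypothetical: it only mimics the "between H and I" placement described
    above and does not contain real weights from the Default Unicode
    Collation Element Table.

```python
import unicodedata

# The two modifier letters named in the post, plus their Latin neighbours.
chars = ["I", "\u02bd", "H", "\u02bb"]

# Print each code point with its official Unicode character name.
for ch in sorted(chars):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# A plain code point sort puts both modifier letters after H and I.
by_code_point = sorted(chars)

# Hypothetical primary ranks (NOT real DUCET weights): they only reproduce
# the placement of the modifier letters between H and I.
TOY_PRIMARY = {"H": 10, "\u02bb": 11, "\u02bd": 12, "I": 20}
by_toy_collation = sorted(chars, key=lambda ch: TOY_PRIMARY[ch])

print(by_code_point)
print(by_toy_collation)
```

    The difference between the two printed lists is the point of the
    question: code point order would place these characters after all the
    basic Latin letters, while the collation table groups them with H.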

    A Unicode database giving the reason for the collation of each character
    might be useful. Otherwise one must guess or attempt to reconstruct the
    original reasoning to see if any suggested change is actually an

    Jim Allan
