Re: Default Collation

From: Jim Allan (jallan@smrtytrek.com)
Date: Thu Jan 09 2003 - 13:38:54 EST

Next message: sourav mazumder: "Collation issue with Japanese characters (encoding utf-8) in Oracle8i"

Previous message: Yannis Haralambous: "Call for papers EuroTeX'2003"
Maybe in reply to: Ake Persson: "Default Collation"
Next in thread: sourav mazumder: "Collation issue with Japanese characters (encoding utf-8) in Oracle8i"
Reply: sourav mazumder: "Collation issue with Japanese characters (encoding utf-8) in Oracle8i"

Åke Persson posted:

> The Default Unicode Collation Element Table
> http://www.unicode.org/reports/tr10/allkeys.txt
> specifies that
> U+02BB MODIFIER LETTER TURNED COMMA
> U+02BD MODIFIER LETTER REVERSED COMMA
> collate between H and I. Can anyone explain why?

Unicode often collates a derived letter immediately after the letter
from which it is derived.
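
To check where a character actually falls, one can read the primary
weights out of a local copy of allkeys.txt. What follows is only a
minimal sketch: the function names are my own, the four SAMPLE lines use
invented placeholder weights rather than values copied from the real
table, and only the first collation element of single-code-point entries
is examined (secondary and tertiary weights are ignored).

```python
import re
import unicodedata

# One single-code-point entry, e.g. "0068 ; [.XXXX.0020.0002...] # NAME".
# Captures the code point and the primary weight of the first element.
ENTRY = re.compile(r"^([0-9A-Fa-f]{4,6})\s*;\s*\[[.*]([0-9A-Fa-f]+)\.")

def primary_weights(lines):
    """Map each single-code-point entry to its first primary weight."""
    weights = {}
    for line in lines:
        m = ENTRY.match(line)
        if m:  # comments, directives and multi-code-point entries are skipped
            weights[chr(int(m.group(1), 16))] = int(m.group(2), 16)
    return weights

def neighbours(weights, lo, hi):
    """Characters whose primary weight lies between those of lo and hi."""
    a, b = weights[lo], weights[hi]
    return sorted((c for c, w in weights.items() if a <= w <= b),
                  key=weights.get)

# Placeholder data: the weights 1000..1003 are made up for illustration.
SAMPLE = """\
0068 ; [.1000.0020.0002] # LATIN SMALL LETTER H
02BB ; [.1001.0020.0002] # MODIFIER LETTER TURNED COMMA
02BD ; [.1002.0020.0002] # MODIFIER LETTER REVERSED COMMA
0069 ; [.1003.0020.0002] # LATIN SMALL LETTER I
""".splitlines()

if __name__ == "__main__":
    for ch in neighbours(primary_weights(SAMPLE), "h", "i"):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

With the real file one would pass open("allkeys.txt", encoding="utf-8")
instead of SAMPLE. For contrast, plain code point order puts both
modifier letters after every basic Latin letter, which is why their
position in the collation table stands out.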

From http://dumbellgreek.gospelcom.net/intro.html:

Instead of a letter for the "H" sound, the Greeks used a rough breathing
sign, ʻ. This rough breathing "H" sound appears over the vowel at the
beginning of a word. When the vowel is a capital, the rough breathing
precedes, ʻΑ. If a vowel begins a word and it does not have an "H" sound
then it has the smooth breathing sound ʼ.

The letter Η, eta, was originally the "H" sound. However, when the Attic
alphabet was adopted, Η became eta. In order to create a symbol for the
"H" sound, the Greeks broke the Η in two using each part for rough and
smooth breathing.

Accordingly, Latin Letter H, Modifier Letter Turned Comma and Modifier
Letter Reversed Comma all originate from the same Greek/Phoenician
letter if you go back far enough. More precisely, Modifier Letter Turned
Comma and Modifier Letter Reversed Comma originate from half of that
letter.

I presume that is the reason for the collation in Unicode, though one
might then expect to find other apostrophe, half-circle and glottal-stop
phonetic symbols at this position in the collation sequence as well,
rather than after the "normal" Latin letters.

A Unicode database giving the reason for the collation of each character
might be useful. Otherwise one must guess at, or attempt to reconstruct,
the original reasoning to see whether any suggested change is actually
an improvement.

Jim Allan
