Re: "Data-visualization" for Unicode Collation: Khmer?

From: Maurice Bauhahn (bauhahnm@clara.net)
Date: Wed Mar 22 2000 - 08:45:06 EST


Hello Mark,

Thank you for the collation tables...a tidy bit of programming! I searched
through every page but could not find the 178x series of Khmer characters. Has
Khmer not made it into the collation sequence? Although combinations of Khmer
characters have a very complex ordering, the consonants dominate over the
vowels. The consonants 1780-17A2 are encoded in alphabetic order. The
independent vowels 17A3-17B3 (not exactly in alphabetic order) have a similar
priority (slightly lower) and are never in a cluster with the dependent vowels
at 17B4-17C5, which are in alphabetic order [well...almost never: there may be
a situation where there is an explicit vowel with a subscript independent
vowel, but it must be exceedingly rare]. The signs 17C6-17D1 are next in
priority but have only rather weak collation sequencing (the first six have
fixed sequencing). The sign indicating that the next character is a
subscript (17D2) does not itself have a collation sequence, as I understand
it, but it does imply that the character following it (usually a consonant
but rarely an independent vowel) will bring the cluster to a lower collation
priority. All other characters break the collation sequence.
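
To make the ranges above concrete, here is a minimal Python sketch that
classifies a single character by code point. The function name `khmer_class`
and the category labels are illustrative assumptions only; they are not taken
from the collation tables or from any library.

```python
def khmer_class(ch: str) -> str:
    """Classify one character using the code point ranges given in the text.

    The labels are hypothetical names chosen for this sketch.
    """
    cp = ord(ch)
    if 0x1780 <= cp <= 0x17A2:
        return "consonant"
    if 0x17A3 <= cp <= 0x17B3:
        return "independent_vowel"
    if 0x17B4 <= cp <= 0x17C5:
        return "dependent_vowel"
    if 0x17C6 <= cp <= 0x17D1:
        return "sign"
    if cp == 0x17D2:
        # Marks the following character as a subscript; no weight of its own.
        return "subscript_marker"
    # Anything else breaks the collation sequence.
    return "other"
```

For example, `khmer_class("\u1780")` gives "consonant" and
`khmer_class("a")` gives "other".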

Hence an alphabetic sort would have:

Consonant/Independent Vowel - Primary sort*
17D2 and Consonant/Independent Vowel - Secondary sort (first subscript)
17D2 and Consonant/Independent Vowel - Tertiary sort (second subscript)
Vowel - Quaternary sort (starting with the inherent vowels 17B4 and 17B5, which
are normally not encoded [17B4 assumed if no explicit vowel])
Sign - Quinary sort
(please pardon the spelling of the fourth and fifth level sort...not sure
what is right)
*There is some complication in this...Independent Vowels equate in collation
to different Consonant/Vowel (Primary and Quaternary) combinations:

17A3=17A2
17A4=17A2+17CB (a sign!)
17A5=17A2+17B7
17A6=17A2+17B9
17A7=17A2+17BB
17A8=17A2+17BB+
17A9=17A2+17BC
17AA=17A2+17BC+
17AB=179A+17B9
17AC=179A+17BA
17AD=179B+17B9
17AE=179B+17B9
17AF=17A2+17C2
17B0=17A2+17C3
17B1=17A2+17C4
17B2=17A2+17C4
17B3=17A2+17C5
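
The list above could be held as a simple lookup table. The following is only
a sketch: the names `EQUIV` and `expand` are hypothetical, the entries are
copied exactly as listed (including the rows that share a right-hand side),
and the trailing "+" is recorded merely as a flag because the list does not
say what weight it should carry.

```python
# Independent vowel -> (equivalent code points, has_trailing_plus).
# Values are transcribed as given in the list, without correction.
EQUIV = {
    0x17A3: ((0x17A2,), False),
    0x17A4: ((0x17A2, 0x17CB), False),  # 17CB is a sign, not a vowel
    0x17A5: ((0x17A2, 0x17B7), False),
    0x17A6: ((0x17A2, 0x17B9), False),
    0x17A7: ((0x17A2, 0x17BB), False),
    0x17A8: ((0x17A2, 0x17BB), True),
    0x17A9: ((0x17A2, 0x17BC), False),
    0x17AA: ((0x17A2, 0x17BC), True),
    0x17AB: ((0x179A, 0x17B9), False),
    0x17AC: ((0x179A, 0x17BA), False),
    0x17AD: ((0x179B, 0x17B9), False),
    0x17AE: ((0x179B, 0x17B9), False),
    0x17AF: ((0x17A2, 0x17C2), False),
    0x17B0: ((0x17A2, 0x17C3), False),
    0x17B1: ((0x17A2, 0x17C4), False),
    0x17B2: ((0x17A2, 0x17C4), False),
    0x17B3: ((0x17A2, 0x17C5), False),
}


def expand(text: str) -> str:
    """Replace each independent vowel with its listed equivalent sequence.

    The trailing-"+" flag is ignored here; characters not in the table
    pass through unchanged.
    """
    out = []
    for ch in text:
        seq, _has_plus = EQUIV.get(ord(ch), ((ord(ch),), False))
        out.extend(chr(cp) for cp in seq)
    return "".join(out)
```

With this table, `expand("\u17A5")` yields the two-character string
"\u17A2\u17B7", and a plain consonant such as "\u1780" is returned as is.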

Note that 17A7 and 17A8 have nearly the same value; however, the latter has a
final consonantal sound that is not treated as equivalent to a second
consonant but is weighted slightly differently from the first.
The same holds for 17A9 and 17AA.
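
The five sort levels listed earlier can be sketched as a tuple-valued sort key
for a single cluster. This is a rough illustration under several assumptions:
the function name `cluster_key` is hypothetical, raw code points stand in for
collation weights (which is only approximately right, since the independent
vowels and signs are not in strict alphabetic order), an absent subscript or
sign is given weight 0 so that it sorts first, and the independent-vowel
equivalences are not applied.

```python
def cluster_key(cluster: str) -> tuple:
    """Build a five-level key for one Khmer cluster.

    Levels: base, first subscript, second subscript, vowel, sign.
    Code points are used as stand-in weights (an assumption of this sketch).
    """
    cps = [ord(c) for c in cluster]
    base = 0
    subs = []
    vowel = 0x17B4  # inherent vowel assumed when none is written
    sign = 0
    i = 0
    while i < len(cps):
        cp = cps[i]
        if cp == 0x17D2 and i + 1 < len(cps):
            # 17D2 has no weight itself; the next character is a subscript.
            subs.append(cps[i + 1])
            i += 2
            continue
        if 0x1780 <= cp <= 0x17B3 and base == 0:
            base = cp    # consonant or independent vowel (level 1)
        elif 0x17B4 <= cp <= 0x17C5:
            vowel = cp   # dependent vowel (level 4)
        elif 0x17C6 <= cp <= 0x17D1:
            sign = cp    # sign (level 5)
        i += 1
    subs = (subs + [0, 0])[:2]  # pad or trim to exactly two subscripts
    return (base, subs[0], subs[1], vowel, sign)
```

Passing it to `sorted(clusters, key=cluster_key)` orders a bare consonant
before the same consonant with a vowel, and both before the same consonant
carrying a subscript. How the keys of successive clusters in a word should be
combined is left open here.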

What else do I need to do to help get Khmer considered for collation?

Pensively;-)

Maurice

mark.davis@us.ibm.com wrote:

> While on the plane last week, I wrote a program that generates charts of
> the default Unicode collation ordering. These charts display the actual
> characters in order, rather than merely listing them by name. If you are
> interested, you can find the charts at
> http://www.unicode.org/unicode/reports/tr10/charts/. Feedback is welcome.
>
> Mark
> ___
> Mark Davis, IBM Center for Java Technology, Cupertino
> (408) 777-5850 [fax: 5891], mark.davis@us.ibm.com, president@unicode.org
> http://maps.yahoo.com/py/maps.py?Pyt=Tmap&addr=10275+N.+De+Anza&csz=95014

--
Maurice Bauhahn
2 Meadow Way
Dorney Reach
MAIDENHEAD
SL6 0DS
United Kingdom
Home Tel: +44(0)1628 626068
Work Tel: +44(0)118 9016020
Home Email: bauhahnm@clara.net
Work Email: mbauhahn@brio.com


