Collated lists of code points

From: J M Craig (jmcraig@xmission.com)
Date: Fri Aug 23 2002 - 14:03:02 EDT


Hi Folks,

(Is there a searchable archive of this list somewhere?)

The question I'm dealing with at the moment is this: Does anyone know
where I can get ( free :-) or not ) lists of Unicode code points
collated according to some particular approach? For instance, I know of
a number of ways to sort Han characters; it would be a big help to have
just a list of the code points in some given collation sequence
(especially if it included all the latest 3.x characters--but I'd take
anything, as it would be a helpful starting point).

I'd also be interested in the same kind of thing for Japanese and
Korean. If I can't find anything, I'll be starting on Korean myself and
if anyone's interested, I can provide what I come up with for that
subset of code points. (If I do it, there'll probably be two lists for
the different collation conventions I'm aware of--see below.)

One other item. I have Korean dictionaries from two different publishers
and they follow different collation conventions for the SANG (double)
consonants. Now, these dictionaries are pretty old so things may be
different now, but in the Donga dictionaries, double consonants are
collated after the corresponding single consonant but before all vowels.
So, what you get is the words that begin with double consonants
interspersed among the words that begin with the singles. In the Minjeon
Seogwan dictionaries, the doubles are after the singles and after all
vowels; that is, there's a section of words that begin with the double
consonant after all those that begin with the corresponding single
consonant. Does anyone know if there is any kind of consensus emerging
as to which pattern is preferred in general? Clearly the Unicode
standard set up the ordering with the Minjeon Seogwan pattern for the
syllables (U+AE4C /kka/ follows U+AE4B /kih/; in the Donga pattern, the
syllable U+AE4C /kka/ would follow U+AC00 /ka/). Now, the last I knew,
Minjeon Seogwan was a more prestigious publisher than Donga, so the
Korean academic community may have influenced the Unicode standard's
choice. Any light someone can shed on this is much appreciated.
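
To make the two orderings concrete, here is a minimal Python sketch that
builds both lists for the precomposed Hangul syllables (U+AC00..U+D7A3).
It relies on the standard arithmetic layout of that block (19 initials x
21 vowels x 28 finals). The function names and the exact "interleaved"
key are my own assumptions: the key is one reading of the Donga
description above, in which a double initial sorts immediately after the
same syllable with the single initial. It is not taken from either
dictionary or from any published collation table.

```python
S_BASE = 0xAC00                      # first precomposed Hangul syllable
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT

# Initial-consonant index of each double consonant -> index of its single form
# (kk->k, tt->t, pp->p, ss->s, jj->j).
DOUBLE_TO_SINGLE = {1: 0, 4: 3, 8: 7, 10: 9, 13: 12}


def decompose(cp):
    """Split a precomposed syllable code point into (initial, vowel, final) indices."""
    offset = cp - S_BASE
    if not 0 <= offset < S_COUNT:
        raise ValueError(f"U+{cp:04X} is not a precomposed Hangul syllable")
    initial, rest = divmod(offset, V_COUNT * T_COUNT)
    vowel, final = divmod(rest, T_COUNT)
    return initial, vowel, final


def separate_key(cp):
    """Doubles form their own section after the single: same as code point order."""
    return decompose(cp)


def interleaved_key(cp):
    """Assumed Donga-style key: a double is a tie-breaker on the single initial."""
    initial, vowel, final = decompose(cp)
    base = DOUBLE_TO_SINGLE.get(initial, initial)
    return (base, vowel, final, initial != base)


syllables = range(S_BASE, S_BASE + S_COUNT)
separate = sorted(syllables, key=separate_key)
interleaved = sorted(syllables, key=interleaved_key)

print([f"U+{cp:04X}" for cp in interleaved[:4]])
# ['U+AC00', 'U+AE4C', 'U+AC01', 'U+AE4D']
```

In the "separate" list U+AE4C comes directly after U+AE4B, and in the
"interleaved" list it comes directly after U+AC00, which matches the two
cases described above. Either list can be written out one code point per
line as a starting point for a collation table.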

Thanks much,

John Craig
Alpha-G Consulting, LLC


