RE: UTS #10 : comment on Hangul Jamo(Letter) collation

From: Kent Karlsson (kentk@cs.chalmers.se)
Date: Tue Aug 26 2003 - 05:01:22 EDT


    Jungshik Shin wrote:

    > Sorting Hangul letters (Jamos) according to the current version
    > of allkeys.txt is rather like sorting Latin letters according to
    > the Unicode 4.0 code points. Because this is well known, UTS #10
    > goes to some length to explain how to properly sort Hangul letters
    > (Jamos). However, as it stands, there are a few issues to be clarified.
    >
    > In mid-May this year, after a proposed update of UTS #10 had been
    > posted, there was a thread of discussion about the treatment of Hangul
    > letters (Jamos) in UCA. In the thread, I raised the following issues:
    > interleaving, and the different treatment of cluster jamos depending
    > on whether they are given separate code points of their own in the
    > U+1100 block or have to be represented as sequences of encoded Jamos.

    You may wish to look at
    http://std.dkuug.dk/JTC1/SC22/WG20/docs/n1051-hangulsort.pdf
    which contains a much updated version of my paper on the subject.
    The table entries are also found in plain text form at
    http://std.dkuug.dk/JTC1/SC22/WG20/docs/n1051t-table-hangulctt6.txt
    (the "28" at the end is spurious...)

    > After a thread of emails exchanged, Mark Davis and I found that both
    > of us are more or less on the same page as to how Hangul letters
    > should be collated. In summary,
    >
    > 1. Weights for T, V, and L should be assigned in such a way that
    > T < V < L for all T, V, and L's

    That would be L < T < V; but that is complicated by the actual need for
    (the superficially contradictory) V < L < T < V, with the latter T and V
    placed after all scripts. The Vs sit at two radically different positions
    in the table because they cover different positions of the V in a
    syllable: V < L is for the first V in a syllable, T < V is for non-first
    Vs in a syllable.
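
    A minimal sketch of that ordering, under loud assumptions: the band
    numbers, the names jamo_kind and sort_key, and the use of the code point
    as a tie-breaker inside a band are all made up for illustration and are
    not taken from allkeys.txt or from the table in the paper. "First V"
    is taken here to mean a V that does not directly follow another V.

```python
# Toy banding for conjoining jamo: first V < L < T < non-first V.
# Band numbers are illustrative only, not real collation weights.
FIRST_V, LEAD, TRAIL, LATER_V = 0, 1, 2, 3


def jamo_kind(ch):
    """Classify a conjoining jamo by its range in the Hangul Jamo block."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F:
        return "L"  # leading consonant
    if 0x1160 <= cp <= 0x11A7:
        return "V"  # vowel
    if 0x11A8 <= cp <= 0x11FF:
        return "T"  # trailing consonant
    raise ValueError(f"not a conjoining jamo: U+{cp:04X}")


def sort_key(text):
    """Build a list of (band, code point) pairs for a jamo string."""
    key = []
    prev = None
    for ch in text:
        kind = jamo_kind(ch)
        if kind == "L":
            band = LEAD
        elif kind == "T":
            band = TRAIL
        else:
            # A V directly after another V is a non-first V in the syllable.
            band = LATER_V if prev == "V" else FIRST_V
        key.append((band, ord(ch)))
        prev = kind
    return key


# Hypothetical test strings, written as jamo sequences.
LV = "\u1100\u1161"              # L V
LV_LV = "\u1100\u1161\u1102\u1161"  # L V followed by a second syllable L V
LVT = "\u1100\u1161\u11A8"       # L V T
LVV = "\u1100\u1161\u1175"       # L V V (non-first V)
LLV = "\u1100\u1100\u1161"       # L L V (cluster initial as a sequence)

ordered = sorted([LLV, LVV, LVT, LV_LV, LV], key=sort_key)
```

    With these toy bands, L < T lets a following syllable sort before a
    trailing consonant without any terminator, V < L keeps L V ahead of
    L L V, and T < V keeps L V T ahead of L V V.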

    > 2. Expand precomposed (cluster) Jamos into sequences of component
    > basic Jamos

    Needed for covering all combinations of Jamos. If limited to (a superset
    of) modern Jamo, this expansion can be avoided. For details, see my paper
    referenced above, which lists the weightings and contractions needed for
    avoiding this expansion in many (but not all) cases.
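
    The expansion in point 2 could be sketched roughly as below. The mapping
    is a small hand-picked sample, not a complete table, and these are not
    Unicode canonical decompositions (the standard defines none for cluster
    jamo); CLUSTER_EXPANSIONS and expand_clusters are hypothetical names.

```python
# Partial, illustrative mapping from precomposed cluster jamo to
# sequences of basic jamo. A real table would cover far more entries.
CLUSTER_EXPANSIONS = {
    "\u1101": "\u1100\u1100",  # choseong ssangkiyeok -> kiyeok + kiyeok
    "\u1104": "\u1103\u1103",  # choseong ssangtikeut -> tikeut + tikeut
    "\u11A9": "\u11A8\u11A8",  # jongseong ssangkiyeok -> kiyeok + kiyeok
    "\u11AA": "\u11A8\u11BA",  # jongseong kiyeok-sios -> kiyeok + sios
    "\u11AC": "\u11AB\u11BD",  # jongseong nieun-cieuc -> nieun + cieuc
}


def expand_clusters(text):
    """Replace each listed cluster jamo by its basic-jamo sequence."""
    return "".join(CLUSTER_EXPANSIONS.get(ch, ch) for ch in text)


# The encoded cluster and the spelled-out sequence now compare equal,
# so both spellings would receive the same treatment downstream.
encoded = "\u1101\u1161"         # cluster L + V
spelled = "\u1100\u1100\u1161"   # L + L + V
same = expand_clusters(encoded) == expand_clusters(spelled)
```

    After such a step, a cluster that happens to have its own code point in
    the U+1100 block and one that has to be written as a sequence are
    weighted identically.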

    > 3. Terminate every syllable with 'TERM' that has a lower weight than
    > all T's (there's an alternative to this, but both of us favor this
    > more than the alternative)

    This can be avoided if the weighting is done in a particular way.
    See my paper for details.
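
    For comparison, the TERM approach of point 3 can be sketched as follows,
    using the T < V < L ordering from point 1. This is only an illustration:
    the band numbers, the function names, and the simple rule for finding
    syllable ends (before an L that follows a V or T, and at end of input)
    are assumptions, not the UTS #10 algorithm.

```python
# Toy bands: TERM < T < V < L. Values are illustrative only.
TERM, TRAIL, VOWEL, LEAD = 0, 1, 2, 3
BANDS = {"L": LEAD, "V": VOWEL, "T": TRAIL}


def jamo_kind(ch):
    """Classify a conjoining jamo by its range in the Hangul Jamo block."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F:
        return "L"
    if 0x1160 <= cp <= 0x11A7:
        return "V"
    if 0x11A8 <= cp <= 0x11FF:
        return "T"
    raise ValueError(f"not a conjoining jamo: U+{cp:04X}")


def sort_key_with_term(text):
    """Key with a TERM element appended after every syllable."""
    key = []
    prev = None
    for ch in text:
        kind = jamo_kind(ch)
        # An L after a V or T starts a new syllable: close the old one.
        if kind == "L" and prev in ("V", "T"):
            key.append((TERM, 0))
        key.append((BANDS[kind], ord(ch)))
        prev = kind
    if prev is not None:
        key.append((TERM, 0))  # close the final syllable
    return key


def sort_key_without_term(text):
    """Same bands, but with the TERM elements dropped."""
    return [k for k in sort_key_with_term(text) if k[0] != TERM]


ga_na = "\u1100\u1161\u1102\u1161"  # L V + L V (two syllables)
gak = "\u1100\u1161\u11A8"          # L V T (one syllable)

with_term_ok = sort_key_with_term(ga_na) < sort_key_with_term(gak)
without_term_ok = sort_key_without_term(ga_na) < sort_key_without_term(gak)
```

    Here with_term_ok is true and without_term_ok is false: with T < L and no
    terminator, the trailing consonant of the second string wins against the
    leading consonant of the next syllable in the first string, which is the
    case TERM is there to fix.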

                    /kent k


