RE: UTS #10 : comment on Hangul Jamo(Letter) collation

From: Jungshik Shin (
Date: Sat Aug 30 2003 - 06:17:35 EDT


    On Tue, 26 Aug 2003, Kent Karlsson wrote:


    Thank you for your work on Korean sorting and sorry for my late reply.
    I'll be very brief because I have something urgent to take care of.

    > Jungshik Shin wrote:
    > You may wish to look at
    > which contains a much updated version of my paper on the subject.
    > The table entries are also found in plain text form at

      Wow, you've created all these entries. Thanks.

    > > After exchanging a thread of emails, Mark Davis and I found
    > > that both of us
    > > are more or less on the same page as to how Hangul letters should be
    > > collated.
    > > In summary,
    > >
    > > 1. Weights for T, V, and L should be assigned in such a way that
    > > T < V < L for all T, V, and L's
    > That would be L < T < V; but that is complicated by the actual need for
    > (the superficially contradictory) V < L < T < V, with the latter T and V
    > after all scripts.

      I'm not following you here. 'T < V < L' works well in Mark's
    and my scheme for the most generic form of Korean syllables, 'L+V+T*',
    as far as South Korean collation rules are concerned.

    > The Vs at two radically different positions in the table
    > is for different positions of the V in a syllable; V < L is for first V in
    > a syllable, T < V is for non-first Vs in a syllable.

      Aha, you're talking about your scheme.

    > > 2. Expand precomposed (cluster) Jamos into sequences of component
    > > basic Jamos
    > Needed for covering all combinations of Jamos. If limited to (a superset)
    > of modern Jamo, this expansion can be avoided.


    > referenced above, which lists the weightings and contractions needed for
    > avoiding this expansion in many (but not all) cases.
    > > 3. Terminate every syllable with 'TERM', which has a lower weight than
    > > all T's (there's an alternative to this, but both of us favor this
    > > over the alternative)
    > This can be avoided if the weighting is done in a particular way.
    > See my paper for details.

      Indeed. However, I'm wondering whether avoiding TERM is worth
    adopting your scheme, which seems more complex than Mark's and mine
    and also requires pre-handling. Could you give me the
    rationale for preferring yours to the other? Is it because it's
    better suited to tailoring for North Korean? I haven't given much thought
    to North Korean collation rules recently (at the moment, I would have to look
    them up again to refresh my memory).
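
    To make the 'T < V < L' plus TERM idea concrete, here is a minimal
    sketch in Python. It is only an illustration under stated assumptions,
    not the weighting from either paper: the numeric weights, the names
    `jamo_weight` and `sort_key`, and the restriction to modern jamo in
    simple L+V+T* syllables are all hypothetical choices made for the
    example. Cluster jamos are not expanded here.

```python
import unicodedata

# Hypothetical weight layout, chosen only so that TERM < T < V < L.
TERM = 0
T_FIRST, T_LAST = 0x11A8, 0x11C2   # modern trailing consonants (T)
V_FIRST, V_LAST = 0x1161, 0x1175   # modern vowels (V)
L_FIRST, L_LAST = 0x1100, 0x1112   # modern leading consonants (L)


def is_leading(ch):
    """True if ch is a modern leading consonant (starts a new syllable)."""
    return L_FIRST <= ord(ch) <= L_LAST


def jamo_weight(ch):
    """Map one conjoining jamo to a weight with every T < every V < every L."""
    cp = ord(ch)
    if T_FIRST <= cp <= T_LAST:
        return 1 + (cp - T_FIRST)      # 1..27
    if V_FIRST <= cp <= V_LAST:
        return 100 + (cp - V_FIRST)    # 100..120
    if L_FIRST <= cp <= L_LAST:
        return 200 + (cp - L_FIRST)    # 200..218
    raise ValueError(f"not a modern conjoining jamo: U+{cp:04X}")


def sort_key(text, use_term=True):
    """Build a sort key from the jamo sequence of text.

    With use_term=True, TERM is appended after every syllable; a syllable
    is assumed to end right before the next L or at the end of the text.
    """
    key = []
    # NFD decomposes precomposed Hangul syllables into conjoining jamo.
    for ch in unicodedata.normalize("NFD", text):
        if use_term and key and is_leading(ch):
            key.append(TERM)           # close the previous syllable
        key.append(jamo_weight(ch))
    if use_term and key:
        key.append(TERM)               # close the last syllable
    return tuple(key)


words = ["각", "가가", "나", "가"]
with_term = sorted(words, key=sort_key)
without_term = sorted(words, key=lambda w: sort_key(w, use_term=False))
print(with_term)      # ['가', '가가', '각', '나']
print(without_term)   # ['가', '각', '가가', '나']
```

    Without TERM, the T of the first syllable of '각' is compared against
    the L of the second syllable of '가가', and T < L puts '각' first.
    With TERM lower than every T, the syllable boundary wins and '가가'
    sorts before '각', which is the syllable-by-syllable order.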

