RE: UTS #10 : comment on Hangul Jamo(Letter) collation

From: Kent Karlsson (kentk@cs.chalmers.se)
Date: Tue Aug 26 2003 - 05:01:22 EDT

Next message: Peter Kirk: "Re: Character codes for Egyptian transliteration"

Previous message: Jill.Ramonsky@Aculab.com: "RE: Proposed Draft UTR #31 - Syntax Characters"
In reply to: Jungshik Shin: "UTS #10 : comment on Hangul Jamo(Letter) collation"
Next in thread: Jungshik Shin: "RE: UTS #10 : comment on Hangul Jamo(Letter) collation"

Jungshik Shin wrote:

> Sorting Hangul letters (Jamos) according to the current version
> of allkeys.txt is rather like sorting Latin letters according to
> the Unicode 4.0 code points. Because this is well known, UTS #10
> goes to some length to explain how to sort Hangul letters (Jamos) properly.
> However, as it stands, there are a few issues to be clarified.
>
> In mid-May this year, after a proposed update of UTS #10 had been
> posted, there was a thread of discussion about the treatment of Hangul
> letters (Jamos) in UCA. In that thread, I raised the following issues:
> interleaving, and the different treatment of cluster Jamos depending on
> whether they are given separate code points of their own in the U+1100
> block or have to be represented as sequences of encoded Jamos.

You may wish to look at
http://std.dkuug.dk/JTC1/SC22/WG20/docs/n1051-hangulsort.pdf
which contains a much updated version of my paper on the subject.
The table entries are also found in plain text form at
http://std.dkuug.dk/JTC1/SC22/WG20/docs/n1051t-table-hangulctt6.txt
(the "28" at the end is spurious...)

> After an exchange of emails, Mark Davis and I found that both of us
> are more or less on the same page as to how Hangul letters should be
> collated. In summary,
>
> 1. Weights for T, V, and L should be assigned in such a way that
> T < V < L for all T, V, and L's

That would be L < T < V; but that is complicated by the actual need for
(the superficially contradictory) V < L < T < V, with the latter T and V
after all scripts. The Vs appear at two radically different positions in
the table because they cover different positions of the V in a syllable:
V < L applies to the first V in a syllable, T < V to non-first Vs in a
syllable.
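
As a rough illustration of that ordering (not taken from the paper), the
sketch below ranks each Jamo by its position in the syllable: first V, then
L, then T, then any later V. The function name, the numeric ranks, the use
of the code point as a stand-in for the weight within a class, and the
present-day Jamo block ranges are all assumptions made for the example. It
does not model placing the later T and V after all scripts.

```python
# Conjoining Jamo ranges as in current Unicode (assumed; narrower in 2003).
L_RANGE = range(0x1100, 0x1160)  # leading consonants
V_RANGE = range(0x1160, 0x11A8)  # vowels
T_RANGE = range(0x11A8, 0x1200)  # trailing consonants


def positional_jamo_key(text):
    """Build a sort key with class ranks: first V < L < T < later V.

    The code point is used as a placeholder for the real weight
    within each class.
    """
    key = []
    vowel_seen = False  # has the current syllable already had a vowel?
    for ch in text:
        cp = ord(ch)
        if cp in L_RANGE:
            vowel_seen = False  # the next vowel is the first of its syllable
            key.append((1, cp))
        elif cp in V_RANGE:
            key.append((3 if vowel_seen else 0, cp))
            vowel_seen = True
        elif cp in T_RANGE:
            key.append((2, cp))
        else:
            raise ValueError("not a conjoining Jamo: U+%04X" % cp)
    return key


GA = "\u1100\u1161"               # L V
GA_A = "\u1100\u1161\u110B\u1161"  # L V L V (two syllables)
GAG = "\u1100\u1161\u11A8"        # L V T
GGA = "\u1100\u1100\u1161"        # L L V (cluster spelled as two Jamos)

ordered = sorted([GGA, GAG, GA_A, GA], key=positional_jamo_key)
```

Here L < T puts GA followed by another syllable before GAG without any
terminator, and first-V < L puts every syllable with a single leading
consonant before those with a leading cluster.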

> 2. Expand precomposed (cluster) Jamos into sequences of component
> basic Jamos

This is needed to cover all combinations of Jamos. If limited to (a
superset of) modern Jamo, this expansion can be avoided. For details, see
my paper referenced above, which lists the weightings and contractions
needed for avoiding this expansion in many (but not all) cases.
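
For concreteness, the expansion in point 2 might look like the sketch
below. The mapping is a hypothetical three-entry excerpt written for this
example, not a table from the paper or from allkeys.txt; a usable table
would have to cover every cluster Jamo.

```python
# Hypothetical excerpt: cluster Jamo -> sequence of basic Jamos.
CLUSTER_TO_BASIC = {
    "\u1101": "\u1100\u1100",  # leading SSANGKIYEOK -> KIYEOK + KIYEOK
    "\u116A": "\u1169\u1161",  # vowel WA -> O + A
    "\u11AA": "\u11A8\u11BA",  # trailing KIYEOK-SIOS -> KIYEOK + SIOS
}


def expand_clusters(text):
    """Replace each listed cluster Jamo by its component basic Jamos."""
    return "".join(CLUSTER_TO_BASIC.get(ch, ch) for ch in text)


expanded = expand_clusters("\u1101\u116A\u11AA")
```

After this step a cluster with its own code point and the same cluster
spelled out as a sequence produce the same input to weighting.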

> 3. Terminate every syllable with 'TERM' that has a lower weight than
> all T's (there's an alternative to this, but both of us favor this
> over the alternative)

This can be avoided if the weighting is done in a particular way.
See my paper for details.
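
For comparison, here is a minimal sketch of the terminator approach from
points 1 and 3 of the quoted summary: class ranks T < V < L, with a TERM
key lower than every T appended at each syllable end. The names, the
numeric ranks, the code point as placeholder weight, the present-day Jamo
ranges, and the syllable-boundary rule (an L that follows a V or T starts
a new syllable) are assumptions made for the example.

```python
TERM = (0, 0)  # sorts below every Jamo key


def term_jamo_key(text):
    """Build a sort key with TERM < T < V < L, TERM closing each syllable."""
    key = []
    prev = None  # class of the previous Jamo
    for ch in text:
        cp = ord(ch)
        if 0x1100 <= cp <= 0x115F:    # leading consonant
            if prev in ("V", "T"):
                key.append(TERM)      # close the previous syllable
            key.append((3, cp))
            prev = "L"
        elif 0x1160 <= cp <= 0x11A7:  # vowel
            key.append((2, cp))
            prev = "V"
        elif 0x11A8 <= cp <= 0x11FF:  # trailing consonant
            key.append((1, cp))
            prev = "T"
        else:
            raise ValueError("not a conjoining Jamo: U+%04X" % cp)
    if key:
        key.append(TERM)              # close the final syllable
    return key


GA = "\u1100\u1161"               # L V
GA_A = "\u1100\u1161\u110B\u1161"  # L V L V (two syllables)
GAG = "\u1100\u1161\u11A8"        # L V T

ordered_with_term = sorted([GAG, GA_A, GA], key=term_jamo_key)
```

In this scheme it is TERM < T that places GA followed by another syllable
before GAG; a weighting with L < T gets the same result from the class
order alone, which is why the terminator can be dropped.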

/kent k


This archive was generated by hypermail 2.1.5 : Tue Aug 26 2003 - 06:23:30 EDT