Hangul jamos [was: Re: logos, symbols, and ligatures (RE: Encoding Personal Use Ideographs)]

From: Eric Muller (emuller@adobe.com)
Date: Mon Nov 05 2007 - 10:34:40 CST

    Kent Karlsson wrote:
    > And yet the UTC, as well as WG2, seems to be in the process of adopting 100+
    > Hangul Jamo that aren't even ligature-like, but each just represents a sequence
    > of Hangul conjoining letters.

    IMHO, this is the predictable and inevitable result of the canonical
    decompositions, which were frozen in Unicode 3.1.

    Since that day, the standard says that there are three different coded
    character sequences to represent 갂:

    S    = <AC02>
    LVT  = <1100 1161 11A9>
    LVTT = <1100 1161 11A8 11A8>

    with S and LVT canonically equivalent to each other, but neither
    equivalent to LVTT. The bulk of the data that exists today uses S/LVT,
    where "bulk" is probably 99%. The idea of LVTT, however sensible and
    desirable, did not happen in practice. Because <11A9> is not and cannot
    be made canonically equivalent to <11A8 11A8>, I believe that the only
    sensible course of action is to admit that the idea of L+, V+, T+ in a
    syllable did not succeed, and to continue down the path of "complex"
    jamos (such as 11A9). I would even recommend deprecating the use of
    multiple "simplex" jamos in each part of a syllable, as a way to resolve
    the problem of multiple non-equivalent representations, and the
    implementation problems that this causes (In fact, I am ready to bet
    that most implementations simply treat LVTT as different from S/LVT,
    one more reason for cleaning the
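
    For readers who want to check the equivalence claims, here is a minimal
    sketch using Python's standard unicodedata module. The helper names
    (canonically_equivalent, compose_lvt) are made up for illustration; the
    constants are the ones from the Hangul syllable composition arithmetic
    in the Unicode Standard. It is only a check of the three sequences
    above, not a statement about how any particular implementation handles
    them.

```python
import unicodedata

# The three sequences discussed above.
S = "\uAC02"                      # precomposed syllable
LVT = "\u1100\u1161\u11A9"        # L + V + "complex" T (11A9)
LVTT = "\u1100\u1161\u11A8\u11A8" # L + V + two "simplex" T (11A8 11A8)

def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent iff their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# Constants from the Unicode Hangul syllable composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28

def compose_lvt(l: str, v: str, t: str) -> str:
    """Arithmetic composition of one L, one V and one T jamo (illustrative helper)."""
    l_index = ord(l) - L_BASE
    v_index = ord(v) - V_BASE
    t_index = ord(t) - T_BASE
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

s_eq_lvt = canonically_equivalent(S, LVT)        # S and LVT
lvt_eq_lvtt = canonically_equivalent(LVT, LVTT)  # LVT and LVTT
composed = compose_lvt(*LVT)                     # arithmetic view of LVT -> S

print("S ~ LVT:   ", s_eq_lvt)
print("LVT ~ LVTT:", lvt_eq_lvtt)
print("compose(LVT) == S:", composed == S)
```

    Comparing NFD forms is used here because it is the direct definition of
    canonical equivalence; comparing NFC forms would give the same answers.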

    I think the alternative you prefer (keep using <11A9>, but do not create
    new combinations like that) would not result in a system that is clean
    from a model point of view, nor in a system that is clean from an
    implementation point of view. So I don't see anything that makes it

