RE: Confusion about composition

From: Kent Karlsson (kentk@cs.chalmers.se)
Date: Wed Jan 14 2004 - 09:09:02 EST


    [trying to catch up on *some* of the e-mails here...]

    François Yergeau wrote:

    > This little-known fact (along with the better-known fact that not all
    > non-zero-ccc-characters do take part in existing precomposed
    > characters) has prompted the W3C's Character Model spec to define
    > "composing characters", a concept somewhat distinct from Unicode's
    > combining characters. Appendix C at
    ....
    > contains the definition as well as a list of the characters with
    > ccc=0 that do take part in existing compositions; U+102E is there, of
    > course, as well as the above-mentioned Hangul plus some others.
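
A quick check of the quoted point, as a sketch using Python's standard
unicodedata module. The choice of Python and of the sample base characters
U+1025 and U+1100 is an assumption made for illustration, not something
taken from the quoted message. U+102E and the Hangul vowel U+1161 both have
ccc=0, yet each composes with a preceding character under NFC:

```python
import unicodedata

# Characters with canonical combining class 0 that still take part in
# canonical compositions. The base characters are picked for illustration.
pairs = {
    "\u1025\u102E": "\u1026",  # MYANMAR LETTER U + VOWEL SIGN II -> LETTER UU
    "\u1100\u1161": "\uAC00",  # HANGUL CHOSEONG KIYEOK + JUNGSEONG A -> GA
}

for sequence, composed in pairs.items():
    second = sequence[1]
    print(
        f"U+{ord(second):04X}",
        "ccc =", unicodedata.combining(second),
        "composes:", unicodedata.normalize("NFC", sequence) == composed,
    )
```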

    Hmm, Hangul. Now, the composition rules for Hangul ARE special.
    That's why it's not just the case that the V and T Jamos are combining
    and all the rest of the Hangul characters are regular non-combining ones.
    ALL of the L, V, T, LV, and LVT Hangul characters are CONJOINING.
    E.g. an L followed by an LVT is a SINGLE Hangul syllable. The notion
    of "composing characters" in that Appendix C misses that point,
    and goes back to an old proposed (but never in Unicode) model
    where there were just the Ls, Vs, and Ts, with the latter two
    combining, and L T V V T would be a single Hangul syllable.
    Unfortunately, that is plain wrong in the adopted model for Hangul.
    However, L L V V T *is* a single Hangul syllable, and so are L LV T T,
    LVT T, and ... Indeed, an L LVT (e.g.) may normalise to (another) LVT.
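
The syllable claims above can be checked mechanically. The following is only
a sketch: the names jamo_type and is_single_syllable are hypothetical, the
pattern L* (V+ | LV V* | LVT) T* is my reading of the conjoining model, and
only the basic Hangul Jamo block (U+1100..U+11FF) and the precomposed
syllables (U+AC00..U+D7A3) are classified, ignoring the extended Jamo blocks.

```python
import re

def jamo_type(ch):
    """Classify one character as L, V, T, LV, LVT, or X (anything else)."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F:
        return "L"
    if 0x1160 <= cp <= 0x11A7:
        return "V"
    if 0x11A8 <= cp <= 0x11FF:
        return "T"
    if 0xAC00 <= cp <= 0xD7A3:
        # Precomposed syllables come in runs of 28: one LV, then 27 LVTs.
        return "LV" if (cp - 0xAC00) % 28 == 0 else "LVT"
    return "X"

# Assumed syllable shape: L* (V+ | LV V* | LVT) T*
# Each type token is followed by a space so "L", "LV", "LVT" stay distinct.
_SYLLABLE = re.compile(r"(?:L )*(?:(?:V )+|LV (?:V )*|LVT )(?:T )*")

def is_single_syllable(text):
    """True if the whole string forms one conjoining Hangul syllable."""
    types = "".join(jamo_type(ch) + " " for ch in text)
    return _SYLLABLE.fullmatch(types) is not None

L, V, T = "\u1100", "\u1161", "\u11A8"
LV, LVT = "\uAC00", "\uAC01"

for label, seq in [
    ("L L V V T", L + L + V + V + T),
    ("L LV T T", L + LV + T + T),
    ("LVT T", LVT + T),
    ("L LVT", L + LVT),
    ("L T V V T", L + T + V + V + T),
]:
    print(label, "->", is_single_syllable(seq))
```

Under these assumptions the first four sequences are accepted as one
syllable and L T V V T is rejected, because a T cannot be followed by a V
inside the same syllable.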

                    /kent k


