RE: Confusion about composition

From: Francois Yergeau (
Date: Mon Jan 12 2004 - 13:38:18 EST


    Markus Scherer wrote:
    > Clark Cox wrote:
    > > According to the comment at the beginning of the file, and all that
    > > I've read elsewhere, toNFC(U+1025 U+102E) should result in U+1026.
    > > However both U+1025 and U+102E have combining classes of zero, so my
    > > code does not compose those characters. No information that I've been
    > > able to find has been able to explain this discrepancy. Any help would
    > > be greatly appreciated.
    > There is no discrepancy. The starter must have ccc==0 but the
    > second character's ccc can be anything. See Hangul.

    This little-known fact (along with the better-known fact that not all
    characters with non-zero ccc take part in existing precomposed characters) has
    prompted the W3C's Character Model spec to define "composing characters", a
    concept somewhat distinct from Unicode's combining characters. Appendix C
    ngChars contains the definition as well as a list of the characters with
    ccc=0 that do take part in existing compositions; U+102E is there, of
    course, as well as the above-mentioned Hangul plus some others.
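
    To make the point concrete, here is a minimal Python sketch using the
    standard unicodedata module. It checks that U+1025 U+102E composes to
    U+1026 even though U+102E has ccc=0, and then scans the table-driven
    canonical decompositions for other ccc=0 characters that appear in
    second position. The helper names composes_with_ccc0_second and
    ccc0_composing_chars are made up for illustration, and the scan is only
    an approximation of the Character Model's list: its result depends on
    the Unicode version bundled with the Python interpreter.

```python
import sys
import unicodedata


def composes_with_ccc0_second(first, second):
    """True if first+second composes under NFC into a single code point
    while the second character has canonical combining class 0."""
    composed = unicodedata.normalize("NFC", first + second)
    return len(composed) == 1 and unicodedata.combining(second) == 0


def ccc0_composing_chars():
    """Collect ccc=0 characters that occur as the second element of a
    two-character canonical decomposition that NFC recomposes."""
    found = set()
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        parts = unicodedata.decomposition(ch).split()
        # Skip characters without a decomposition, compatibility mappings
        # (tagged like "<compat>"), and singleton decompositions.
        if len(parts) != 2 or parts[0].startswith("<"):
            continue
        pair = "".join(chr(int(p, 16)) for p in parts)
        second = pair[1]
        # Keep the pair only if NFC really recomposes it, which filters
        # out composition exclusions.
        if (unicodedata.combining(second) == 0
                and unicodedata.normalize("NFC", pair) == ch):
            found.add(second)
    return found


if __name__ == "__main__":
    # Clark Cox's case: MYANMAR LETTER U + MYANMAR VOWEL SIGN II
    print(composes_with_ccc0_second("\u1025", "\u102e"))
    # Hangul: CHOSEONG KIYEOK + JUNGSEONG A
    print(composes_with_ccc0_second("\u1100", "\u1161"))
    print(sorted("U+%04X" % ord(c) for c in ccc0_composing_chars()))
```

    As far as I can tell, the Hangul jamo do not show up in the scan,
    because Hangul syllable decompositions are algorithmic and
    unicodedata.decomposition returns an empty string for them. They are
    checked directly through normalize instead.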

