Re: NFD on u+AC00 contradicts NormalisationData.txt ?

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Jun 14 2006 - 17:17:11 CDT


    Theodore H. Smith continued:

    > > Theodore H. Smith wrote:
    > >> Does AC00 actually decompose?
    > > Yes. See TUS section 3.12 "Conjoining Jamo Behavior",
    > > <http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf#G24646>.
    >
    > But why isn't it listed in UnicodeData.txt?

    TUS 4.0, p. 72:

    D23 Canonical decomposition: The decomposition of a character
    that results from recursively applying the canonical mappings
    found in the names list of Section 16.1, Character Names List,
    and those described in Section 3.12, Conjoining Jamo Behavior,
    ^^^
    until no characters can be further decomposed, and then
    reordering nonspacing marks according to Section 3.11,
    Canonical Ordering Behavior.

    TUS 4.0, p. 418:

    A character names list is not provided for characters in
    the Hangul Syllables block, U+AC00..U+D7AF, because the
    name of a Hangul syllable can be determined by algorithm
    as described in Section 3.12, Conjoining Jamo Behavior.

    UnicodeData.txt:

    AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;
    D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;

    Those two entries mark the first and last code points of the
    Hangul syllable range. They stand in for 11,172 individual
    entries, because the name and the canonical decomposition of
    every Hangul syllable are derivable by algorithm (Section 3.12,
    as cited above).

    --Ken
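
    For readers who want to see the algorithmic derivation concretely,
    here is a minimal sketch in Python. It is not part of the original
    message. The constants and the jamo short-name tables are transcribed
    from the arithmetic described in Section 3.12, and the function names
    (decompose_hangul, hangul_name) are hypothetical, chosen only for this
    illustration. The __main__ block cross-checks the result against
    Python's own unicodedata module.

```python
import unicodedata

# Constants from the Hangul syllable arithmetic (TUS Section 3.12).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11,172 syllables in total

# Jamo short names used to build syllable names.
JAMO_L = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S",
          "SS", "", "J", "JJ", "C", "K", "T", "P", "H"]
JAMO_V = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA",
          "WAE", "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]
JAMO_T = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG",
          "LM", "LB", "LS", "LT", "LP", "LH", "M", "B", "BS", "S",
          "SS", "NG", "J", "C", "K", "T", "P", "H"]


def _indices(ch):
    """Return (l, v, t) indices for a Hangul syllable, or None otherwise."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return None
    l = s_index // N_COUNT
    v = (s_index % N_COUNT) // T_COUNT
    t = s_index % T_COUNT
    return l, v, t


def decompose_hangul(ch):
    """Full canonical decomposition of one precomposed Hangul syllable.

    Characters outside U+AC00..U+D7A3 are returned unchanged.
    """
    idx = _indices(ch)
    if idx is None:
        return ch
    l, v, t = idx
    result = chr(L_BASE + l) + chr(V_BASE + v)
    if t != 0:  # t == 0 means the syllable has no trailing consonant
        result += chr(T_BASE + t)
    return result


def hangul_name(ch):
    """Derive the character name of a precomposed Hangul syllable."""
    idx = _indices(ch)
    if idx is None:
        raise ValueError("not a precomposed Hangul syllable")
    l, v, t = idx
    return "HANGUL SYLLABLE " + JAMO_L[l] + JAMO_V[v] + JAMO_T[t]


if __name__ == "__main__":
    # Cross-check every syllable against the standard library.
    for cp in range(S_BASE, S_BASE + S_COUNT):
        ch = chr(cp)
        assert decompose_hangul(ch) == unicodedata.normalize("NFD", ch)
        assert hangul_name(ch) == unicodedata.name(ch)
    print([hex(ord(c)) for c in decompose_hangul("\uAC00")])
```

    With this sketch, U+AC00 decomposes to the two jamo U+1100 U+1161,
    which is the NFD result the thread asks about, even though no
    decomposition mapping for it appears in UnicodeData.txt.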


