Re: Decomposition vs Full decomposition?

From: Deborah Goldsmith (goldsmit@apple.com)
Date: Tue Mar 15 2005 - 12:30:03 CST

    On Mar 15, 2005, at 7:33 AM, Richard T. Gillam wrote:
    >> 1) Is "full decomposition" the same as "normalisation"?
    >
    > No, and it's not the same as "decomposition" either. (I've heard the
    > term "decomposition mapping" used for "decomposition", and I like it
    > better.) The UnicodeData.txt file gives a decomposition mapping for
    > each character that can decompose. For canonical decompositions, this
    > is always a one- or two-character mapping. For both canonical and
    > compatibility decompositions, the mapping may be to one or more
    > characters that can themselves decompose. To get a character's "full
    > decomposition," you keep replacing characters with their decomposition
    > mappings until you get to a sequence of characters that don't
    > decompose. (For compatibility decompositions, both canonical and
    > compatibility mappings are used; for canonical decompositions, only
    > canonical decomposition mappings are used.)
    >
    > Normalization involves not just decomposition but also canonical
    > reordering and, for some normalization forms, recomposition.
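
    The replace-until-stable procedure Rich describes can be sketched in
    Python. This is only an illustration under stated assumptions, not a
    reference implementation: full_decomposition is a hypothetical helper
    name, it relies on the standard unicodedata module for the mappings,
    and it ignores Hangul syllables, whose decompositions are algorithmic
    rather than listed as mappings.

```python
import unicodedata

def full_decomposition(ch, compat=False):
    """Recursively expand one character's decomposition mapping.

    Hypothetical helper for illustration. Hangul syllables are not handled,
    because their decompositions are algorithmic rather than listed.
    """
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        return ch  # no mapping: the character does not decompose
    fields = mapping.split()
    if fields[0].startswith("<"):
        # A tag such as <compat> marks a compatibility mapping.
        if not compat:
            return ch
        fields = fields[1:]
    return "".join(full_decomposition(chr(int(f, 16)), compat) for f in fields)

# U+01FA maps to U+00C5 U+0301, and U+00C5 maps on to U+0041 U+030A.
canonical = full_decomposition("\u01fa")

# U+1E9B maps canonically to U+017F U+0307; U+017F has only a compatibility mapping.
long_s_canonical = full_decomposition("\u1e9b")
long_s_compat = full_decomposition("\u1e9b", compat=True)

# Decomposition alone does not reorder marks; NFD does.
text = "\u1e0b\u0323"  # d with dot above, then combining dot below
decomposed_only = "".join(full_decomposition(c) for c in text)
nfd = unicodedata.normalize("NFD", text)
```

    With these inputs, decomposed_only keeps the marks in their original
    order (U+0307, then U+0323), while NFD puts U+0323 before U+0307. That
    difference is the canonical reordering step mentioned above.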

    Another important point to make is that "fully decomposed" does not
    mean "no precomposed characters" and "fully composed" does not mean "no
    combining marks." A frequent, erroneous assumption I see among
    newcomers to Unicode (not Rich, certainly!) is that NFC Unicode will
    never contain combining marks, and that NFD will never contain
    precomposed characters (that is, base character + diacritic(s) in one
    character). Neither is true.
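
    A few concrete cases may make this easier to see. The sketch below
    uses Python's standard unicodedata module; has_mark is a hypothetical
    helper, and the sample characters are my own picks rather than
    anything canonical.

```python
import unicodedata

def has_mark(text):
    """True if any character has a Mark general category (Mn, Mc, Me)."""
    return any(unicodedata.category(c).startswith("M") for c in text)

# NFC can still contain combining marks:
# there is no precomposed "q with dot above", so the mark stays separate.
nfc_q = unicodedata.normalize("NFC", "q\u0307")

# U+0958 DEVANAGARI LETTER QA is a composition exclusion, so NFC decomposes it.
nfc_qa = unicodedata.normalize("NFC", "\u0958")

# NFD can still contain letters that look precomposed:
# U+00F8 (o with stroke) has no decomposition mapping at all.
nfd_o_stroke = unicodedata.normalize("NFD", "\u00f8")
```

    Here has_mark(nfc_q) and has_mark(nfc_qa) are both true even though
    the strings are in NFC, and nfd_o_stroke is still the single character
    U+00F8 even though the string is in NFD.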

    Deborah Goldsmith
    Internationalization, Unicode Liaison
    Apple Computer, Inc.
    goldsmit@apple.com


