From: Deborah Goldsmith (goldsmit@apple.com)
Date: Tue Mar 15 2005 - 12:30:03 CST
On Mar 15, 2005, at 7:33 AM, Richard T. Gillam wrote:
>> 1) Is "full decomposition" the same as "normalisation"?
>
> No, and it's not the same as "decomposition" either. (I've heard the
> term "decomposition mapping" used for "decomposition", and I like it
> better.) The UnicodeData.txt file gives a decomposition mapping for
> each character that can decompose. For canonical decompositions, this
> is always a one- or two-character mapping. For both canonical and
> compatibility decompositions, the mapping may be to one or more
> characters that can themselves decompose. To get a character's "full
> decomposition," you keep replacing characters with their decomposition
> mappings until you get to a sequence of characters that don't
> decompose. (For compatibility decompositions, both canonical and
> compatibility mappings are used; for canonical decompositions, only
> canonical decomposition mappings are used.)
>
> Normalization involves not just decomposition, but also canonical
> reordering and (for some normalizations) recomposition.
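To make the distinction concrete, here is a rough sketch in Python using the standard unicodedata module. Python is my choice, since nothing above uses a programming language, and the helper names decomposition_mapping, full_decomposition and show are hypothetical. The sketch reads each character's decomposition mapping from the module's copy of UnicodeData.txt and keeps substituting until nothing decomposes, as described above. It deliberately does no canonical reordering. It also assumes unicodedata.decomposition() returns an empty string for Hangul syllables, whose decompositions are algorithmic, so it leaves them alone.

```python
import unicodedata


def decomposition_mapping(ch, compat):
    """Return the one-level decomposition mapping of ch, or None.

    unicodedata.decomposition() returns the raw UnicodeData.txt field,
    e.g. "0041 030A" (canonical) or "<compat> 0044 017D" (compatibility).
    """
    field = unicodedata.decomposition(ch)
    if not field:
        return None  # no mapping (assumed to include Hangul syllables)
    parts = field.split()
    if parts[0].startswith("<"):
        # A tagged mapping is a compatibility mapping; skip it unless asked.
        if not compat:
            return None
        parts = parts[1:]
    return "".join(chr(int(cp, 16)) for cp in parts)


def full_decomposition(s, compat=False):
    """Keep replacing characters by their mappings until none decompose.

    With compat=True both canonical and compatibility mappings are used;
    otherwise only canonical ones. No canonical reordering is done.
    """
    out = []
    for ch in s:
        mapping = decomposition_mapping(ch, compat)
        out.append(ch if mapping is None else full_decomposition(mapping, compat))
    return "".join(out)


def show(s):
    # Print code points so combining marks are visible.
    print(" ".join(f"U+{ord(c):04X}" for c in s))


# ANGSTROM SIGN -> U+00C5 -> A + COMBINING RING ABOVE (a chain of mappings)
show(full_decomposition("\u212b"))                # U+0041 U+030A
# LATIN CAPITAL LETTER DZ WITH CARON has only a compatibility mapping,
# and that mapping yields Z WITH CARON, which decomposes canonically.
show(full_decomposition("\u01c4"))                # U+01C4
show(full_decomposition("\u01c4", compat=True))   # U+0044 U+005A U+030C
# Full decomposition alone does not reorder marks; NFD does.
s = "a\u0307\u0323"  # a + dot above (ccc 230) + dot below (ccc 220)
show(full_decomposition(s))                       # U+0061 U+0307 U+0323
show(unicodedata.normalize("NFD", s))             # U+0061 U+0323 U+0307
```

For non-Hangul text whose marks are already in canonical order, the result should match NFD (or NFKD with compat=True); the last two lines show the reordering step that full decomposition by itself leaves out.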
Another important point to make is that "fully decomposed" does not
mean "no precomposed characters" and "fully composed" does not mean "no
combining marks." A frequent, erroneous assumption I see among
newcomers to Unicode (not Rich, certainly!) is that NFC Unicode will
never contain combining marks, and that NFD will never contain
precomposed characters (that is, base character + diacritic(s) in one
character). Neither is true.
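Both counterexamples are easy to reproduce. The following is a small sketch with Python's standard unicodedata module. The helper names code_points and combining_marks are hypothetical, and "combining mark" is taken here to mean general category Mn, Mc or Me. NFC keeps a combining mark when no precomposed character exists, or when the precomposed one is a composition exclusion. NFD keeps letters such as o with stroke and d with stroke, which have no decomposition mapping at all.

```python
import unicodedata


def code_points(s):
    # Render a string as space-separated U+XXXX code points.
    return " ".join(f"U+{ord(c):04X}" for c in s)


def combining_marks(s):
    """Characters whose general category is Mn, Mc or Me."""
    return [ch for ch in s if unicodedata.category(ch).startswith("M")]


# NFC can contain combining marks.
# 1) No precomposed character exists for e + acute + acute, so one mark survives.
nfc_a = unicodedata.normalize("NFC", "e\u0301\u0301")
# 2) DEVANAGARI LETTER QA is a composition exclusion, so NFC leaves KA + NUKTA.
nfc_b = unicodedata.normalize("NFC", "\u0958")
print(code_points(nfc_a), len(combining_marks(nfc_a)))  # U+00E9 U+0301 1
print(code_points(nfc_b), len(combining_marks(nfc_b)))  # U+0915 U+093C 1

# NFD can contain precomposed letters: these have no decomposition mapping.
nfd = unicodedata.normalize("NFD", "\u00f8\u0111")
for ch in nfd:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```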
Deborah Goldsmith
Internationalization, Unicode Liaison
Apple Computer, Inc.
goldsmit@apple.com