Re: Does Unicode 4.1 change NFC?

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sun Apr 03 2005 - 15:05:00 CST


    From: "Marcin 'Qrczak' Kowalczyk" <qrczak@knm.org.pl>
    >> Do the recent additions to Unicode 4.1 make any changes to NFC? i.e.
    >> does a program that correctly performs normalization on Unicode 4.0
    >> data need any updates, to data tables or algorithms, to normalize
    >> Unicode 4.1 data in normalization form C?
    >
    > Yes. New CJK compatibility ideographs U+FA70..U+FAD9 have canonical
    > decompositions into single characters. For example NFC(U+FACF) =
    > U+2284A (for the first time a BMP character is normalized to something
    > outside BMP).

    Isn't that against the Unicode stability policy? Shouldn't it have been the
    reverse, keeping U+FACF stable and normalizing U+2284A to U+FACF to preserve
    compatibility? If this was added because of a past error, then this MUST be
    urgently documented.
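
The NFC(U+FACF) claim quoted above can be checked directly. This is only a minimal sketch, assuming a Python interpreter whose standard unicodedata module is built on Unicode 4.1 or later (which should be true of any current CPython); the helper name fmt is mine, for illustration only.

```python
import unicodedata

# U+FACF is one of the CJK compatibility ideographs discussed above.
src = "\uFACF"
nfc = unicodedata.normalize("NFC", src)

def fmt(s):
    """Render a string as a space-separated list of U+XXXX codepoints."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)

# True when the normalized result is a single codepoint beyond the BMP.
leaves_bmp = len(nfc) == 1 and ord(nfc) > 0xFFFF

print(fmt(src), "->", fmt(nfc))
print("normalizes outside the BMP:", leaves_bmp)
```

If the quoted decomposition is right, the first line printed names U+2284A as the NFC form, and the second confirms that a BMP character normalizes to a supplementary-plane one.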

    I had really thought that the NFC and NFD normalizations were intended to be
    FULLY stable (barring an obvious error fixed in a corrigendum, but not
    in a release) within the set of codepoints that have already received
    standard assignments.

    If things change within the set of newly assigned codepoints, this is not an
    issue, as existing documents normalized in the past should not have used
    them (and if they did, they were already non-conforming...)
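
Whether a given codepoint falls in the "newly assigned" case can be tested against an older snapshot of the database. The sketch below is an assumption-laden stand-in: CPython only ships one historical snapshot, unicodedata.ucd_3_2_0 (Unicode 3.2.0, not 4.0.1), so it shows the principle rather than the exact 4.0.1 to 4.1.0 transition.

```python
import unicodedata

# Unicode 3.2.0 snapshot bundled with CPython; used here only as a
# stand-in for "a database older than 4.1".
old = unicodedata.ucd_3_2_0

ch = "\uFACF"

# General category "Cn" means the codepoint is unassigned.
was_unassigned = old.category(ch) == "Cn"
now_assigned = unicodedata.category(ch) != "Cn"

# What a normalizer using only the old tables does with this codepoint.
old_nfc = old.normalize("NFC", ch)
new_nfc = unicodedata.normalize("NFC", ch)

print("unassigned in the old snapshot:", was_unassigned)
print("assigned in the current database:", now_assigned)
print("old NFC leaves it unchanged:", old_nfc == ch)
print("current NFC changes it:", new_nfc != ch)
```

A codepoint that was unassigned in the older database could not legitimately appear in text normalized under it, which is the case described above where a change in normalization is not a stability problem.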

    > These are the only differences in NFC/NFD between Unicode 4.0.1 and 4.1.0.
    >
    > There are 48 more differences in NFKC/NFKD.

    These are less serious. If a character new in 4.1 now has a decomposition
    to characters already in Unicode 4.0, that respects the stability principles.

    I will download the new UCD and check it seriously when I've got some time.
    If what you say is true, then there's a real problem with the way Unicode
    now treats its "stability pact": Unicode can change its mind for such
    characters, yet refuses to change anything in the normalization of
    other scripts like Hebrew, which are ill served by their sub-optimal
    combining classes...

    So, to the UTC: please demonstrate that those changes were absolutely
    needed because the previous normalizations were obviously wrong.


