Re: Does Unicode 4.1 change NFC?

From: Peter Kirk
Date: Sun Apr 03 2005 - 16:36:15 CST

    On 03/04/2005 22:28, Doug Ewell wrote:

    >Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
    >>>Yes. New CJK compatibility ideographs U+FA70..U+FAD9 have canonical
    >>>decompositions into single characters. For example NFC(U+FACF) =
    >>>U+2284A (for the first time a BMP character is normalized to
    >>>something outside BMP).
    >>Isn't that against Unicode stability? Shouldn't it have been the
    >>reverse, keeping U+FACF stable and normalizing U+2284A to U+FACF to
    >>keep the compatibility? If this was added because of a past error,
    >>then this MUST be urgently documented.
    >They're new characters, Philippe. They weren't encoded until 4.1.
    In that case these character allocations seem perverse, given that both
    of these characters could have been assigned to the BMP, or both to
    outside it - or the reverse normalisation as suggested by Philippe.
    There is a serious danger of breaking existing implementations
    (especially those which only fully support the BMP) by introducing a BMP
    character which normalises to outside the BMP. The BMP is therefore no
    longer closed under operations such as normalisation, a property which
    existing implementations may have relied on. Maybe someone
    thought this was a good idea, to force implementations to be upgraded,
    but it strikes me as a recipe for disaster. It could also be a serious
    security hole, as hackers try sending U+FACF to various implementations
    in an attempt to crash them.
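
    As a minimal sketch of the concern, the following Python checks the
    mapping quoted above and scans the BMP for characters whose NFC form
    leaves it. It assumes the standard unicodedata module is built against
    Unicode 4.1 or later; the helper names nfc_leaves_bmp and utf16_units
    are made up for illustration and come from no standard or library.

```python
import unicodedata

def nfc_leaves_bmp(ch: str) -> bool:
    """True if ch is a BMP character whose NFC form has a non-BMP code point."""
    if ord(ch) > 0xFFFF:
        return False
    return any(ord(c) > 0xFFFF for c in unicodedata.normalize("NFC", ch))

def utf16_units(s: str) -> int:
    """Number of 16-bit code units needed to store s in UTF-16."""
    return len(s.encode("utf-16-le")) // 2

src = "\uFACF"
dst = unicodedata.normalize("NFC", src)

# Show the mapping and the growth in UTF-16 storage (one unit becomes a
# surrogate pair), which is what a BMP-only implementation would not expect.
print(f"U+{ord(src):04X} -> " + " ".join(f"U+{ord(c):04X}" for c in dst))
print("UTF-16 code units:", utf16_units(src), "->", utf16_units(dst))

# Scan the whole BMP, skipping the surrogate range, for characters that
# normalise to something outside it.
offenders = [
    cp for cp in range(0x10000)
    if not 0xD800 <= cp <= 0xDFFF and nfc_leaves_bmp(chr(cp))
]
print("BMP characters whose NFC leaves the BMP:", len(offenders))
```

    An implementation that sizes its output buffer on the assumption that
    normalising BMP text yields BMP text would be caught out by exactly
    this growth from one UTF-16 code unit to two.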

    Peter Kirk (personal) (work)