Re: Open Issue #61: Proposed Update UAX #15 Unicode Normalization Forms

From: Simon Josefsson (jas@extundo.com)
Date: Thu Jan 27 2005 - 15:37:03 CST


    Markus Scherer <markus.scherer@jtcsv.com> writes:

    > 1. Aside from broken idempotency, this interpretation of the old UAX version "normalizes" such text
    > to something that is *not canonically equivalent* to the input - it changes some text to some
    > completely different text.
    >
    > 2. There also exist strings (see PRI 29) where the application of NFC[old UAX] or NFKC[old UAX]
    > produces output that is not only different text (not canonically equivalent) but also *not in
    > canonical order*. As a result, something you got from normalization may not even pass the
    > normalization quick check: NFC_quick_check(NFC(string))=NO.
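
The properties named in the two quoted points (idempotency, canonical equivalence of output and input, and the output passing a normalization check) can be probed for any given string. The following is a minimal sketch, assuming Python 3.8 or later and its standard `unicodedata` module. The helper name `check_normalization_invariants` is hypothetical, and the sample sequence U+0B47 U+0300 U+0B3E is, as far as I recall, one of the sequences cited in PRI 29. The results reflect whatever Unicode version and algorithm the local Python build implements, not the old UAX text under discussion.

```python
import unicodedata


def check_normalization_invariants(text: str, form: str = "NFC") -> dict:
    """Report whether normalizing `text` keeps the properties discussed above.

    Hypothetical helper for illustration only; it tests the local
    implementation, not any particular version of UAX #15.
    """
    once = unicodedata.normalize(form, text)
    twice = unicodedata.normalize(form, once)
    return {
        # Normalizing a second time should change nothing.
        "idempotent": once == twice,
        # Two strings are canonically equivalent exactly when their
        # NFD forms are identical.
        "canonically_equivalent": (
            unicodedata.normalize("NFD", text)
            == unicodedata.normalize("NFD", once)
        ),
        # The output should itself be recognized as normalized.
        "output_is_normalized": unicodedata.is_normalized(form, once),
    }


# Assumed sample: a sequence of the kind described in PRI 29
# (starter, intervening combining mark, second composable character).
PROBLEM_SEQUENCE = "\u0B47\u0300\u0B3E"

if __name__ == "__main__":
    for label, sample in [("e + acute", "e\u0301"), ("PRI 29 sample", PROBLEM_SEQUENCE)]:
        print(label, check_normalization_invariants(sample))
```

An implementation following the corrected definition would be expected to report all three properties as true for every input. One following the old reading, as described in the quoted points, could fail the second or third for the problem sequences.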

    Thanks. I think your points are the best arguments I've seen yet, as
    to why the proposed fix is better than keeping the old normative text.

    However, if I understand correctly, only the corner-case "problem
    sequences" would be affected by these problems, so I do not find
    these points strongly convincing.

    It still appears possible that StringPrep/IDN, and in general all
    Unicode applications that rely on the explicitly stated
    version-idempotency property, could handle this problem more
    smoothly if these normalization properties were permitted to be in a
    sub-optimal state for these rare problem sequences.

    What remains for me to understand is why no other alternative could
    lead to a better overall situation.

    Regards,
    Simon


