Re: Open Issue #61: Proposed Update UAX #15 Unicode Normalization Forms

From: Mark Davis (mark.davis@jtcsv.com)
Date: Wed Jan 26 2005 - 12:33:23 CST


    The problem is that the old D2 (referring to the PRI) didn't really provide
    stability. Because it was not idempotent, normalizing the same string twice
    could give a different answer than normalizing it once. That leads to weird
    consistency issues. While it only happens in degenerate cases, it is
    significant enough to make the change.
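
    A minimal sketch of the idempotency property, assuming Python's standard
    unicodedata module. The helper name is_idempotent is hypothetical, and the
    Hangul sequence is assumed to be of the kind discussed in PR29:

```python
import unicodedata


def is_idempotent(s, form="NFC"):
    """Return True if normalizing s twice gives the same result as once."""
    once = unicodedata.normalize(form, s)
    twice = unicodedata.normalize(form, once)
    return once == twice


# An ordinary combining sequence: "e" followed by a combining acute accent.
print(is_idempotent("e\u0301"))

# A degenerate sequence (assumed PR29-style): a Hangul leading consonant,
# a combining grave accent, then a Hangul vowel.
print(is_idempotent("\u1100\u0300\u1161"))
```

    A normalizer that follows the corrected definition should report True for
    both strings, for NFC as well as NFKC.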

    It's a bit like having a collation standard that was not transitive, in
    certain edge cases; it is failing a fundamental requirement for comparison.
    While it would be possible to try to ignore those cases (since they would be
    extremely rare), the repercussions when they do happen are important enough
    to make the fix.

    Note: With a corrigendum, we don't actually go back and change any version
    of Unicode. Any implementation that claims conformance to 3.2, for example,
    can stay precisely the same. Only if an implementation claims conformance to
    3.2 *plus* the corrigendum would it change. So the current stringprep is not
    affected.

    Second, what happens if a new version of stringprep updates to Unicode
    4.1.0? There are two paths: (a) it simply updates to the newer version, or
    (b) it updates to the newer version AND adds another set of string mappings
    that provides the old results for the cases that would differ. Note that
    versions of stringprep can be bullet-proof with respect to many changes in
    normalization, because it has a string mapping phase *before* strings go
    into normalization. (However, our recommendation would be to update, since
    it prevents problems caused by the lack of idempotency.)
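
    Path (b) could be sketched roughly as follows. This is only an
    illustration: the names LEGACY_MAPPINGS and prepare are hypothetical, the
    single table entry is an assumed example of a sequence whose result would
    differ, and a real stringprep profile would define its own tables.

```python
import unicodedata

# Hypothetical table of sequences whose normalized form would change,
# each mapped to the result the older definition is assumed to have given.
LEGACY_MAPPINGS = {
    "\u1100\u0300\u1161": "\uac00\u0300",
}


def prepare(s, mappings=LEGACY_MAPPINGS, form="NFKC"):
    """Apply the string mappings first, then normalize."""
    for source, target in mappings.items():
        s = s.replace(source, target)
    return unicodedata.normalize(form, s)
```

    Because the mapping phase runs before normalization, the normalizer never
    sees the sequences in the table, so the output for them stays the same
    whichever definition the normalizer implements.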

    Mark

    ----- Original Message -----
    From: "Simon Josefsson" <jas@extundo.com>
    To: <unicode@unicode.org>
    Sent: Wednesday, January 26, 2005 02:39
    Subject: Open Issue #61: Proposed Update UAX #15 Unicode Normalization Forms

    > This is a copy of my review comment on the open issue #61, posted here
    > for wider distribution. (This is also a repost, because I was not
    > subscribed the first time I tried to post. I am sorry if you end up
    > receiving duplicates.)
    >
    > Rick McGowan <rick@unicode.org> writes:
    >
    > > Issue #61 Proposed Update UAX #15 Unicode Normalization Forms
    > > A proposed update to UAX #15 for Unicode 4.1.0 is available at the link
    > > above. The proposed changes are listed in the Modifications section of the
    > > document.
    >
    > Hello. Regarding the PR29 modification part of #61:
    >
    > This change appears to break backwards compatibility and normalization
    > stability. The PR29 text suggests that the problematic sequences do
    > not occur naturally. My question then is: why break normalization
    > stability over something that doesn't appear to be a practical
    > problem?
    >
    > Translating my question into a proposal:
    >
    > Keep the normative part of TR15 as-is, but fix the examples and
    > introduction to match the normative text. Add a note on the NFC/NFKC
    > idempotency, to say that idempotency is the goal, but that for a
    > select few strings it does not hold and that normalization stability
    > was considered more important than theoretical normalization
    > idempotency.
    >
    > I am not convinced this proposal would be better than what you propose
    > in the long run. However, I am concerned that normalization stability
    > is given so little weight that it is violated even for situations that
    > don't appear to have practical consequences.
    >
    > Thanks,
    > Simon
    >
    >


