Re: Open Issue #61: Proposed Update UAX #15 Unicode Normalization Forms

From: Simon Josefsson (
Date: Wed Jan 26 2005 - 13:36:52 CST


    "Mark Davis" <> writes:

    > The problem is that the old D2 (referring to the PRI) didn't really provide
    > stability. Because it was not idempotent, if you normalized the same string
    > twice, you got a different answer than doing it once. That leads to weird
    > consistency issues. While it only happens in degenerate cases, it is
    > significant enough to make the change.

    However, by making the change, normalization becomes unstable over
    time, which leads to similar consistency issues. If one application
    uses Unicode 3.2 (or 4.0) and normalizes the string, and another
    application uses 4.1, you also get a different answer.
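
The cross-version discrepancy described above can be probed directly. The following is a minimal sketch in Python, assuming the standard `unicodedata` module, which exposes the current database alongside a frozen Unicode 3.2.0 one as `unicodedata.ucd_3_2_0`. The helper names `nfc_by_version` and `differs_across_versions` are hypothetical, and the two combining sequences in the sample list are believed to be of the kind discussed in PR29 (an assumption, not taken from this message). Whether a given library reproduces the pre-PR29 composition behaviour for the old database is not guaranteed, so no particular output is claimed.

```python
import unicodedata


def nfc_by_version(s: str) -> dict:
    """Return the NFC form of s under each Unicode database Python ships."""
    old = unicodedata.ucd_3_2_0  # frozen Unicode 3.2.0 database
    return {
        old.unidata_version: old.normalize("NFC", s),
        unicodedata.unidata_version: unicodedata.normalize("NFC", s),
    }


def differs_across_versions(s: str) -> bool:
    """True if the two databases disagree on the NFC form of s."""
    return len(set(nfc_by_version(s).values())) > 1


if __name__ == "__main__":
    # Sample inputs; the last two are assumed PR29-style sequences.
    samples = ["abc", "e\u0301", "\u1100\u0300\u1161", "\u0b47\u0300\u0b3e"]
    for s in samples:
        code_points = " ".join(f"U+{ord(c):04X}" for c in s)
        print(code_points, "differs:", differs_across_versions(s))
```

Running the same check over stored identifiers before upgrading the Unicode version is one way an application could estimate how much the stability concern matters for its own data.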

    One could argue that, because it only happens in degenerate cases, it
    could be worth adding a note saying that, for a few degenerate cases,
    normalization itself is not idempotent, in order to preserve backwards
    compatibility and to guarantee idempotency across versions.
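
The idempotency property under discussion is that normalizing an already normalized string changes nothing. A minimal sketch of how to test it, again assuming Python's standard `unicodedata` module; the function names `is_idempotent` and `find_non_idempotent` are hypothetical:

```python
import unicodedata


def is_idempotent(s: str, form: str = "NFC") -> bool:
    """Check that normalizing s twice gives the same result as once."""
    once = unicodedata.normalize(form, s)
    twice = unicodedata.normalize(form, once)
    return once == twice


def find_non_idempotent(strings, form: str = "NFC") -> list:
    """Return the inputs for which a second normalization pass changes the result."""
    return [s for s in strings if not is_idempotent(s, form)]


if __name__ == "__main__":
    # "e" followed by a combining acute accent composes to a single code point.
    print(is_idempotent("e\u0301"))
    print(find_non_idempotent(["abc", "e\u0301"]))
```

An implementation that follows the corrected definition should report no failures; a non-empty result would point at exactly the degenerate cases the note would have to document.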

    > Second, what happens if a new version of stringprep updates to Unicode
    > 4.1.0? There are two paths: (a) it simply updates to the newer version, or
    > (b) it updates to the newer version AND adds another set of string mappings
    > that provides the old results for the cases that would differ. Note that
    > versions of stringprep can be bullet-proof with respect to many changes in
    > normalization, because it has a string mapping phase *before* strings go
    > into normalization. (However, our recommendation would be to update, since
    > it prevents problems because of idempotency.)

    Your recommendation would lead to problems with idempotency between
    the old and new versions of StringPrep, though.

    It is not clear to me that the problems caused by making the change
    outweigh the problems that exist if the change is not made.

    It seems to me that different applications could value the two sets of
    problems differently.


    > Mark
    > ----- Original Message -----
    > From: "Simon Josefsson" <>
    > To: <>
    > Sent: Wednesday, January 26, 2005 02:39
    > Subject: Open Issue #61: Proposed Update UAX #15 Unicode Normalization Forms
    >> This is a copy of my review comment on the open issue #61, posted here
    >> for wider distribution. (This is also a repost, because I was not
    >> subscribed the first time I tried to post. I am sorry if you end up
    >> receiving duplicates.)
    >> Rick McGowan <> writes:
    >> > Issue #61 Proposed Update UAX #15 Unicode Normalization Forms
    >> > A proposed update to UAX #15 for Unicode 4.1.0 is available at the link
    >> > above. The proposed changes are listed in the Modifications section of
    >> > the document.
    >> Hello. Regarding the PR29 modification part of #61:
    >> This change appears to break backwards compatibility and normalization
    >> stability. The PR29 text suggests that the problematic sequences do
    >> not occur naturally. My question then is: why break normalization
    >> stability over something that doesn't appear to be a practical
    >> problem?
    >> Translating my question into a proposal:
    >> Keep the normative part of TR15 as-is, but fix the examples and
    >> introduction to match the normative text. Add a note on the NFC/NFKC
    >> idempotency, to say that idempotency is the goal, but that for a
    >> select few strings it does not hold and that normalization stability
    >> was considered more important than theoretical normalization
    >> idempotency.
    >> I am not convinced this proposal would be better than what you propose
    >> in the long run. However, I am concerned that normalization stability
    >> is given so little weight that it is violated even for situations that
    >> do not appear to have practical consequences.
    >> Thanks,
    >> Simon
