Re: Open Issue #61: Proposed Update UAX #15 Unicode Normalization Forms

From: Mark Davis (mark.davis@jtcsv.com)
Date: Wed Jan 26 2005 - 12:33:23 CST

Next message: Hans Aberg: "Re: Surrogate points"

Previous message: Simon Josefsson: "Open Issue #61: Proposed Update UAX #15 Unicode Normalization Forms"
In reply to: Simon Josefsson: "Open Issue #61: Proposed Update UAX #15 Unicode Normalization Forms"
Next in thread: Simon Josefsson: "Re: Open Issue #61: Proposed Update UAX #15 Unicode Normalization Forms"
Reply: Simon Josefsson: "Re: Open Issue #61: Proposed Update UAX #15 Unicode Normalization Forms"

The problem is that the old D2 (the definition the PRI refers to) didn't really
provide stability. Because it was not idempotent, normalizing a string twice
could give a different answer than normalizing it once. That leads to odd
consistency issues. While it happens only in degenerate cases, it is
significant enough to warrant the change.
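The message doesn't spell out which sequences are affected, so the following is only a toy sketch of the blocking rule at issue. It assumes the pre-corrigendum D2 blocked a character C from a preceding starter S only when some intervening character B is a starter or has the *same* combining class as C, and that the corrected rule blocks when B has the same *or higher* class. The sequence <U+0B47, U+0300, U+0B3E> is an assumed example of the kind PR29 discusses, and the one-entry `COMPOSE` table and the functions `compose`, `blocked_old`, and `blocked_new` are hypothetical illustrations, not the real algorithm or any library API. The sketch shows the old rule producing a result that is not canonically equivalent to its input. It does not by itself reproduce an idempotency failure.

```python
import unicodedata

# Toy composition table with the single pair this example needs (illustrative, not UCD data).
COMPOSE = {("\u0B47", "\u0B3E"): "\u0B4B"}  # ORIYA VOWEL SIGN E + AA -> ORIYA VOWEL SIGN O


def ccc(ch):
    """Canonical combining class, taken from Python's Unicode database."""
    return unicodedata.combining(ch)


def blocked_old(between, c):
    # Assumed pre-corrigendum D2: C is blocked only if some B is a starter
    # or has the SAME combining class as C.
    return any(ccc(b) == 0 or ccc(b) == ccc(c) for b in between)


def blocked_new(between, c):
    # Corrected rule: C is blocked if some B is a starter or has the same
    # or a higher combining class than C.
    return any(ccc(b) == 0 or ccc(b) >= ccc(c) for b in between)


def compose(decomposed, blocked):
    """Canonical composition over an already-decomposed string (toy table only)."""
    out = []
    last_starter = None  # index in `out` of the most recent uncomposed starter
    for ch in decomposed:
        if last_starter is not None:
            pair = (out[last_starter], ch)
            between = out[last_starter + 1:]
            if pair in COMPOSE and not blocked(between, ch):
                out[last_starter] = COMPOSE[pair]  # replace the starter, drop ch
                continue
        if ccc(ch) == 0:
            last_starter = len(out)
        out.append(ch)
    return "".join(out)


s = "\u0B47\u0300\u0B3E"  # ORIYA VOWEL SIGN E, COMBINING GRAVE ACCENT, ORIYA VOWEL SIGN AA
old = compose(s, blocked_old)
new = compose(s, blocked_new)
print([f"U+{ord(c):04X}" for c in old])  # ['U+0B4B', 'U+0300']
print([f"U+{ord(c):04X}" for c in new])  # ['U+0B47', 'U+0300', 'U+0B3E']
# The old result is not canonically equivalent to the input:
print(unicodedata.normalize("NFD", old) == unicodedata.normalize("NFD", s))  # False
```

Under the corrected rule the intervening U+0300 (class 230) blocks U+0B3E (class 0), so the string is left alone, which matches what a current `unicodedata.normalize("NFC", s)` returns.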

It's a bit like having a collation standard that was not transitive in
certain edge cases; such a standard fails a fundamental requirement for comparison.
While it would be possible to try to ignore those cases (since they would be
extremely rare), the repercussions when they do happen are important enough
to make the fix.
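Like transitivity for a collation, idempotency is a property that can be checked mechanically. A minimal sketch, assuming Python's standard `unicodedata` module as the normalizer under test; the function name `check_normalizer` and the sample strings are arbitrary choices for illustration. It also checks that NFC preserves canonical equivalence, the other property at stake in the sequence above. A clean run on a handful of strings does not prove either property; it only fails to find a counterexample.

```python
import unicodedata


def check_normalizer(samples):
    """Return (label, string) pairs that violate idempotency or canonical equivalence."""
    problems = []
    for s in samples:
        for form in ("NFC", "NFD", "NFKC", "NFKD"):
            once = unicodedata.normalize(form, s)
            # Idempotency: normalizing an already-normalized string must not change it.
            if unicodedata.normalize(form, once) != once:
                problems.append((form, s))
        # Canonical equivalence: NFC must not change what the string means.
        nfd_of_nfc = unicodedata.normalize("NFD", unicodedata.normalize("NFC", s))
        if nfd_of_nfc != unicodedata.normalize("NFD", s):
            problems.append(("NFC-equivalence", s))
    return problems


samples = [
    "e\u0301",             # e + COMBINING ACUTE ACCENT
    "\uFB01",              # LATIN SMALL LIGATURE FI (compatibility character)
    "\u1100\u1161\u11A8",  # conjoining Hangul jamo
    "q\u0307\u0323",       # combining marks in non-canonical order
    "\u0B47\u0300\u0B3E",  # starter, non-starter, starter
]
print(check_normalizer(samples))  # [] with a conformant normalizer
```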

Note: With a corrigendum, we don't actually go back and change any version
of Unicode. Any implementation that claims conformance to 3.2, for example,
can stay precisely the same. Only if an implementation claims conformance to
3.2 *plus* the corrigendum would it change. So the current stringprep is not
affected.

Second, what happens if a new version of stringprep updates to Unicode
4.1.0? There are two paths: (a) it simply updates to the newer version, or
(b) it updates to the newer version AND adds another set of string mappings
that provides the old results for the cases that would differ. Note that
versions of stringprep can be bullet-proof with respect to many changes in
normalization, because it has a string mapping phase *before* strings go
into normalization. (However, our recommendation would be to update, since
that avoids the problems caused by the lack of idempotency.)
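Option (b) works because stringprep (RFC 3454) maps strings before it normalizes them with form KC. A minimal sketch of that ordering, assuming Python's `unicodedata` as the normalizer; `LEGACY_MAPPINGS`, `apply_mappings`, and `prepare` are hypothetical names, stringprep's own mapping tables are omitted, and the single table entry assumes the pre-corrigendum result for the example sequence was <U+0B4B, U+0300>. A real profile would need a complete, carefully derived list.

```python
import unicodedata

# Hypothetical extra mapping table for option (b): pin the assumed
# pre-corrigendum result for a sequence whose normalization changed.
LEGACY_MAPPINGS = {
    "\u0B47\u0300\u0B3E": "\u0B4B\u0300",
}


def apply_mappings(s, table):
    """Replace each listed sequence (longest match first), left to right."""
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(s):
        for k in keys:
            if s.startswith(k, i):
                out.append(table[k])
                i += len(k)
                break
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


def prepare(s, extra_mappings=None):
    # Step 1: mapping phase (stringprep's own tables would also run here; omitted).
    if extra_mappings:
        s = apply_mappings(s, extra_mappings)
    # Step 2: normalization; stringprep uses Unicode normalization form KC.
    return unicodedata.normalize("NFKC", s)


s = "\u0B47\u0300\u0B3E"
print([f"U+{ord(c):04X}" for c in prepare(s)])                   # ['U+0B47', 'U+0300', 'U+0B3E']
print([f"U+{ord(c):04X}" for c in prepare(s, LEGACY_MAPPINGS)])  # ['U+0B4B', 'U+0300']
```

Because the mapping runs first, the normalizer can be upgraded while the profile's output for the listed sequences stays fixed. Matching raw code point sequences like this is only reliable when the listed spelling is the sole canonically equivalent one, as it is for this example.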

Mark

----- Original Message -----
From: "Simon Josefsson" <jas@extundo.com>
To: <unicode@unicode.org>
Sent: Wednesday, January 26, 2005 02:39
Subject: Open Issue #61: Proposed Update UAX #15 Unicode Normalization Forms

> This is a copy of my review comment on the open issue #61, posted here
> for wider distribution. (This is also a repost, because I was not
> subscribed the first time I tried to post. I am sorry if you end up
> receiving duplicates.)
>
> Rick McGowan <rick@unicode.org> writes:
>
> > Issue #61 Proposed Update UAX #15 Unicode Normalization Forms
> > A proposed update to UAX #15 for Unicode 4.1.0 is available at the link
> > above. The proposed changes are listed in the Modifications section of the
> > document.
>
> Hello. Regarding the PR29 modification part of #61:
>
> This change appears to break backwards compatibility and normalization
> stability. The PR29 text suggests that the problematic sequences do
> not occur naturally. My question then is: why break normalization
> stability over something that doesn't appear to be a practical
> problem?
>
> Translating my question into a proposal:
>
> Keep the normative part of TR15 as-is, but fix the examples and
> introduction to match the normative text. Add a note on the NFC/NFKC
> idempotency, to say that idempotency is the goal, but that for a
> select few strings it does not hold and that normalization stability
> was considered more important than theoretical normalization
> idempotency.
>
> I am not convinced this proposal would be better than what you propose
> in the long run. However, I am concerned that normalization stability
> is given so little weight that it is violated even for situations that
> don't appear to have practical consequences.
>
> Thanks,
> Simon
>
>


This archive was generated by hypermail 2.1.5 : Wed Jan 26 2005 - 12:36:20 CST