Re: Open Issue #61: Proposed Update UAX #15 Unicode Normalization Forms

From: Simon Josefsson (jas@extundo.com)
Date: Wed Jan 26 2005 - 13:57:18 CST

    "Marcin 'Qrczak' Kowalczyk" <qrczak@knm.org.pl> writes:

    > Simon Josefsson <jas@extundo.com> writes:
    >
    >> This change appears to break backwards compatibility and normalization
    >> stability. The PR29 text suggests that the problematic sequences do
    >> not occur naturally. My question then is: why break normalization
    >> stability over something that doesn't appear to be a practical
    >> problem?
    >
    > Because normalizations should be idempotent. This was always intended,
    > the old specification had a bug.

    I think there are two kinds of idempotency under discussion:

    The first, "internal-idempotency", is that NFKC(NFKC(x)) = NFKC(x).

    The second, "version-idempotency", is that NFKC3.2(NFKC4.0(x)) = NFKC4.0(x).

    The #61 proposal trades the second for the first.
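
    The two properties can be checked mechanically. The following is a
    minimal sketch, assuming Python's standard unicodedata module, whose
    ucd_3_2_0 object normalizes against the Unicode 3.2 database. The
    helper names are hypothetical, and "current" means whatever Unicode
    version the running interpreter ships, not 4.0 specifically. Whether
    either database reproduces the pre-PR29 behaviour is not assumed here.

```python
import unicodedata


def nfkc_current(s: str) -> str:
    """NFKC using the Unicode database bundled with this interpreter."""
    return unicodedata.normalize("NFKC", s)


def nfkc_3_2(s: str) -> str:
    """NFKC using the Unicode 3.2 database (the version StringPrep refers to)."""
    return unicodedata.ucd_3_2_0.normalize("NFKC", s)


def is_internally_idempotent(s: str, normalize=nfkc_current) -> bool:
    """Internal-idempotency: normalizing twice equals normalizing once."""
    once = normalize(s)
    return normalize(once) == once


def is_version_idempotent(s: str, inner=nfkc_current, outer=nfkc_3_2) -> bool:
    """Version-idempotency: output of one version is left unchanged by another."""
    once = inner(s)
    return outer(once) == once


if __name__ == "__main__":
    # Hypothetical sample inputs: a ligature, ANGSTROM SIGN, and e + combining acute.
    for sample in ["\ufb01", "\u212b", "e\u0301"]:
        print(
            repr(sample),
            is_internally_idempotent(sample),
            is_version_idempotent(sample),
        )
```

    Passing the normalizers as parameters keeps the two definitions
    separate from any particular pair of Unicode versions.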

    If you look at TR15, section 3 Versioning and stability, the first
    paragraph says
    (<http://www.unicode.org/reports/tr15/tr15-24.html#Versioning>):

      It is crucial that normalization forms remain stable over time. That
      is, if a string that does not have any unassigned characters is
      normalized under one version of Unicode, it must remain normalized
      under all future versions of Unicode. This is the backwards
      compatibility requirement.

    This requirement, version-idempotency, appears to be violated in
    order to achieve internal-idempotency.

    Nowhere in the current document can I find any text that says that
    internal-idempotency was a design goal or even a requirement. The #61
    review issue mentions these goals in an annex -- is that even part of
    the normative text?

    > It happens that it affected my implementation of normalization that
    > I've made for my language. I already fixed it. Are you saying that I
    > should break it again?

    What are you using normalization for? If it is for StringPrep,
    including internationalized domain names, you should revert your fix
    because StringPrep uses Unicode 3.2 without the proposed update.

    >> However, I am concerned that normalization stability is given so
    >> little weight that it is violated even for situations that don't
    >> appear to have practical consequences.
    >
    > I am more concerned with maintaining bugs forever in the name of
    > stability.

    Right, it is a trade-off. If you care more about internal-idempotency
    than version-idempotency, I understand.

    > If this particular change can have practical consequence, it's more
    > probable that something will break with the old definition (because
    > a subsystem relied on idempotency) than with the new one.

    This is a conclusion that I have failed to reach.

    Several IETF protocols are being modified to use StringPrep today,
    which uses the old normalization. When/if StringPrep is updated to use
    the new normalization, those protocols appear to be faced with an
    upgrade problem.
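
    One way to gauge the size of such an upgrade problem is to scan
    stored identifiers for strings whose normalized form differs between
    the two versions. This is only a sketch, assuming Python's
    unicodedata module: ucd_3_2_0 stands in for "old", the interpreter's
    bundled database stands in for "new", and the function names are
    hypothetical.

```python
import unicodedata


def normalization_differs(s: str) -> bool:
    """True if NFKC under Unicode 3.2 data differs from NFKC under current data."""
    old = unicodedata.ucd_3_2_0.normalize("NFKC", s)
    new = unicodedata.normalize("NFKC", s)
    return old != new


def audit(identifiers):
    """Return the identifiers that would change form across the upgrade."""
    return [s for s in identifiers if normalization_differs(s)]


if __name__ == "__main__":
    # Hypothetical stored identifiers; real data would come from the deployed system.
    stored = ["example", "\ufb01le", "caf\u00e9"]
    print(audit(stored))
```

    An empty result for a given data set would suggest that the change
    has no practical effect there; it would not show that the problem
    cannot arise.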

    I have not seen enough public discussion of this problem to make
    me comfortable with this change. If there were a plan for handling
    the upgrade problem, I would be more comfortable.

    Thanks.


