Re: Open Issue #61: Proposed Update UAX #15 Unicode Normalization Forms

From: Simon Josefsson (jas@extundo.com)
Date: Wed Jan 26 2005 - 13:57:18 CST

Next message: Rick McGowan: "Re: Open Issue #61: Proposed Update UAX #15 Unicode Normalization Forms"

Previous message: Simon Josefsson: "Re: Open Issue #61: Proposed Update UAX #15 Unicode Normalization Forms"
In reply to: Marcin 'Qrczak' Kowalczyk: "Re: Open Issue #61: Proposed Update UAX #15 Unicode Normalization Forms"
Next in thread: Marcin 'Qrczak' Kowalczyk: "Re: Open Issue #61: Proposed Update UAX #15 Unicode Normalization Forms"
Reply: Marcin 'Qrczak' Kowalczyk: "Re: Open Issue #61: Proposed Update UAX #15 Unicode Normalization Forms"
Reply: Peter Kirk: "Re: Open Issue #61: Proposed Update UAX #15 Unicode Normalization Forms"

"Marcin 'Qrczak' Kowalczyk" <qrczak@knm.org.pl> writes:

> Simon Josefsson <jas@extundo.com> writes:
>
>> This change appears to break backwards compatibility and normalization
>> stability. The PR29 text suggests that the problematic sequences do
>> not occur naturally. My question then is: why break normalization
>> stability over something that doesn't appear to be a practical
>> problem?
>
> Because normalizations should be idempotent. This was always intended,
> the old specification had a bug.

I think there are two kinds of idempotency under discussion:

The first, "internal-idempotency", is that NFKC(NFKC(x)) = NFKC(x).

The second, "version-idempotency", is that NFKC3.2(NFKC4.0(x)) = NFKC4.0(x).

The #61 proposal trades the second for the first.
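
To make the two properties concrete, here is a minimal sketch in Python. The function names are hypothetical, chosen only for illustration. It assumes that the standard unicodedata module can stand in for the "new" normalization and that its ucd_3_2_0 object (the Unicode 3.2 database) can stand in for "NFKC3.2". Whether ucd_3_2_0 reproduces the pre-correction behaviour on the PR29 sequences is not something this sketch assumes or checks.

```python
import unicodedata

# Hypothetical helper: internal-idempotency for one string,
# i.e. normalizing a second time with the same normalizer changes nothing.
def is_internally_idempotent(s, form="NFKC", normalize=unicodedata.normalize):
    once = normalize(form, s)
    return normalize(form, once) == once

# Hypothetical helper: version-idempotency for one string,
# i.e. the output of one version's normalizer is left alone by another's.
# Assumption: unicodedata.normalize plays the role of the newer version and
# unicodedata.ucd_3_2_0.normalize plays the role of "NFKC3.2".
def survives_other_version(s, form="NFKC",
                           first=unicodedata.normalize,
                           second=unicodedata.ucd_3_2_0.normalize):
    once = first(form, s)
    return second(form, once) == once

if __name__ == "__main__":
    sample = "\ufb01"  # LATIN SMALL LIGATURE FI, an ordinary NFKC test case
    print(is_internally_idempotent(sample))
    print(survives_other_version(sample))
```

Both checks only test one string at a time; the properties themselves are claims about all strings, so a passing check on ordinary text says nothing about the problematic sequences.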

If you look at TR15, section 3 Versioning and stability, the first
paragraph says
(<http://www.unicode.org/reports/tr15/tr15-24.html#Versioning>):

  It is crucial that normalization forms remain stable over time. That
  is, if a string that does not have any unassigned characters is
  normalized under one version of Unicode, it must remain normalized
  under all future versions of Unicode. This is the backwards
  compatibility requirement.

This requirement, version-idempotency, appears to be violated in
order to achieve internal-idempotency.

Nowhere in the current document can I find any text that says that
internal-idempotency was a design goal or even a requirement. The #61
review issue mentions these goals in an annex -- is that even part of
the normative text?

> It happens that it affected my implementation of normalization that
> I've made for my language. I already fixed it. Are you saying that I
> should break it again?

What are you using normalization for? If it is for StringPrep,
including internationalized domain names, you should revert your fix
because StringPrep uses Unicode 3.2 without the proposed update.

>> However, I am concerned that normalization stability is given so
>> little weight that it is violated even for situations that don't
>> appear to have practical consequences.
>
> I am more concerned with maintaining bugs forever in the name of
> stability.

Right, it is a trade-off. If you care more about internal-idempotency
than version-idempotency, I understand.

> If this particular change can have practical consequence, it's more
> probable that something will break with the old definition (because
> a subsystem relied on idempotency) than with the new one.

This is a conclusion that I have failed to reach.

Several IETF protocols are being modified to use StringPrep today,
which uses the old normalization. When/if StringPrep is updated to use
the new normalization, those protocols appear to face an upgrade
problem.

I have not seen enough public discussion of this problem to make me
comfortable with this change. If there were a plan for handling the
upgrade problem, I would be more comfortable.

Thanks.

