From: Simon Josefsson (jas@extundo.com)
Date: Thu Jan 27 2005 - 15:37:03 CST
Markus Scherer <markus.scherer@jtcsv.com> writes:
> 1. Aside from broken idempotency, this interpretation of the old UAX version "normalizes" such text
> to something that is *not canonically equivalent* to the input - it changes some text to some
> completely different text.
>
> 2. There also exist strings (see PRI 29) where the application of NFC[old UAX] or NFKC[old UAX]
> produces output that is not only different text (not canonically equivalent) but also *not in
> canonical order*. As a result, something you got from normalization may not even pass the
> normalization quick check: NFC_quick_check(NFC(string))=NO.
Thanks. I think your points are the best arguments I've seen yet, as
to why the proposed fix is better than keeping the old normative text.
However, if I understand correctly, only the corner-case
"problem sequences" would still be affected by these problems, so
these points are not strongly convincing arguments to me.
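For concreteness, the two properties at stake can be checked on one of the example sequences listed in PRI 29. The following is only a minimal sketch: it assumes a Python 3 interpreter whose `unicodedata` module follows the corrected definition, and the helper names (`is_idempotent`, `preserves_equivalence`) are made up for illustration, not taken from any specification.

```python
import unicodedata

# One of the example sequences from PRI 29 (chosen here for illustration):
# ORIYA VOWEL SIGN E, COMBINING GRAVE ACCENT, ORIYA VOWEL SIGN AA.
PROBLEM_SEQUENCE = "\u0B47\u0300\u0B3E"


def nfc(text: str) -> str:
    """Normalize text to NFC using the interpreter's Unicode data."""
    return unicodedata.normalize("NFC", text)


def is_idempotent(text: str) -> bool:
    """True if normalizing the NFC output again changes nothing."""
    once = nfc(text)
    return nfc(once) == once


def preserves_equivalence(text: str) -> bool:
    """True if NFC(text) is canonically equivalent to text.

    Two strings are canonically equivalent exactly when their NFD
    forms are identical, so the NFD forms are compared.
    """
    return (unicodedata.normalize("NFD", nfc(text))
            == unicodedata.normalize("NFD", text))


if __name__ == "__main__":
    print("idempotent:", is_idempotent(PROBLEM_SEQUENCE))
    print("canonically equivalent:", preserves_equivalence(PROBLEM_SEQUENCE))
```

An implementation following the old UAX wording would be expected to fail the equivalence check on such a sequence, which is the first of the quoted points; ordinary text is unaffected either way.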
It still appears possible that StringPrep/IDN, and in general all
Unicode applications that rely on the explicitly stated
version-idempotency property, could handle this problem more smoothly
if these normalization properties were permitted to be in a
sub-optimal state for these rare problem sequences.
What remains for me to understand is why no other alternative could
lead to a better overall situation.
Regards,
Simon