L2/05-211

Source: Mark Davis
Date: August 5, 2005
Subject: Suggestions for UAX #15

Ken had the following suggestions for UAX #15, which came up in discussion of
the stability policies. I have appended some email that I sent to someone in
the IETF on normalization, which may also be useful material.

-----
From Ken:

Perhaps the stability section of UAX #15 should be
elaborated in the next version to make it pedantically
clear:

A. What is being guaranteed into the future.

   (staying normalized, but not necessarily normalizing
   to the *same* string)

B. What is true back to the applicable version *if*
   the corrigenda are applied.

   (normalizing to the *same* string)

C. What is true back to the applicable version if
   the corrigenda are not applied.

   (staying normalized, but not necessarily normalizing
   to the *same* string, and in some [confused] implementations
   having to normalize twice to get a stable result)

D. How to make use of NormalizationCorrections.txt
   between any two versions back to the applicable
   version so as to guarantee normalizing to the *same* string.
   I.e., guaranteeing the desired output of B in instances
   where the client deliberately is *not* applying the
   corrigenda.
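
As a rough illustration of D, the sketch below derives an override table from
lines in the NormalizationCorrections.txt format. It assumes the four
semicolon-separated fields are: code point; original (erroneous)
decomposition; corrected decomposition; version in which the correction was
entered. The two sample lines are assumed to match the published file and
should be checked against it. The names parse_corrections and override_table
are hypothetical, not part of any existing library.

```python
# Sample lines in the assumed NormalizationCorrections.txt format.
# Verify against the real file before relying on them.
SAMPLE = """\
F951;96FB;964B;3.2.0 # Corrigendum 3
2F868;2136A;36FC;4.0.0 # Corrigendum 4
"""


def parse_version(text):
    """Turn '4.0.0' into a comparable tuple (4, 0, 0)."""
    return tuple(int(part) for part in text.split("."))


def hex_to_str(field):
    """Turn a space-separated list of hex code points into a string."""
    return "".join(chr(int(cp, 16)) for cp in field.split())


def parse_corrections(text):
    """Yield (char, original, corrected, version) for each data line."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        cp, original, corrected, version = (f.strip() for f in line.split(";"))
        yield (hex_to_str(cp), hex_to_str(original),
               hex_to_str(corrected), parse_version(version))


def override_table(text, host, target):
    """Mappings that make a `host`-version normalizer act like `target`."""
    host, target = parse_version(host), parse_version(target)
    table = {}
    for char, original, corrected, version in parse_corrections(text):
        if target < version <= host:
            table[char] = original    # undo a correction the host has
        elif host < version <= target:
            table[char] = corrected   # apply a correction the host lacks
    return table
```

For example, override_table(SAMPLE, "4.1.0", "3.2.0") yields a single entry
for U+2F868, because the U+F951 correction is already part of 3.2.0 under the
assumption above.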

-----
From Mark:

... And the very few data changes we made early on were all verified to be
ones that can be accounted for in a simple addition to the mapping table in
NamePrep, for you to maintain backward compatibility. Thus we have always
maintained the invariants:

1. Any changes are guaranteed not to disturb the stability of previous
*normalized* strings. That is, if on system A, normalize(X) = X', then on
system B, normalize(X') = X'. (This is given that the characters are all
defined in the version of Unicode used on A and B.) Thus once characters are
normalized, they stay normalized.

2. All normalization corrections can be implemented -- or avoided -- by the
StringPrep mapping (Section 3 of [StringPrep]). That is, suppose that on
Unicode 3.2, X normalizes to X', but on Unicode 4.1, X normalizes to X".
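
Invariant 1 can be spot-checked along the following lines. This is only a
sketch: it uses the Unicode 3.2 database that CPython ships as
unicodedata.ucd_3_2_0 to play the role of system A, and the interpreter's
current database as system B. The helper name stays_normalized and the sample
strings are illustrative choices; every sample character is assigned in
Unicode 3.2, as the invariant requires.

```python
import unicodedata

# System A: the Unicode 3.2 database bundled with CPython.
# System B: whatever Unicode version the running interpreter uses.
SYSTEM_A = unicodedata.ucd_3_2_0
SYSTEM_B = unicodedata


def stays_normalized(text, form="NFC"):
    """Check that normalize_A(text) is left unchanged by normalize_B."""
    normalized_on_a = SYSTEM_A.normalize(form, text)
    return SYSTEM_B.normalize(form, normalized_on_a) == normalized_on_a


SAMPLES = [
    "e\u0301",      # e + combining acute accent
    "\u212b",       # ANGSTROM SIGN
    "\U0002F868",   # a compatibility ideograph touched by a correction
]
```

Note that the check does not require A and B to agree on normalize(X) itself,
only that B leaves A's output alone.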

Because of #1 above:

A. To simulate Unicode 4.1 action on a 3.2 system, one merely adds a line to
the StringPrep mapping:
X => X"

B. To simulate Unicode 3.2 action on a 4.1 system, one merely adds a line to
the StringPrep mapping:
X => X'

Because all normalization changes are guaranteed to leave X" and X' alone,
this works, and involves no architectural changes to StringPrep; only small
additions to the mapping tables.
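
A minimal sketch of case B, assuming a StringPrep-like pipeline that maps
first and normalizes second. The function name prep and the table name
SIMULATE_OLD are hypothetical. The single entry assumes that U+2F868
originally decomposed to U+2136A before Corrigendum #4 changed it to U+36FC;
check that against NormalizationCorrections.txt before relying on it.

```python
import unicodedata


def prep(text, overrides, form="NFKC"):
    """Apply the extra mapping lines, then normalize (map, then normalize)."""
    mapped = "".join(overrides.get(ch, ch) for ch in text)
    return unicodedata.normalize(form, mapped)


# Extra mapping line "X => X'" to reproduce the older result on a newer
# system. Assumed values; see the lead-in.
SIMULATE_OLD = {"\U0002F868": "\U0002136A"}

# With the override, X is replaced by X' before normalization, and
# normalization then leaves X' alone (invariant 1).
old_style = prep("\U0002F868", SIMULATE_OLD)

# Without the override, the host's own tables decide the result.
host_style = prep("\U0002F868", {})
```

Case A is the same code with the table pointing at X" instead of X'.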