L2/13-183
From: Mark Davis
To: UTC
Re: Addressing process failure
Date: 2013-10-06

A problem in the data required us to back out a change in Unicode 6.3: we had added characters to Uppercase, but not to Alphabetic. That breaks a stability constraint:

* All characters with the Lowercase property and all characters with the Uppercase property have the Alphabetic property.

This document is about addressing the points of failure.

1. Fixing Derivation.

I tracked down the technical point of failure with Alphabetic.

We have the following definitions:

# Derived Property: Uppercase
#  Generated from: Lu + Other_Uppercase

# Derived Property: Lowercase
#  Generated from: Ll + Other_Lowercase

# Derived Property: Alphabetic
#  Generated from: Lu+Ll+Lt+Lm+Lo+Nl + Other_Alphabetic

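Yet we require that Alphabetic ⊇ Uppercase and Alphabetic ⊇ Lowercase, and nothing in these derivations guarantees it: a character added to Other_Uppercase or Other_Lowercase ends up in Alphabetic only if it is separately added to Other_Alphabetic or already has one of the listed general categories. Therefore, I'm planning to propose that the UTC change the derivation to: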

# Derived Property: Alphabetic
#  Generated from: Uppercase+Lowercase+Lt+Lm+Lo+Nl + Other_Alphabetic

That solves the general problem for the future, since the constraint then holds by construction.
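The difference between the two derivations can be seen in a toy model. The following is only a sketch: the three "characters" (A, a, X) and their property assignments are hypothetical, chosen to reproduce the kind of failure described above, and the helper name with_category is invented for the illustration. It is not the code that generates the derived property files.

```python
# Toy model of the two Alphabetic derivations.
# The characters and property assignments are hypothetical.
LETTER_CATEGORIES_OLD = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
LETTER_CATEGORIES_NEW = {"Lt", "Lm", "Lo", "Nl"}

general_category = {"A": "Lu", "a": "Ll", "X": "So"}
other_uppercase = {"X"}    # X was added to Other_Uppercase ...
other_lowercase = set()
other_alphabetic = set()   # ... but not to Other_Alphabetic


def with_category(categories):
    """Return the characters whose General_Category is in `categories`."""
    return {ch for ch, gc in general_category.items() if gc in categories}


# Uppercase: Lu + Other_Uppercase; Lowercase: Ll + Other_Lowercase
uppercase = with_category({"Lu"}) | other_uppercase
lowercase = with_category({"Ll"}) | other_lowercase

# Current derivation: Lu+Ll+Lt+Lm+Lo+Nl + Other_Alphabetic
alphabetic_old = with_category(LETTER_CATEGORIES_OLD) | other_alphabetic

# Proposed derivation: Uppercase+Lowercase+Lt+Lm+Lo+Nl + Other_Alphabetic
alphabetic_new = (uppercase | lowercase
                  | with_category(LETTER_CATEGORIES_NEW) | other_alphabetic)

print(uppercase <= alphabetic_old)  # False: X is Uppercase but not Alphabetic
print(uppercase <= alphabetic_new)  # True: holds by construction
```

With the proposed derivation, an omission from Other_Alphabetic can no longer break the constraint, because Uppercase and Lowercase feed directly into Alphabetic.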

2. Clarifying Casing Stability.

There is a separate issue, which the UTC needs to consider, about casing pairs for the particular characters that exposed the problem. Looking at http://www.unicode.org/policies/stability_policy.html#Case_Pair, all we guarantee is that for existing characters, case pairs cannot be broken or formed. That principle does not prevent us from adding new characters and forming case pairs with them. However, a careful look at the previous principle (http://www.unicode.org/policies/stability_policy.html#Case_Folding) shows that this can only happen for (a) a pair of characters that are both new, or (b) a pair where the uppercase character is new and the lowercase character is not.

Editorially, I think it would help to make that explicit on the stability page, by changing:

A character that is not part of a case pair could become part of one if the new case pair is formed at the time of the addition of a new character to Unicode. For example, a new capital version of U+028D ( ʍ ) LATIN SMALL LETTER TURNED W could be added in the future to form a new case pair.

to

An existing lowercase character that is not part of a case pair could become part of one if the new case pair is formed at the time of the addition of a new uppercase character to Unicode. For example, a new capital version of U+028D ( ʍ ) LATIN SMALL LETTER TURNED W could be added in the future to form a new case pair. Case Folding Stability disallows the addition of a new lowercase mapping, thus an existing uppercase character cannot become part of a case pair with the addition of a new lowercase character.

I'd propose we make this editorial change.
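The rule in this section reduces to a small predicate. The following is a minimal sketch, assuming neither character is already part of a case pair. The function name new_case_pair_allowed is invented for the illustration, and the code points in the 0xF00xx range are placeholders for hypothetical characters, not proposals.

```python
def new_case_pair_allowed(upper, lower, existing):
    """Return True if forming the case pair (upper, lower) is permitted.

    `existing` is the set of code points encoded before the addition.
    Assumes neither character is already part of a case pair.
    """
    upper_is_new = upper not in existing
    lower_is_new = lower not in existing
    if upper_is_new and lower_is_new:
        return True   # case (a): both characters are new
    if upper_is_new:
        return True   # case (b): new uppercase for an existing lowercase
    return False      # an existing uppercase would gain a lowercase mapping


# U+028D is the existing lowercase character from the policy example;
# 0xF0001 stands in for some existing, unpaired uppercase character.
existing = {0x028D, 0xF0001}

print(new_case_pair_allowed(0xF0000, 0x028D, existing))   # True  (case b)
print(new_case_pair_allowed(0xF0003, 0xF0002, existing))  # True  (case a)
print(new_case_pair_allowed(0xF0001, 0xF0002, existing))  # False
```

Both allowed cases have a new uppercase character, so the predicate is equivalent to "the uppercase character is new". The two branches are kept separate only to mirror cases (a) and (b).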

3. Fixing Invariant Test.

The problem was caught by my invariant tests, but very late in the process. From a process standpoint, we should make sure to run the invariant tests early in the BRS process, and review and patch the test until it passes. That would have caught this kind of problem much sooner. I don't think those tests are taken quite as seriously as they could be; nobody but me has reviewed them, for example. However, they are handy for catching problems!

For #2, I do not currently have an invariant test that checks for the introduction of a new casing pair that the stability policy would disallow. I suggest that I be given an action to add one.
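One possible shape for such a test is a comparison of the case pairs in two versions of the data. This is a sketch only, not the existing invariant test code. The function name, the representation of case pairs as (uppercase, lowercase) tuples, and the sample data are all assumptions, and the code points in the 0xF00xx range are placeholders for hypothetical characters.

```python
def disallowed_new_case_pairs(old_assigned, old_pairs, new_pairs):
    """Return case pairs introduced in the new version whose uppercase
    member was already assigned in the old version.

    old_assigned: set of code points assigned in the old version
    old_pairs, new_pairs: sets of (uppercase, lowercase) code point tuples
    """
    return sorted(
        (upper, lower)
        for upper, lower in new_pairs - old_pairs
        if upper in old_assigned
    )


# Hypothetical data: 0xF0001 stands in for an existing unpaired uppercase.
old_assigned = {0x0041, 0x0061, 0x028D, 0xF0001}
old_pairs = {(0x0041, 0x0061)}
new_pairs = old_pairs | {
    (0xF0000, 0x028D),   # new uppercase for existing lowercase: allowed
    (0xF0001, 0xF0002),  # new lowercase for existing uppercase: disallowed
}

for upper, lower in disallowed_new_case_pairs(old_assigned, old_pairs, new_pairs):
    print(f"disallowed new case pair: U+{upper:04X} / U+{lower:04X}")
```

Run early in the release process, a check of this kind would flag the second pair before the data is finalized. It covers only newly formed pairs, not broken ones.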