L2/06-067

Proposal to move some casing data from the UCD to the CLDR

Eric Muller
Adobe Systems Inc
February 8, 2006

Looking at the case transformation data expressed in the UCD, we find:

- for the vast majority of the cased characters, the mappings to their uppercase, lowercase, titlecase, and case-folded forms are to a single character and are unconditional

- a small number of cased characters have one or more of the mappings to sequences of characters rather than to a single character; those mappings are still unconditional

- U+03A3 GREEK CAPITAL LETTER SIGMA has a conditional treatment, recorded as a single entry:

  03A3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK CAPITAL LETTER SIGMA

- some mappings are specific to orthographies which distinguish the dotted and dotless i and j; those conditional mappings depend on at least a locale

The conditional mappings, which concern a very small number of situations, make the description of the casing properties rather complicated. If one views the lowercase mapping as a code point (or scalar value, or abstract character, or coded character) property, then the value of that property is a mapping from a condition to a sequence of code points. This is in sharp contrast with all the other (non-Unihan) properties, which have simple values.

In fact, this complexity is not properly accounted for in the standard. The Simple_Lowercase_Mapping property and its companions are informally listed as "String properties" in PropertyAliases.txt. There is also documentation of a Special_Case_Condition property as a "String property", which does not make sense.

Furthermore, the tendency in the Unicode Standard is to describe only locale-independent data and processes (possibly allowing for tailoring), and to leave the handling of locale-specific situations to other mechanisms, such as the CLDR.

Finally, it is arguable that the final sigma behavior is actually conditional on locales based on the Greek language, rather than inherent to the Greek script.
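To illustrate why a conditional entry turns the property value into a map from condition to code point sequence, here is a minimal Python sketch that parses the U+03A3 entry quoted above. The names SpecialCasingEntry, parse_entry and lowercase_of_sigma are hypothetical and chosen for this illustration only; the field layout (code; lower; title; upper; optional conditions; comment) is assumed from the quoted entry.

```python
from typing import Dict, NamedTuple, Tuple

CodePoints = Tuple[int, ...]


class SpecialCasingEntry(NamedTuple):
    """One conditional or multi-character casing record (hypothetical type)."""
    code_point: int
    lower: CodePoints
    title: CodePoints
    upper: CodePoints
    conditions: Tuple[str, ...]  # empty tuple means "unconditional"


def parse_entry(line: str) -> SpecialCasingEntry:
    """Parse one 'code; lower; title; upper; [conditions;] # comment' line."""
    data = line.split("#", 1)[0]  # drop the trailing comment
    fields = [field.strip() for field in data.split(";")]
    if fields and fields[-1] == "":
        fields.pop()  # nothing follows the final ';'
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 fields, got {len(fields)}")

    def sequence(field: str) -> CodePoints:
        # A field holds zero or more space-separated hexadecimal code points.
        return tuple(int(cp, 16) for cp in field.split())

    conditions = tuple(fields[4].split()) if len(fields) > 4 else ()
    return SpecialCasingEntry(
        int(fields[0], 16),
        sequence(fields[1]),
        sequence(fields[2]),
        sequence(fields[3]),
        conditions,
    )


entry = parse_entry(
    "03A3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK CAPITAL LETTER SIGMA"
)

# The lowercase "value" of U+03A3, viewed as condition -> code point sequence.
# The unconditional value U+03C3 is the simple lowercase mapping of U+03A3.
lowercase_of_sigma: Dict[Tuple[str, ...], CodePoints] = {
    (): (0x03C3,),
    entry.conditions: entry.lower,
}
```

Every unconditional character would need only the single key `()`; it is the handful of conditional entries that force the richer value type on the whole property.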
Given all this, I propose that:

- the conditional case mappings be treated as "tailorings" of the case conversion operations, and that those tailorings be recorded in the CLDR rather than the UCD;

- the property Special_Case_Condition be deprecated.
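As a rough illustration of what "tailoring" could look like to an implementer, the following Python sketch layers a locale-keyed override table on top of a locale-independent default. It is deliberately simplified: the table LOWERCASE_TAILORINGS and the function to_lower are hypothetical names, the table does not reflect any actual CLDR data format, and contextual conditions (including the final sigma rule) are ignored. The only data assumed is the well-known dotted and dotless i distinction for Turkish ("tr").

```python
from typing import Dict, Optional

# Hypothetical tailoring table: locale -> {character -> lowercase replacement}.
# Only the Turkish dotted/dotless i pair is shown; contextual conditions
# are left out of this sketch.
LOWERCASE_TAILORINGS: Dict[str, Dict[str, str]] = {
    "tr": {
        "\u0049": "\u0131",  # LATIN CAPITAL LETTER I -> LATIN SMALL LETTER DOTLESS I
        "\u0130": "\u0069",  # LATIN CAPITAL LETTER I WITH DOT ABOVE -> LATIN SMALL LETTER I
    },
}


def to_lower(text: str, locale: Optional[str] = None) -> str:
    """Lowercase text, applying the locale's tailoring before the default."""
    tailoring = LOWERCASE_TAILORINGS.get(locale or "", {})
    # Characters without a tailored mapping fall back to the default,
    # locale-independent mapping applied one character at a time.
    return "".join(tailoring.get(ch, ch.lower()) for ch in text)
```

For example, `to_lower("DIYARBAKIR", "tr")` uses the dotless i, while `to_lower("DIYARBAKIR")` falls through to the default mapping. The point of the design is that the default table stays simple and unconditional, and all locale knowledge lives in the separately maintained tailoring data.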