L2/09-248

Subject: Issues with the new casing-related properties in the beta (DerivedCoreProperties.txt)
From: Asmus Freytag, with input from Ken Whistler and Mark Davis
Date: July 13, 2009
Replaces: L2/09-238

For consideration by UTC

This is a proposal to rename several character properties introduced in the 5.2.0 beta and to amend the names of string properties in TUS. It is a refinement of the proposal in document L2/09-238.

1. Fix naming of several string properties

Instead of the existing names for the string properties defined in chapter 3

    isLowerCase()
    isUpperCase()
    isTitleCase()
    isCasefolded()

use these names

    isLowerCaseString()
    isUpperCaseString()
    isTitleCaseString()
    isCaseFoldedString()

This change makes obvious the nature of these properties as *string* properties. This is important because, once outside the context of the specific section in chapter 3 of the Standard, users are likely to be confused about whether strings or characters are intended. API names such as "isLowerCase" commonly exist in implementations, and they are not limited to string properties.

2. Fix naming of several new character properties

Instead of the Boolean character properties in DerivedCoreProperties.txt

    IsLowerCase
    IsUpperCase
    IsTitleCase
    IsCasefolded

and listing only those values for which these properties are FALSE, as is done in the beta, use the corresponding properties

    Changes_When_Lowercased
    Changes_When_Uppercased
    Changes_When_Titlecased
    Changes_When_Casefolded

and list the values for which they are TRUE.

The advantage of these names is that they capture the selection criteria succinctly, making them readily and unambiguously understandable even without immediate access to the definitions in chapter 3. It is also directly apparent how these properties work as *character* properties. They are defined in the sense of "Changes_when" to keep the listing small while sticking to the convention of listing only "true" values for Booleans in the data files.
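As an illustration of the "Changes_when" criterion, the following is a minimal Python sketch; it is not part of the proposal. It assumes that Python's built-in str.lower(), str.upper(), str.title() and str.casefold() approximate the default case operations of chapter 3, and that each property is evaluated on the NFD form of the character. The helper name changes_when is hypothetical, and results depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def changes_when(ch: str) -> dict:
    """Approximate the four proposed Changes_When_* properties for one character.

    Assumption: a property is true when applying the corresponding case
    operation to the NFD form of the character yields a different string.
    """
    nfd = unicodedata.normalize("NFD", ch)
    return {
        "Changes_When_Lowercased": nfd.lower() != nfd,
        "Changes_When_Uppercased": nfd.upper() != nfd,
        "Changes_When_Titlecased": nfd.title() != nfd,
        "Changes_When_Casefolded": nfd.casefold() != nfd,
    }


# Print only the properties that are true, mirroring the data-file
# convention of listing only "true" values for Booleans.
for ch in ["M", "m", "\u01C9", "2"]:
    true_props = [name for name, value in changes_when(ch).items() if value]
    print(f"U+{ord(ch):04X}", true_props)
```

With this sketch, "m" comes out with Changes_When_Uppercased and Changes_When_Titlecased true and the other two false, while "2" has no true values and so would not be listed at all.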
(The actual ranges of listed values would not change.)

If these changes are adopted, they work out as follows for these examples:

    For "M":
        Lowercase=F, Uppercase=T,
        Changes_When_Lowercased=T, Changes_When_Uppercased=F, Changes_When_Titlecased=F

    For "m":
        Lowercase=T, Uppercase=F,
        Changes_When_Lowercased=F, Changes_When_Uppercased=T, Changes_When_Titlecased=T

    For "lj" (U+01C9):
        Lowercase=T, Uppercase=F,
        Changes_When_Lowercased=F, Changes_When_Uppercased=T, Changes_When_Titlecased=T

    For "2":
        Lowercase=F, Uppercase=F,
        Changes_When_Lowercased=F, Changes_When_Uppercased=F, Changes_When_Titlecased=F

    For "Lj" (U+01C8):
        Lowercase=F, Uppercase=F, GC=Titlecase_Letter,
        Changes_When_Lowercased=T, Changes_When_Uppercased=T, Changes_When_Titlecased=F

and so on, where Lowercase and Uppercase are existing character properties.

The relation to the string properties is straightforward and without surprises, even if the naming doesn't correspond as closely as it did in the beta. These relations can be stated as follows:

    Changes_When_Lowercased ==> this character cannot occur in lowercase strings
    Changes_When_Uppercased ==> this character cannot occur in uppercase strings
    Changes_When_Titlecased ==> this character cannot occur in titlecase strings (initial position)
    Changes_When_Casefolded ==> this character cannot occur in casefolded strings

These relations should be added as comments to the data files and as annotations to chapter 3, respectively.

The following examples of strings that satisfy the string properties in question make that clear:

    1. isUpperCaseString: "MARK DAVIS?", "LJO LE", "2 BE OR NOT 2 BE"
    2. isTitleCaseString: "Mark Davis?", "Ljo Le", "2 Be Or Not 2 Be"
    3. isLowerCaseString: "mark davis?", "ljo le", "2 be or not 2 be"

Note that, under the Unicode definition, an uppercase string consists of uppercase characters and characters that do not change case, such as punctuation and characters from non-cased scripts like Han, Hangul and hieroglyphs. The "?" and "2" are stand-ins for these characters in the examples.

3. Places where these changes need to be propagated to

Note that, in addition to changes to the DerivedCoreProperties.txt file, this proposal also impacts the chapter 3 text on default case operations, the XML generation, and the text of UAX #44 and UAX #42.

[end]