UTC/2000-007
Posted on unicore 2000-01-11
From: Mark Davis
Re: Clarification of "cased" in UTR21.

Based on the following email, I propose a change to the definition of "cased" in UTR21.

> To: Mark Davis/Cupertino/IBM@IBMUS
> cc:
> Subject: questions regarding Special Casing
>
> Hello,
>
> Thanks for your contributions to spelling out Unicode for us
> developers and users.
>
> I'm working with a group that's developing linguistic tools. One of
> our goals is to comply with the Unicode 3.0 standard, including its
> specifications for character properties and case mappings. In reading
> your UTR #21 (Case Mappings -- revision 3.0 11/03/1999) there were a
> couple of points that were unclear to me.
>
> Under section 2, "Guidelines", the bullets say,
>
>     In all of the guidelines given below ... Treat 0345 "combining
>     iota subscript" as a lowercase letter.
>
> Currently in the Unicode data file UnicodeData.txt (v 3.0), character
> 0345's general category is "Mn" (mark, non-spacing). Is your guideline
> here a correction, i.e. should 0345's general category be changed to
> "Ll"?

No. The guideline means that, while 0345 is correctly characterized as Mn for general purposes, it should be handled differently for the purposes of case mappings *in the following discussion*.

> Another bullet in that list says
>
>     A character is _cased_ if it is marked as uppercase, lowercase, or
>     titlecase (Lu, Ll, Lt).
>
> If this definition is complete, then is character 0345 considered
> cased? In a similar vein, are characters that have explicit case
> mappings considered cased, even if they are not "letters"? E.g.
>
>     24B6;CIRCLED LATIN CAPITAL LETTER A;So;0;L;<circle> 0041;;;;N;;;;24D0;
>     2160;ROMAN NUMERAL ONE;Nl;0;L;<compat> 0049;;;1;N;;;;2170;

This is a good point. For non-letters, it is a matter of trying to match user expectations. Suppose that a user selected a paragraph of text and lowercased it using a menu command. Would s/he expect to see Roman numerals and circled letters lowercased? I suspect so.

> In practical terms, if a string contains U+24B6 and no lowercase
> characters, should it be considered an uppercase string? If this
> string is converted to lowercase, should the 24B6 be converted to
> 24D0?
>
> It would perhaps be helpful to mention in your document the existence
> of non-letters that have case mappings, and clarify what the correct
> treatment of them would be according to the standard.

Agreed. The document should probably specify _cased_ to include non-letters that have case mappings. I will bring this up at the next Unicode Technical Committee meeting.
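
As a rough illustration of the distinction discussed above (not text from UTR21), the following Python sketch contrasts the category-based definition of _cased_ with a broader one that also admits characters having case mappings. The function names are hypothetical, and Python's unicodedata module reflects a newer Unicode version than 3.0, so treat the data as an assumption relative to the version under discussion.

```python
import unicodedata

# General categories that the quoted UTR21 bullet treats as cased.
LETTER_CASE_CATEGORIES = {"Lu", "Ll", "Lt"}


def is_cased_by_category(ch):
    """Category-only definition: cased iff the category is Lu, Ll, or Lt."""
    return unicodedata.category(ch) in LETTER_CASE_CATEGORIES


def has_case_mapping(ch):
    """True if lowercasing or uppercasing changes the character.

    Assumption: Python's str.lower()/str.upper() stand in for the
    case mappings recorded in UnicodeData.txt.
    """
    return ch.lower() != ch or ch.upper() != ch


def is_cased_proposed(ch):
    """Hypothetical broader definition: cased letters plus any
    non-letter that has a case mapping."""
    return is_cased_by_category(ch) or has_case_mapping(ch)


if __name__ == "__main__":
    # U+24B6 CIRCLED LATIN CAPITAL LETTER A, U+2160 ROMAN NUMERAL ONE,
    # U+0345 combining iota subscript, and two ordinary characters.
    for ch in ["\u24b6", "\u2160", "\u0345", "A", "1"]:
        print(
            "U+%04X" % ord(ch),
            unicodedata.category(ch),
            "category-only:", is_cased_by_category(ch),
            "proposed:", is_cased_proposed(ch),
        )
```

Under the category-only test, U+24B6 (So) and U+2160 (Nl) are not cased even though lowercasing maps them to U+24D0 and U+2170; the broader test counts them as cased, which matches the user expectation described in the reply.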