L2/05-191

Date: August 2, 2005
Author: Ken Whistler
Title: Proposal for dealing with lowercase Claudian letters

The following email sent to the unicore list contains my analysis of what I consider the best way to handle the problems posed by the proposal to encode several Claudian letters.

------------- Begin Forwarded Message -------------

Date: Tue, 2 Aug 2005 15:14:36 -0700 (PDT)
From: Kenneth Whistler
Subject: Re: Casing stability and its implication

>> I think it's worth exploring option two, which is to make the
>> unification of the capital, but also add the lower case now
>> to satisfy stability.

O.k., let's explore option 2's implications a little further.

>>
>> For the turned capital F, that seems without question an
>> appropriate thing, I have not found any other use for the
>> existing character either.

I concur.

>>
>> Whether the capital c is the correct version to match the
>> Claudian use, I'm not as certain, but perhaps there's additional
>> evidence. If that question can be settled in favor, then
>> I'd much rather contemplate adding a lower case form now
>> than adding a pair later.

I see no graphological justification for disunification. It's just a turned capital C in either case. And like other Latin letters, it ends up with a usage in the Roman numeral system.

The issue is more one of properties, since we ended up cloning off all those Roman numeral symbols for compatibility reasons, but added the few additions not in Asian character sets and gave them properties consistent with the compatibility symbols, rather than with Latin letters, i.e., all gc=Nl.

At any rate, here is a restatement of Option 2, complete with property implications and required actions. (Code points are, of course, arbitrary for now.)
Option 2a: Unification of capitals, with lowercase added sooner (Unicode 5.0) rather than later

2183  ROMAN NUMERAL REVERSED ONE HUNDRED
      = apostrophic C
      = Claudian antisigma
      * lowercase is A72D
      --> A72D LATIN SMALL LETTER REVERSED C
      --> 03FD GREEK CAPITAL REVERSED LUNATE SIGMA SYMBOL

2132  TURNED CAPITAL F
      = Claudian digamma inversum
      * lowercase is A731
      --> A731 LATIN SMALL LETTER TURNED F
      --> 03DC GREEK LETTER DIGAMMA

@+ Lowercase Claudian letters. Claudian letters in inscriptions are uppercase, but may be transcribed by scholars in lowercase.

A72D  LATIN SMALL LETTER REVERSED C
      = antisigma
      * uppercase is 2183
      --> 2183 ROMAN NUMERAL REVERSED ONE HUNDRED
      --> 037B GREEK SMALL REVERSED LUNATE SIGMA SYMBOL

A731  LATIN SMALL LETTER TURNED F
      = digamma inversum
      * uppercase is 2132
      --> 2132 TURNED CAPITAL F
      --> 03DD GREEK SMALL LETTER DIGAMMA

Complete property specification:

2183 : gc=Nl --> gc=Lu; add LC mapping to A72D
2132 : gc=So --> gc=Lu; add LC mapping to A731
A72D : gc=Ll; UC/TC mappings to 2183
A731 : gc=Ll; UC/TC mappings to 2132

The change for 2183 from gc=Nl --> gc=Lu has no bearing on Alphabetic -- the character is already Alphabetic. However, the change for 2132 adds it to Alphabetic.

The changes for 2183 and 2132 add both to Uppercase. For 2132 this is unproblematical, because other letterlike symbols are Uppercase and have case pairs. For 2183 there is a consistency issue, because the apostrophic C is part of a set with 2180..2182, which are not Uppercase now (although notionally they should be, by form), and have no case mappings. I think the change would be benign, however, as nobody is really depending on casing assignments for 2180..2182. Another option would be to leave 2183 as gc=Nl and add it to Other_Uppercase instead, which would have the same effect without disturbing the General Category. That might be the preferable treatment, actually.

Entries for 2183 and 2132 would appear in CaseFolding.txt as "C" common case folding entries, as of Unicode 5.0.
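To make the casing consequences concrete, here is a small Python sketch of the property specification above. It is only a hand-built model, not real UCD data: the `Props` record, the three tables, and the helper functions are hypothetical names introduced for illustration, and A72D/A731 are the provisional code points used in this message. The derivations are simplified to what matters here (Alphabetic as a set of General Category values, ignoring Other_Alphabetic; Uppercase as gc=Lu plus Other_Uppercase).

```python
from dataclasses import dataclass
from typing import Dict, Optional

# General Category values that contribute to Alphabetic (simplified: the
# Other_Alphabetic contribution is left out because it plays no role here).
ALPHABETIC_GCS = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}


@dataclass(frozen=True)
class Props:
    """Hypothetical per-character record, for illustration only."""
    gc: str                        # General Category
    lower: Optional[int] = None    # simple lowercase mapping
    upper: Optional[int] = None    # simple uppercase/titlecase mapping
    other_uppercase: bool = False  # contributory Other_Uppercase property


def is_alphabetic(p: Props) -> bool:
    return p.gc in ALPHABETIC_GCS


def is_uppercase(p: Props) -> bool:
    return p.gc == "Lu" or p.other_uppercase


# Status quo described in the message: no case mappings for either character.
BEFORE: Dict[int, Props] = {
    0x2183: Props("Nl"),
    0x2132: Props("So"),
}

# Option 2a, with the provisional (arbitrary) lowercase code points.
OPTION_2A: Dict[int, Props] = {
    0x2183: Props("Lu", lower=0xA72D),
    0x2132: Props("Lu", lower=0xA731),
    0xA72D: Props("Ll", upper=0x2183),
    0xA731: Props("Ll", upper=0x2132),
}

# The alternative treatment: 2183 keeps gc=Nl and gains Other_Uppercase.
ALTERNATIVE: Dict[int, Props] = dict(OPTION_2A)
ALTERNATIVE[0x2183] = Props("Nl", lower=0xA72D, other_uppercase=True)


def common_case_folding(table: Dict[int, Props]) -> Dict[int, int]:
    """Sketch of the 'C' folding entries: each cased capital folds to its lowercase."""
    return {cp: p.lower for cp, p in table.items() if p.lower is not None}


def round_trips(table: Dict[int, Props]) -> bool:
    """Check that every lowercase mapping maps back to the same capital."""
    return all(
        table[p.lower].upper == cp
        for cp, p in table.items()
        if p.lower is not None
    )


if __name__ == "__main__":
    for name, table in (("before", BEFORE), ("2a", OPTION_2A), ("alt", ALTERNATIVE)):
        for cp in (0x2183, 0x2132):
            p = table[cp]
            print(f"{name:6} {cp:04X} gc={p.gc} "
                  f"Alphabetic={is_alphabetic(p)} Uppercase={is_uppercase(p)}")
```

In this model the alternative table gives 2183 the same Uppercase value and the same case folding entry as Option 2a while its General Category stays Nl, which is the sense in which the two treatments are described as having the same effect.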
The change for 2183 from gc=Nl --> gc=Lu has no bearing on identifiers. It is already included in all derived identifier properties by virtue of being gc=Nl.

The change for 2132 moves it into identifiers, adding it to all derived identifier properties. I think that is o.k., because adding characters to default identifiers is o.k., as long as the character is not from the Pattern_Syntax ranges, which this is not.

Both characters are already Grapheme_Base, so no implications there.

I don't see any other property implications. The other properties for 2183 and 2132 stay unchanged, and the new characters are just handled as lowercase Latin letters.

Required action:

For this to work at all, it has to be accelerated into FPDAM2 (unless we want to risk the Unicode 5.0 schedule on foot of addition of two lowercase epigraphic Latin letters for scholarly transcriptions). That means acceptance by the UTC next week and instructions to our liaison to request them as additions to the ballot on the same grounds as the other lowercase additions requested in the U.S. ballot.

Addition would be a little irregular, because outside of ballot comments, and I can see trouble there. But once the basis for the lowercase additions is explained to the group, the UTC liaison can point to the medievalist character proposal and say, oops, here are two more that fit the same criterion and which should be handled at the same time, to avoid implementation troubles or the need to encode duplicate characters in the future.

--Ken