L2/05-191

Date: August 2, 2005
Author: Ken Whistler
Title: Proposal for dealing with lowercase Claudian letters


The following email sent to the unicore list contains my
analysis of what I consider the best way to handle the
problems posed by the proposal to encode several Claudian
letters.

------------- Begin Forwarded Message -------------

Date: Tue, 2 Aug 2005 15:14:36 -0700 (PDT)
From: Kenneth Whistler
Subject: Re: Casing stability and its implication


>> I think it's worth exploring option two, which is to make the
>> unification of the capital, but also add the lower case now
>> to satisfy stability.


O.k., let's explore option 2's implications a little further.


>> 
>> For the turned capital F, that seems without question an
>> appropriate thing, I have not found any other use for the
>> existing character either.


I concur.


>> 
>> Whether the capital c is the correct version to match the
>> Claudian use, I'm not as certain, but perhaps there's additional
>> evidence. If that question can be settled in favor, then
>> I'd much rather contemplate adding a lower case form now
>> than adding a pair later.


I see no graphological justification for disunification. It's
just a turned capital C in either case. And like other Latin
letters, it ends up with a usage in the Roman numeral
system. The issue is more one of properties, since we ended
up cloning off all those Roman numeral symbols for compatibility
reasons, but added the few additions not in Asian character
sets and gave them properties consistent with the compatibility
symbols, rather than with Latin letters, i.e., all gc=Nl.

At any rate, here is a restatement of Option 2, complete
with property implications and required actions. (Code points
are, of course, arbitrary for now.)

Option 2a: Unification of capitals, with lowercase added
     sooner (Unicode 5.0) rather than later
     
2183 ROMAN NUMERAL REVERSED ONE HUNDRED
	= apostrophic C
	= Claudian antisigma
	* lowercase is A72D
	--> A72D LATIN SMALL LETTER REVERSED C
	--> 03FD GREEK CAPITAL REVERSED LUNATE SIGMA SYMBOL
	
2132 TURNED CAPITAL F
	= Claudian digamma inversum
	* lowercase is A731
	--> A731 LATIN SMALL LETTER TURNED F
	--> 03DC GREEK LETTER DIGAMMA

@+	Lowercase Claudian letters. Claudian letters in inscriptions are
	uppercase, but may be transcribed by scholars in lowercase.
                   
A72D LATIN SMALL LETTER REVERSED C
	= antisigma
	* uppercase is 2183
	--> 2183 ROMAN NUMERAL REVERSED ONE HUNDRED
	--> 037B GREEK SMALL REVERSED LUNATE SIGMA SYMBOL
	
A731 LATIN SMALL LETTER TURNED F
	= digamma inversum
	* uppercase is 2132
	--> 2132 TURNED CAPITAL F
	--> 03DD GREEK SMALL LETTER DIGAMMA
	
Complete property specification:

2183 : gc=Nl --> gc=Lu; add LC mapping to A72D
2132 : gc=So --> gc=Lu; add LC mapping to A731

A72D : gc=Ll; UC/TC mappings to 2183
A731 : gc=Ll; UC/TC mappings to 2132
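
As a rough illustration of the specification above, the four entries can
be written out as a small table and checked for round-trip casing. This is
only a sketch: the table name and helper functions are hypothetical, and
A72D and A731 are the placeholder code points used in this proposal, not
final assignments.

```python
# Proposed properties, transcribed from the specification above.
# A72D and A731 are placeholder code points (see the note under Option 2a).
PROPOSED = {
    0x2183: {"gc": "Lu", "lower": 0xA72D},
    0x2132: {"gc": "Lu", "lower": 0xA731},
    0xA72D: {"gc": "Ll", "upper": 0x2183, "title": 0x2183},
    0xA731: {"gc": "Ll", "upper": 0x2132, "title": 0x2132},
}


def to_lower(cp):
    """Simple lowercase mapping; characters without one map to themselves."""
    return PROPOSED.get(cp, {}).get("lower", cp)


def to_upper(cp):
    """Simple uppercase mapping; characters without one map to themselves."""
    return PROPOSED.get(cp, {}).get("upper", cp)


def round_trips(cp):
    """True if changing case and changing back returns the original."""
    if PROPOSED[cp]["gc"] == "Lu":
        return to_upper(to_lower(cp)) == cp
    return to_lower(to_upper(cp)) == cp


all_round_trip = all(round_trips(cp) for cp in PROPOSED)
print(all_round_trip)
```

The round-trip check is the casing-stability point at issue: each capital
has exactly one lowercase partner and vice versa.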

The change for 2183 from gc=Nl --> gc=Lu has no bearing on Alphabetic -- the
character is already Alphabetic. However, the change for 2132 adds it
to Alphabetic.

The changes for 2183 and 2132 add both to Uppercase. For 2132 this
is unproblematical, because other letterlike symbols are Uppercase
and have case pairs. For 2183 there is a consistency issue, because
the apostrophic C is part of a set with 2180..2182, which are not
Uppercase now (although notionally they should be, by form), and have
no case mappings. I think the change would be benign, however, as
nobody is really depending on casing assignments for 2180..2182.
Another option would be to leave 2183 as gc=Nl and add it to
Other_Uppercase instead, which would have the same effect without
disturbing the General Category. That might be the preferable treatment,
actually.
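
The effect of the two treatments of 2183 on the derived properties can be
sketched as follows. This is a simplified model, assuming only the
derivations Alphabetic = Lu + Ll + Lt + Lm + Lo + Nl + Other_Alphabetic and
Uppercase = Lu + Other_Uppercase; the function names and scenario labels
are hypothetical.

```python
# General categories that contribute to the derived Alphabetic property.
LETTER_OR_NL = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}


def is_alphabetic(gc, other_alphabetic=False):
    return gc in LETTER_OR_NL or other_alphabetic


def is_uppercase(gc, other_uppercase=False):
    return gc == "Lu" or other_uppercase


# label -> (General_Category, Other_Uppercase)
scenarios = {
    "2183 today": ("Nl", False),
    "2183 as gc=Lu": ("Lu", False),
    "2183 as gc=Nl + Other_Uppercase": ("Nl", True),
    "2132 today": ("So", False),
    "2132 as gc=Lu": ("Lu", False),
}

# label -> (Alphabetic, Uppercase)
results = {
    label: (is_alphabetic(gc), is_uppercase(gc, other_upper))
    for label, (gc, other_upper) in scenarios.items()
}

for label, (alpha, upper) in results.items():
    print(f"{label}: Alphabetic={alpha}, Uppercase={upper}")
```

Under this model both treatments of 2183 yield the same Alphabetic and
Uppercase values; they differ only in whether the General Category changes.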

Entries for 2183 and 2132 would appear in CaseFolding.txt as "C" common
case folding entries, as of Unicode 5.0.
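
For concreteness, the entries would look roughly like the lines generated
below, following the "code; status; mapping; # name" layout of
CaseFolding.txt. This is a sketch only: the helper name is hypothetical and
the fold targets are the placeholder code points from this proposal.

```python
# Character names as given in the names list entries above.
NAMES = {
    0x2183: "ROMAN NUMERAL REVERSED ONE HUNDRED",
    0x2132: "TURNED CAPITAL F",
}

# Fold targets are the proposed lowercase letters (placeholder code points).
FOLDS = {0x2183: 0xA72D, 0x2132: 0xA731}


def casefolding_line(cp):
    """Format one common ("C") case folding entry."""
    return f"{cp:04X}; C; {FOLDS[cp]:04X}; # {NAMES[cp]}"


lines = [casefolding_line(cp) for cp in sorted(FOLDS)]
print("\n".join(lines))
```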

The change for 2183 from gc=Nl --> gc=Lu has no bearing on identifiers.
It is already included in all derived identifier properties by virtue
of being gc=Nl. The change for 2132 moves it into identifiers, adding
it to all derived identifier properties. I think that is o.k., because
adding characters to default identifiers is o.k., as long as the
character is not from the Pattern_Syntax ranges, which this is not.
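
The identifier reasoning can be sketched as a before/after comparison. This
is a simplified model: it derives ID_Start from the General Category alone
(letters plus Nl), ignores Other_ID_Start, and treats Pattern_Syntax as a
flag that is false for all four characters, as stated above. The names are
hypothetical and A72D/A731 are placeholder code points.

```python
# General categories that feed the default identifier-start set.
ID_START_GC = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}


def is_id_start(gc, pattern_syntax=False):
    return gc in ID_START_GC and not pattern_syntax


# General categories before and after the proposed change.
before = {0x2183: "Nl", 0x2132: "So"}
after = {0x2183: "Lu", 0x2132: "Lu", 0xA72D: "Ll", 0xA731: "Ll"}

# Unassigned code points are modeled as gc=Cn, which is never ID_Start.
newly_added = sorted(
    cp for cp, gc in after.items()
    if is_id_start(gc) and not is_id_start(before.get(cp, "Cn"))
)
removed = sorted(
    cp for cp, gc in before.items()
    if is_id_start(gc) and not is_id_start(after[cp])
)

print([f"{cp:04X}" for cp in newly_added])
print([f"{cp:04X}" for cp in removed])
```

In this model the change only adds characters to identifiers and removes
none, which is the condition that matters for identifier stability.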

Both characters are already Grapheme_Base, so no implications there.

I don't see any other property implications. The other properties for
2183 and 2132 stay unchanged, and the new characters are just handled
as lowercase Latin letters.

Required action: For this to work at all, it has to be accelerated into
FPDAM2 (unless we want to risk the Unicode 5.0 schedule on account of
the addition of two lowercase epigraphic Latin letters for scholarly
transcriptions).

That means acceptance by the UTC next week and instructions to our liaison to
request them as additions to the ballot on the same grounds as the
other lowercase additions requested in the U.S. ballot. The addition would
be a little irregular, because it falls outside of ballot comments, and I can
see trouble there. But once the basis for the lowercase additions is
explained to the group, the UTC liaison can point to the medievalist
character proposal and say, oops, here are two more that fit the same
criterion and which should be handled at the same time, to avoid
implementation troubles or the need to encode duplicate characters
in the future.

--Ken