From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed May 09 2007 - 15:15:37 CDT
Richard Wordingham wrote:
>
> Kenneth Whistler wrote on Tuesday, May 08, 2007 at 3:00 AM
> > I assume you are talking about the discussions of casefolding
> > stability, which now specify that if there is an existing
> > *uppercase* letter in the standard but no lowercase for it,
> > that a lowercase paired letter cannot be added later, as
> > casefolding stability would prevent adding a tolowercase() mapping
> > for it, and failing that, the expectations about the case
> > relation would not be met.
>
> Where is this specified?
>
> The relevant rules seem to be that:
>
> (A) toCasefold(NFKC(S)) is constant over time once all characters in S are
> assigned.
> (B) From each [equivalence] class [defined for case folding], one
> representative element (a single lowercase letter where possible) is chosen
> to be the common form. (TUS 5.0 p188 R5)
>
> I am not sure if the definition in TUS of case folding is mandatory -
> dotless 'i' and dotted 'I' are described as 'an exception', as though
> there might be other exceptions.
>
> However, if condition (A) is taken as influencing the application of
> condition (B), then as I read it, one could add an uppercase letter and
> then its lowercase form in a subsequent version, but they would then
> have to casefold to the uppercase letter. That seems better than not
> meeting the expectation that Unicode will eventually support the normal
> writing system of one's community.
>
> I can't deduce the prohibition on changing tolowercase().
This prohibition comes from the stability rules for the normative character
properties in the UCD.
If needed, it can be worked around for some locales with special casing rules.
For now, the special casing rules for the Turkic languages do not make the
standard any more defective. In fact, the common assumptions about the
stability of case folding are respected for the Turkic languages, whereas they
are not in the default locale derived from the character-based UCD.
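
As a rough illustration of the difference, the following Python sketch compares
the default folding (via the built-in str.casefold()) with a Turkic folding for
the four "i" letters. The helper turkic_casefold and its table are hypothetical
names introduced here; they only apply the two Turkic ("T") mappings listed in
CaseFolding.txt and are not a complete locale-aware implementation.

```python
# Hypothetical helper, not part of any standard library: it applies only the
# two Turkic ("T") mappings from CaseFolding.txt and defers to the default
# str.casefold() for every other character.
TURKIC_FOLD = {
    "I": "\u0131",       # LATIN CAPITAL LETTER I -> dotless i
    "\u0130": "i",       # LATIN CAPITAL LETTER I WITH DOT ABOVE -> i
}

def turkic_casefold(text):
    """Case-fold text character by character with the Turkic I mappings."""
    return "".join(TURKIC_FOLD.get(ch, ch.casefold()) for ch in text)

# Show both foldings as lists of code points for I, i, dotted I, dotless i.
for ch in ["I", "i", "\u0130", "\u0131"]:
    default = [f"U+{ord(c):04X}" for c in ch.casefold()]
    turkic = [f"U+{ord(c):04X}" for c in turkic_casefold(ch)]
    print(f"U+{ord(ch):04X}  default={default}  turkic={turkic}")
```

With the default folding, U+0130 folds to two code points (i followed by
U+0307) and U+0131 stays in a class of its own. With the Turkic mappings, the
four letters fall into two classes, each represented by a single lowercase
letter.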
I now wonder whether the default casing rules are only appropriate for the
C/POSIX locale (which uses simplified text processing at the character level
only, rather than at the complete text level), and whether another, consistent
set of rules should be defined explicitly for the language-neutral (root)
locale, one that removes the inconsistencies of case folding by mapping the
dotted and undotted I's to the same case-folded letter.
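
A minimal sketch of what such a root-locale folding might look like, assuming
(as suggested above) that all four "i" letters should fold to plain "i". The
name root_casefold and the choice of "i" as the common form are assumptions
made for illustration; nothing of the kind is defined by the standard.

```python
# Hypothetical "root locale" folding: map dotted capital I (U+0130) and
# dotless small i (U+0131) to plain "i" first, then apply the default folding.
I_VARIANTS = str.maketrans({"\u0130": "i", "\u0131": "i"})

def root_casefold(text):
    """Fold text so that I, i, dotted I and dotless i all compare equal."""
    return text.translate(I_VARIANTS).casefold()

# Spellings that differ only in the kind of "i" used now fold identically.
samples = ["DIYARBAKIR", "D\u0130YARBAKIR", "Diyarbak\u0131r", "diyarbakir"]
print({root_casefold(s) for s in samples})
```

Such a folding is only suitable for loose matching: it deliberately discards
the dotted/dotless distinction that is significant in Turkic languages.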