RE: Adding Lowercase Letters

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed May 09 2007 - 15:15:37 CDT


    Richard Wordingham wrote:
    >
    > Kenneth Whistler wrote on Tuesday, May 08, 2007 at 3:00 AM
    > > I assume you are talking about the discussions of casefolding
    > > stability, which now specify that if there is an existing
    > > *uppercase* letter in the standard but no lowercase for it,
    > > that a lowercase paired letter cannot be added later, as
    > > casefolding stability would prevent adding a tolowercase() mapping
    > > for it, and failing that, the expectations about the case
    > > relation would not be met.
    >
    > Where is this specified?
    >
    > The relevant rules seem to be that:
    >
    > (A) toCasefold(NFKC(S)) is constant over time once all characters in S are
    > assigned.
    > (B) From each [equivalence] class [defined for case folding], one
    > representative element (a single lowercase letter where possible) is
    > chosen to be the common form. (TUS 5.0 p188 R5)
    >
    > I am not sure if the definition in TUS of case folding is mandatory -
    > dotless 'i' and dotted 'I' are described as 'an exception', as though
    > there might be other exceptions.
    >
    > However, if condition (A) is taken as influencing the application of
    > condition (B), then as I read it, one could add an uppercase letter and
    > then its lowercase form in a subsequent version, but they would then
    > have tocasefold to the uppercase letter. That seems better than not
    > meeting the expectation that Unicode will eventually support the normal
    > writing system of one's community.
    >
    > I can't deduce the prohibition on changing tolowercase().

    This prohibition comes from the stability rules for the normative character
    properties in the UCD.
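
To make the stability argument concrete, here is a minimal Python sketch of
rule (A) as quoted above. It is only an illustration: UPPER and LOWER are
hypothetical private-use stand-ins for an uppercase letter encoded first and a
lowercase partner proposed later, and the `extra` mappings are toy data, not
anything from the UCD.

```python
import unicodedata

# Hypothetical stand-ins (private-use code points), not real letters.
UPPER = "\uE000"   # uppercase letter already encoded
LOWER = "\uE001"   # lowercase partner proposed for a later version

def fold(s, extra=None):
    """Rule (A) as quoted: toCasefold(NFKC(S)), plus optional toy mappings."""
    s = unicodedata.normalize("NFKC", s)
    if extra:
        s = s.translate(str.maketrans(extra))
    return s.casefold()

before = fold(UPPER)                       # no partner yet: folds to itself
usual = fold(UPPER, {UPPER: LOWER})        # pair folds to the lowercase letter
alternative = fold(UPPER, {LOWER: UPPER})  # pair folds to the uppercase letter

print(before == usual)        # False: the folding of UPPER has changed
print(before == alternative)  # True: the folding of UPPER is unchanged
print(fold(LOWER, {LOWER: UPPER}) == alternative)  # True: the pair still matches
```

The second option corresponds to the reading in the quoted message, where the
pair would have to fold to the uppercase letter to keep rule (A) intact.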

    But this could be handled in the special casing rules if some locales need
    it. For now, the special casing rules for Turkic languages do not make the
    standard any more defective. In fact, the common assumptions about the
    stability of case folding are respected under the Turkic rules, whereas
    they are not in the default locale derived from the character-based UCD.
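
The difference between the default and the Turkic behaviour can be sketched in
Python. The helper names `turkic_lower` and `turkic_upper` are hypothetical,
and the sketch is simplified: it only remaps the four I letters before calling
the default mappings, and ignores sequences with a combining dot above
(U+0307).

```python
# Simplified Turkic tailoring: I <-> dotless i (U+0131), dotted I (U+0130) <-> i.
TURKIC_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})
TURKIC_UPPER = str.maketrans({"i": "\u0130", "\u0131": "I"})

def turkic_lower(s):
    """Lowercase with the Turkic I mappings applied first (hypothetical helper)."""
    return s.translate(TURKIC_LOWER).lower()

def turkic_upper(s):
    """Uppercase with the Turkic I mappings applied first (hypothetical helper)."""
    return s.translate(TURKIC_UPPER).upper()

# Under the Turkic rules every I letter survives a round trip.
turkic_ok = all(turkic_lower(turkic_upper(c)) == c for c in "i\u0131") and \
            all(turkic_upper(turkic_lower(c)) == c for c in "I\u0130")

# Under the default rules the two special letters do not.
default_dotless = "\u0131".upper().lower()   # dotless i -> I -> i
default_dotted = "\u0130".lower().upper()    # dotted I -> i + U+0307 -> I + U+0307

print(turkic_ok)                        # True
print(default_dotless == "\u0131")      # False
print(default_dotted == "\u0130")       # False
```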

    I now wonder whether the default casing rules are only appropriate for the
    C/POSIX locale (which uses simplified text processing at the character level
    only, rather than at the complete text level), and whether another
    consistent set of rules should be defined explicitly for the
    language-neutral (root) locale, one that removes the inconsistencies of
    case folding by mapping the dotted and undotted I's to the same
    case-folded letter.
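
As a rough illustration of what such a root-locale folding might look like,
the Python sketch below maps all four I letters to one folded form. The name
`root_casefold` and its mapping table are assumptions made for this example;
they are not part of any standard or library.

```python
# Hypothetical "root" folding: send dotted I (U+0130) and dotless i (U+0131)
# to plain "i" before applying the default case folding.
ROOT_I_FOLD = str.maketrans({"\u0130": "i", "\u0131": "i"})

def root_casefold(s):
    """Default case folding with all I letters unified (illustrative only)."""
    return s.translate(ROOT_I_FOLD).casefold()

letters = ["I", "i", "\u0130", "\u0131"]

default_forms = {c.casefold() for c in letters}
root_forms = {root_casefold(c) for c in letters}

print(len(default_forms))  # 3: "i", "i" + U+0307, and dotless i stay distinct
print(len(root_forms))     # 1: everything folds to "i"
```

The trade-off is that such a folding treats letters that are distinct in
Turkic orthographies as equal, so it would suit loose, language-neutral
matching rather than locale-aware comparison.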


