Re: General Category of Latin subscript small letters

From: Ken Whistler (kenw@sybase.com)
Date: Mon Jan 31 2011 - 21:00:10 CST

    On 1/31/2011 12:41 PM, Asmus Freytag wrote:
    > I think that there's one good benefit to marking these characters as
    > Lm - it would further cement the notion that these are not styled
    > versions of the regular letters.
    >
    > Also, it would reduce the number of Ll characters that do not have a
    > case partner.
    >
    > Given the precedent cited by Ben Scarborough for the superscript
    > characters, this would further regularize the assignment of the GC.

    All true, but...

    >
    > A counter argument could be if some of these characters are never used
    > to "modify" another letter. If so, that fact and its importance (and
    > therefore the importance of making the distinction in the gc) really
    > ought to be discussed in the block descriptions and/or annotated in
    > the character nameslist, it seems.

    That concern isn't relevant to these particular subscript characters,
    which were all encoded as modifier letters, but couldn't be *named*
    "MODIFIER LETTER XYZ" because of other consistency issues.

    >
    > As it stands, there's an apparent inconsistency with no apparent purpose.
    >
    > The best way to start on the path of a remedy for this situation would
    > be if you were to file a proposal to the UTC to make these changes.
    > That way, this can be discussed and resolved.

    Correct, but...

    >
    > Might as well add the list of Greek characters, submitted by Kent, for
    > the record, so they can be resolved as well. (By resolved I here mean
    > either have their GC changed or their documentation improved).

    I agree, but...

    Here is the problem:

    Changing these particular gc=Ll subscript modifier letters to gc=Lm
    impacts the derived property Lowercase. In order to keep the
    repertoire of Lowercase=True stable, they would then have to be
    *added* to the Other_Lowercase property. So an exception in one
    place will end up moving to an exception in another place. True, the
    resulting exceptionality of the exceptions is a bit more uniform,
    but the overall improvement may be marginal.
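
    A minimal sketch of that trade-off, assuming the simplified
    derivation Lowercase = (gc=Ll) + Other_Lowercase. The three code
    points are only illustrative samples (U+2090..U+2092), and the
    helper name lowercase_set is hypothetical, not part of any
    Unicode data file or library:

```python
# Simplified model (assumption): Lowercase = (gc == Ll) + Other_Lowercase.
def lowercase_set(gc_by_cp, other_lowercase):
    """Return the set of code points with Lowercase=True in this model."""
    from_gc = {cp for cp, gc in gc_by_cp.items() if gc == "Ll"}
    return from_gc | set(other_lowercase)

# Illustrative sample only; not the full list under discussion.
SAMPLE = [0x2090, 0x2091, 0x2092]

# Status quo: gc=Ll, no Other_Lowercase entry needed.
before = lowercase_set({cp: "Ll" for cp in SAMPLE}, other_lowercase=set())

# gc changed to Lm without compensation: the characters drop out of Lowercase.
unpatched = lowercase_set({cp: "Lm" for cp in SAMPLE}, other_lowercase=set())

# gc changed to Lm and the characters *added* to Other_Lowercase:
# the Lowercase repertoire is unchanged, but the exception has moved.
patched = lowercase_set({cp: "Lm" for cp in SAMPLE}, other_lowercase=set(SAMPLE))

print(before == patched, unpatched == set())
```

    In this model the Lowercase repertoire only stays stable if the
    gc change and the Other_Lowercase addition are made together.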

    But wait, there's more. These kinds of modifier letters are also
    Cased (see definition D135), by virtue of their being Lowercase. And
    they are not Case_Ignorable (see definition D136). Moving them from
    gc=Ll to gc=Lm would make them Cased (by virtue of their Lowercase
    value) and Case_Ignorable (because they are Lm). I know that is a
    bit of a head-bender, but that is how those properties are defined.

    Now, it may not actually matter that the derived Case_Ignorable
    property changes for these few subscript modifier letters, because
    Case_Ignorable is really a very narrow-use property, just involved
    in the specification of the casing context for Greek final sigma.
    (See Table 3-15.) Nobody in the real world is going to notice or
    care that a few obscure UPA modifier letters could change a casing
    context for Greek final sigma, because nobody uses them together.
    But software test engineers don't live in the real world, and it is
    conceivable that test cases could break and somebody could complain.
    Right now there is no provision for keeping Case_Ignorable stable
    for these kinds of one-off general category property changes --
    presumably because, for characters other than those actually used
    with ordinary Greek letters, it doesn't really matter that much.
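
    The head-bender can be sketched in a few lines, assuming
    simplified forms of the two definitions: Cased means Lowercase,
    Uppercase, or gc=Lt; Case_Ignorable means gc in {Mn, Me, Cf, Lm,
    Sk} or a Word_Break value of MidLetter, MidNumLet, or
    Single_Quote. The CharProps class and helper functions are
    hypothetical names for illustration, not an existing API:

```python
from dataclasses import dataclass

# Simplified inputs to the Case_Ignorable definition (assumption, see lead-in).
CASE_IGNORABLE_GC = {"Mn", "Me", "Cf", "Lm", "Sk"}
CASE_IGNORABLE_WB = {"MidLetter", "MidNumLet", "Single_Quote"}

@dataclass(frozen=True)
class CharProps:
    """Hypothetical bundle of the properties that feed the derivations."""
    gc: str
    other_lowercase: bool = False
    other_uppercase: bool = False
    word_break: str = "Other"

def is_lowercase(p: CharProps) -> bool:
    return p.gc == "Ll" or p.other_lowercase

def is_uppercase(p: CharProps) -> bool:
    return p.gc == "Lu" or p.other_uppercase

def is_cased(p: CharProps) -> bool:
    return is_lowercase(p) or is_uppercase(p) or p.gc == "Lt"

def is_case_ignorable(p: CharProps) -> bool:
    return p.gc in CASE_IGNORABLE_GC or p.word_break in CASE_IGNORABLE_WB

# A subscript modifier letter as currently assigned: gc=Ll.
before = CharProps(gc="Ll")
# The same letter after the proposed change: gc=Lm plus Other_Lowercase.
after = CharProps(gc="Lm", other_lowercase=True)

summary = {
    name: (is_cased(p), is_case_ignorable(p))
    for name, p in {"before": before, "after": after}.items()
}
print(summary)
```

    Under these assumptions the letter stays Cased in both states, and
    only its Case_Ignorable value flips, which is the one derived
    change that nothing currently compensates for.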

    But you have been warned. Tread carefully. ;-)

    --Ken


