Re: Collation charts out of date

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Jan 30 2004 - 12:44:13 EST


    Peter Kirk asked:

    > It does look very odd that 1D28 has been separated from the other pi's,
    > 1D29 from the other rho's etc. Is there a good reason for that? I know
    > everyone hates the UPA (except for Uralicists presumably), but these
    > letters are still clearly variants of pi and rho. The same applies to
    > the Latin small caps of course - why are they collated separately at the
    > first level when all other font variants are not?

    The reason for this is that these are *small capital* variants.
    Small capitals were never given compatibility decomposition mappings
    in UnicodeData.txt. Thus, because compatibility decomposition
    mappings are used for the first, automated cut at tertiary
    weighting distinctions, small capitals don't get autoweighted
    as tertiary variants. Instead, the input file is generated in
    such a way that they get primary weights right after the group
    of characters associated with the primary weight of the base
    character.
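
    The missing decomposition mappings can be checked directly. The
    following is a minimal sketch that assumes Python's standard
    unicodedata module, whose data may come from a later Unicode
    version than the one discussed here; the helper name compat_tag
    is made up for illustration.

```python
import unicodedata

def compat_tag(ch):
    """Return the compatibility tag (e.g. '<font>') of ch's decomposition
    mapping, or None if ch has no compatibility decomposition."""
    mapping = unicodedata.decomposition(ch)
    # Compatibility mappings start with a tag in angle brackets;
    # canonical mappings and empty mappings do not.
    return mapping.split()[0] if mapping.startswith("<") else None

samples = [
    "\u03D6",      # GREEK PI SYMBOL
    "\U0001D6D1",  # MATHEMATICAL BOLD SMALL PI
    "\u1D28",      # GREEK LETTER SMALL CAPITAL PI
    "\u1D29",      # GREEK LETTER SMALL CAPITAL RHO
    "\u1D00",      # LATIN LETTER SMALL CAPITAL A
]
for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {compat_tag(ch)}")
```

    The pi symbol and the mathematical bold pi carry a tagged mapping
    back to U+03C0, which is what lets them be autoweighted as
    tertiary variants. The small capitals carry no mapping at all.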

    If you look further in the collation charts outside of Greek, you
    will find that the Latin small capitals are handled the same
    way. So "fixing" it for the few Greek small capitals from
    UPA would introduce an inconsistency between Greek and Latin
    weighting. Also, "fixing" it would be a non-trivial task, since
    it would either require introducing another distinct tertiary
    weight into the table or would require treating "small capital"
    as a secondary weight distinction. The latter would be easier
    to implement, but then would lead to arguments among the
    perfectionists as to why "small capital" should be a secondary
    weight distinction when capital versus small is a tertiary
    weight distinction. And so on and so on...
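
    To make the effect of the weight levels concrete, here is a toy
    sketch of a multi-level sort key. The weight values are invented
    for illustration and are not the real table values, and the
    names WEIGHTS and sort_key are hypothetical. It only shows why a
    separate primary weight moves a small capital past the whole
    group for the base letter, while a tertiary difference does not.

```python
# Hypothetical (primary, secondary, tertiary) weights, not real table values.
WEIGHTS = {
    "\u03C0": (100, 1, 1),  # GREEK SMALL LETTER PI
    "\u03A0": (100, 1, 2),  # GREEK CAPITAL LETTER PI: tertiary difference only
    "\u1D28": (101, 1, 1),  # GREEK LETTER SMALL CAPITAL PI: its own primary
    "\u03C1": (102, 1, 1),  # GREEK SMALL LETTER RHO
}

def sort_key(s):
    """Compare all primaries first, then all secondaries, then all tertiaries."""
    elems = [WEIGHTS[ch] for ch in s]
    return tuple(tuple(e[level] for e in elems) for level in range(3))

words = [
    "\u03C0\u03C1",  # small pi + rho
    "\u1D28\u03C0",  # small capital pi + small pi
    "\u03A0\u03C0",  # capital pi + small pi
    "\u03C0\u03C0",  # small pi + small pi
]
ordered = sorted(words, key=sort_key)
print(ordered)
```

    With these made-up weights, capital pi + pi still sorts before
    pi + rho, because the capital differs only at the third level.
    Small capital pi + pi sorts after pi + rho, because its primary
    weight is already different.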

    In any case, these small capitals are very unlikely to matter
    much when sorting any real corpus of data, and even if they
    do, the mechanism of tailoring is always available for
    people to tweak the table into exactly the behavior they
    prefer.
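
    As a rough sketch of what such a tailoring amounts to, the toy
    example below overlays one override on a default table. The
    weights are again invented, the names DEFAULT, TAILORING and
    make_key are hypothetical, and real implementations express
    tailorings in their own rule syntax rather than as a dictionary
    merge.

```python
# Hypothetical default weights (primary, secondary, tertiary); not real values.
DEFAULT = {
    "\u03C0": (100, 1, 1),  # GREEK SMALL LETTER PI
    "\u1D28": (101, 1, 1),  # GREEK LETTER SMALL CAPITAL PI: separate primary
    "\u03C1": (102, 1, 1),  # GREEK SMALL LETTER RHO
}

# Hypothetical tailoring: treat small capital pi as a tertiary variant of pi.
TAILORING = {"\u1D28": (100, 1, 4)}

def make_key(table):
    """Build a sort-key function that compares level by level using table."""
    def key(s):
        elems = [table[ch] for ch in s]
        return tuple(tuple(e[level] for e in elems) for level in range(3))
    return key

words = ["\u03C0\u03C1", "\u1D28\u03C0"]  # pi + rho, small capital pi + pi

default_order = sorted(words, key=make_key(DEFAULT))
tailored_order = sorted(words, key=make_key({**DEFAULT, **TAILORING}))
print(default_order)
print(tailored_order)
```

    Under the default weights the small capital string sorts last.
    With the override applied it sorts together with the other pi's,
    ahead of anything containing rho.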

    --Ken


