Re: Collation charts out of date

From: Peter Kirk (peterkirk@qaya.org)
Date: Fri Jan 30 2004 - 18:22:32 EST

    On 30/01/2004 13:58, Philippe Verdy wrote:

    > ...
    >
    >I also agree that small capitals have tertiary or quaternary differences,
    >but it's not clear if they are a variant of lowercase when used as a font
    >style for all letters, or of uppercase letters.

    Good question. I was assuming of uppercase, but maybe not.

    >So I bet they should have the same distinction as between lowercase and
    >uppercase, so that case-insensitive collation (which ignores secondary
    >differences) will work correctly even if tertiary and quaternary differences
    >are kept to sort accents and other minor variants, probably by sorting small
    >capitals between uppercase and lowercase letters at the same collation
    >weight.

    Philippe, I would agree with you that this is a sensible thing to do.
    But, as Ken has said, it does cause some difficulties. Presumably this
    is because no tertiary weight for small caps is defined in
    http://www.unicode.org/reports/tr10/#Tertiary_Weight_Table, or in
    software which implements it. I don't see why the currently unused and
    suitably positioned tertiary weight 0x0007 cannot simply be allocated to
    small caps. But this would mean changing a Unicode Technical Standard,
    and software which implements it. I would tend to agree with Ken that it
    is not worth doing this for just a few very rarely used characters.
    There are defined weights for <small>, but these seem to be for hiragana
    and katakana only.

    On the other hand, it would be possible to assign to these characters
    individually the tertiary weight 0x0007, or whatever other value might
    be suitable. The work needed to do this once and for all for about 33
    characters (in the IPA Extensions and Phonetic Extensions blocks) is
    hardly burdensome.
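
    To make the idea concrete, here is a minimal sketch in Python of how a
    per-character tertiary weight of 0x0007 would place small capitals
    between lowercase and uppercase. It is an illustration only, not the
    UCA algorithm: the primary and secondary weights are invented, the
    table ELEMENTS and the function sort_key are hypothetical names, and
    the values 0x0002 (lowercase) and 0x0008 (uppercase) are quoted from
    my reading of the tertiary weight table and should be checked there.

```python
# Hypothetical collation elements: (primary, secondary, tertiary).
# Primary and secondary values are invented for illustration only.
TERTIARY_LOWER = 0x0002      # assumed: plain/lowercase in the UTS #10 table
TERTIARY_SMALL_CAP = 0x0007  # the unused slot suggested in this message
TERTIARY_UPPER = 0x0008      # assumed: uppercase in the UTS #10 table

ELEMENTS = {
    "g": (0x0100, 0x0020, TERTIARY_LOWER),
    "\u0262": (0x0100, 0x0020, TERTIARY_SMALL_CAP),  # LATIN LETTER SMALL CAPITAL G
    "G": (0x0100, 0x0020, TERTIARY_UPPER),
    "h": (0x0101, 0x0020, TERTIARY_LOWER),
    "\u029c": (0x0101, 0x0020, TERTIARY_SMALL_CAP),  # LATIN LETTER SMALL CAPITAL H
    "H": (0x0101, 0x0020, TERTIARY_UPPER),
}


def sort_key(text, levels=3):
    """Build a multi-level key: all primaries, then secondaries, then tertiaries.

    A zero separator follows each level; every real weight is non-zero,
    so a shorter string sorts before a longer one at the same level.
    """
    elems = [ELEMENTS[ch] for ch in text]
    key = []
    for level in range(levels):
        key.extend(e[level] for e in elems)
        key.append(0)  # level separator
    return tuple(key)


words = ["Gh", "\u0262h", "gh", "gH"]
ordered = sorted(words, key=sort_key)
```

    Comparing with levels=2 drops the tertiary level, so "g", the small
    capital and "G" all receive the same key, which is the behaviour
    wanted when case differences are to be ignored.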

    >The simplest way to allow tailoring is to map small capitals as secondary
    >differences between uppercase and lowercase letters, and let a tailoring
    >remap them in one simple operation to quaternary level, where a 1-weight gap
    >has been left in quaternary collation weights to allow such remapping
    >without needing shifts of weights.

    Surely quaternary weights are by default code point values, and so
    automatically have the required gaps. This tailoring can of course be
    done, but I am not sure why it might be necessary.

    -- 
    Peter Kirk
    peter@qaya.org (personal)
    peterkirk@qaya.org (work)
    http://www.qaya.org/
    

