Re: collation of small capitals (was: Collation charts out of date)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Jan 31 2004 - 08:21:52 EST

    From: "Peter Kirk" <peterkirk@qaya.org>
    > On 30/01/2004 13:58, Philippe Verdy wrote:
    >
    > > ...
    > >
    > >I also agree that small capitals have a tertiary or quaternary
    > >difference, but it's not clear if they are a variant of lowercase when
    > >used as a font style for all letters, or of uppercase letters.
    >
    > Good question. I was assuming of uppercase, but maybe not.

    The most common use I have seen of small capitals is as a font style, where
    they were used to represent lowercase letters (the uppercase letters being
    presented with full-height style).

    If this is not a font style but encodes a linguistic letter difference,
    small capitals are also most often used as lowercase letters (and their
    uppercase mapping is the standard uppercase letter)...

    My bet, then, is that by default small capital letters should collate as
    a variant of the lowercase letters, sorting with them but placed just
    between uppercase and lowercase.

    > >So I bet they should have the same distinction as between lowercase and
    > >uppercase, so that case-insensitive collation (which ignores secondary
    > >differences) will work correctly even if tertiary and quaternary
    > >differences are kept to sort accents and other minor variants, probably
    > >by sorting small capitals between uppercase and lowercase letters at the
    > >same collation weight.
    >
    > Philippe, I would agree with you that this is a sensible thing to do.
    > But, as Ken has said, it does cause some difficulties. Presumably this
    > is because there is no defined tertiary weight for small caps in
    > http://www.unicode.org/reports/tr10/#Tertiary_Weight_Table, and in
    > software which implements this.

    I did not mean a tertiary level; the secondary level used to
    differentiate uppercase and lowercase is just as good for that collation,
    with simply 3 (4?) weights assigned at that level for the uppercase (and
    titlecase?), small capital, and lowercase "styles".
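
    To make the idea concrete, here is a rough sketch in Python of a sort key
    with one base-letter level followed by one case/style level that carries
    three weights. It is only an illustration, not the UCA algorithm: the
    names UPPER, SMALL_CAP, LOWER, base_and_style and sort_key are made up
    for this example, the numeric weights are arbitrary (only their order
    matters), the accent level is left out, and small capitals are detected
    simply by their Unicode character names.

```python
import unicodedata

# Hypothetical case/style weights; only their relative order matters.
# Small capitals sit between uppercase and lowercase.
UPPER, SMALL_CAP, LOWER = 1, 2, 3

_SMALL_CAP_PREFIX = "LATIN LETTER SMALL CAPITAL "


def base_and_style(ch):
    """Return (base letter, case/style weight) for a single character."""
    name = unicodedata.name(ch, "")
    rest = name[len(_SMALL_CAP_PREFIX):]
    # Only plain single-letter small capitals (A..Z) are handled here.
    if name.startswith(_SMALL_CAP_PREFIX) and len(rest) == 1:
        return rest.lower(), SMALL_CAP
    if ch.isupper():
        return ch.lower(), UPPER
    return ch.lower(), LOWER


def sort_key(s, case_sensitive=True):
    """Build a two-level key: base letters first, then case/style weights."""
    pairs = [base_and_style(ch) for ch in s]
    base_level = tuple(base for base, _ in pairs)
    if not case_sensitive:
        # Case-insensitive comparison just drops the case/style level.
        return (base_level,)
    return (base_level, tuple(weight for _, weight in pairs))
```

    With these assumed weights, sorted(["a", "A", "\u1d00"], key=sort_key)
    gives uppercase A, then small capital A (U+1D00), then lowercase a, while
    sort_key(s, case_sensitive=False) treats all three as equal.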


