Re: Collation charts out of date

From: Peter Kirk (peterkirk@qaya.org)
Date: Fri Jan 30 2004 - 18:22:32 EST

Next message: Michael Everson: "New Contribution N2698"

Previous message: Michael Everson: "Re: Collation charts out of date"
In reply to: Philippe Verdy: "Re: Collation charts out of date"
Next in thread: Philippe Verdy: "Re: collation of small capitals (was: Collation charts out of date)"
Reply: Philippe Verdy: "Re: collation of small capitals (was: Collation charts out of date)"

On 30/01/2004 13:58, Philippe Verdy wrote:

> ...
>
>I also agree that small capitals have a tertiary or quaternary difference,
>but it's not clear whether they are a variant of lowercase letters (when
>used as a font style for all letters) or of uppercase letters.
>
Good question. I was assuming they are variants of uppercase, but maybe not.

>So I bet they should have the same distinction as between lowercase and
>uppercase, so that case-insensitive collation (which ignores secondary
>differences) will work correctly even if tertiary and quaternary differences
>are kept to sort accents and other minor variants, probably by sorting small
>capitals between uppercase and lowercase letters at the same collation
>weight.
>
Philippe, I agree that this is a sensible thing to do. But, as Ken has
said, it does cause some difficulties. Presumably this is because no
tertiary weight is defined for small caps in
http://www.unicode.org/reports/tr10/#Tertiary_Weight_Table, or in the
software which implements it. There are defined weights for <small>, but
these seem to be for hiragana and katakana only. I don't see why the
currently unused and suitably positioned tertiary weight 0x0007 cannot
simply be allocated to small caps. But this would mean changing a
Unicode Technical Standard, and the software which implements it. I
tend to agree with Ken that this is not worth doing for just a few very
rarely used characters.
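
To make the idea concrete, here is a minimal sketch in Python of how a
tertiary weight of 0x0007 would place a small capital between the
lowercase and uppercase forms. The tertiary values 0x0002 and 0x0008 are
the ones the tertiary weight table gives for lowercase and uppercase;
everything else (the primary weights, the ELEMENTS table, the sort_key
helper) is invented for illustration and is not taken from any real
collation data or library.

```python
# Tertiary weights 0x0002 (lowercase) and 0x0008 (uppercase) are the values
# in the UTS #10 tertiary weight table. 0x0007 for small capitals is the
# hypothetical allocation discussed above; it is not in the standard.
TERTIARY_LOWER = 0x0002
TERTIARY_SMALL_CAP = 0x0007  # assumption, currently unused in the table
TERTIARY_UPPER = 0x0008

# Toy collation elements: (primary, secondary, tertiary).
# The primary weights are invented; 0x0020 stands in for "no accent".
ELEMENTS = {
    "a": (0x0100, 0x0020, TERTIARY_LOWER),
    "A": (0x0100, 0x0020, TERTIARY_UPPER),
    "b": (0x0101, 0x0020, TERTIARY_LOWER),
    "\u0299": (0x0101, 0x0020, TERTIARY_SMALL_CAP),  # LATIN LETTER SMALL CAPITAL B
    "B": (0x0101, 0x0020, TERTIARY_UPPER),
}


def sort_key(text, strength=3):
    """Build a sort key: all primaries, then secondaries, then tertiaries."""
    elements = [ELEMENTS[ch] for ch in text]
    key = []
    for level in range(strength):
        key.extend(element[level] for element in elements)
        key.append(0)  # level separator, lower than any real weight
    return tuple(key)


words = ["aB", "ab", "a\u0299", "Ab"]
full_order = sorted(words, key=sort_key)
```

With these toy weights, full_order puts "ab" first, then the small
capital form, then "aB", then "Ab". Calling sort_key with strength=2
stops before the tertiary level, so "b", the small capital and "B" all
get the same key.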

On the other hand, it would be possible to assign the tertiary weight
0x0007, or whatever other value might be suitable, to these characters
individually. The work needed to do this once and for all for about 33
characters (in the IPA Extensions and Phonetic Extensions blocks) is
hardly burdensome.
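
As a rough sketch of how small that job is, the following Python
collects the candidate characters from the two blocks and pairs each
with the proposed weight. Selecting characters by the words "SMALL
CAPITAL" in their names is my own shortcut, not an official criterion,
and the overrides table is hypothetical. The number of characters found
depends on the Unicode version behind Python's unicodedata module, so it
need not match the figure quoted above.

```python
import unicodedata

SMALL_CAP_TERTIARY = 0x0007  # hypothetical weight, not defined by UTS #10

# Code point ranges of the IPA Extensions and Phonetic Extensions blocks.
BLOCKS = [(0x0250, 0x02AF), (0x1D00, 0x1D7F)]


def small_capital_overrides():
    """Map each small capital in the two blocks to the proposed tertiary weight."""
    overrides = {}
    for first, last in BLOCKS:
        for code_point in range(first, last + 1):
            # Unassigned code points have no name; treat them as "".
            name = unicodedata.name(chr(code_point), "")
            if "SMALL CAPITAL" in name:  # heuristic selection by name
                overrides[chr(code_point)] = SMALL_CAP_TERTIARY
    return overrides


overrides = small_capital_overrides()
for char in sorted(overrides):
    print(f"U+{ord(char):04X} {unicodedata.name(char)} -> {overrides[char]:#06x}")
```

The resulting list would still have to be reviewed by hand before any
weights were changed in real collation data.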

>The simplest way to allow tailoring is to map small capitals as secondary
>differences between uppercase and lowercase letters, and let a tailoring
>remap them in one simple operation to quaternary level, where a 1-weight gap
>has been left in quaternary collation weights to allow such remapping
>without needing shifts of weights.
>
Surely quaternary weights are by default code point values, and so
automatically have the required gaps. This tailoring can of course be
done, but I am not sure why it might be necessary.

-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/

This archive was generated by hypermail 2.1.5 : Fri Jan 30 2004 - 19:30:25 EST