Re: PRI#203: UTS#10 (UCA) update : characters needed to avoid contractions or expansions

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Wed, 31 Aug 2011 19:03:18 +0200

2011/8/31 Mark Davis ☕ <mark_at_macchiato.com>:
>> Another interesting question is: how can we encode in texts the fact
>> that a character usually considered as a ligature in a language (that
>> collates it as separate letters, even if the ligature is orthographic
>> and not just typographic), should still be collated as only one letter
>> ? In other words, are there some controls (or variant selection, or
>> other means) which would have the effect of disabling the default
>> expansions performed in a correctly tailored collation (for example,
>> in a French collator is there a way to disable the expansion of
>> occurrences of "æ" into "ae")?
>
> The CLDR tailoring syntax allows DUCET expansions to be suppressed or
> changed for a particular locale.
> There is no mechanism in UCA to change expansions on a code-point basis. Eg
> in the same string "Cæsium Kværner" to have the first æ expand to 'ae' but
> the second sort after 'Z', as in Norwegian.

You just provided the perfect example: we still have no way to specify
that one of the 'æ' occurrences should not be expanded while the other
one should be.

I would expect an orthographic convention, such as adding an invisible
control after one of the occurrences to change the default behavior
*locally*, so that it could be detected by a UCA tailoring (using the
longest-match rule). But which kind of invisible control? Maybe the
occurrence that should expand could be encoded as (a,ZWJ,e); in
that case there is no expansion at all for this substring, just an
ignorable character in the middle...
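
To make the (a,ZWJ,e) idea concrete, here is a minimal sketch of a toy
primary-level collator in Python. It is not UCA or CLDR: the weight table,
the name toy_sort_key and the sample words are made up for illustration,
and it simply assumes that ZWJ is treated as completely ignorable and that
'æ' is tailored as a single letter after 'z' (Norwegian-style). Under those
assumptions, an occurrence spelled a+ZWJ+e sorts exactly like "ae", while
a literal 'æ' in the same list keeps its single-letter behavior.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

# Toy primary weights (assumption, not DUCET values): a..z -> 1..26,
# then 'æ' as a distinct letter sorting after 'z'.
PRIMARY = {chr(ord("a") + i): i + 1 for i in range(26)}
PRIMARY["æ"] = 27


def toy_sort_key(text):
    """Return a tuple of toy primary weights for text.

    ZWJ is skipped as if it were completely ignorable; any character
    without a toy weight is skipped as well.
    """
    key = []
    for ch in text.lower():
        if ch == ZWJ:
            continue  # ignorable: contributes nothing to the key
        weight = PRIMARY.get(ch)
        if weight is not None:
            key.append(weight)
    return tuple(key)


# "Ca" + ZWJ + "esium" is the occurrence meant to behave like "ae";
# "Kværner" keeps 'æ' as one letter after 'z'.
words = ["Kværner", "Ca" + ZWJ + "esium", "Kvz", "Caf"]
sorted_words = sorted(words, key=toy_sort_key)
```

With this table the a+ZWJ+e spelling lands before "Caf" (it compares as
"caesium"), and "Kværner" lands after "Kvz". Whether a real tailoring
should rely on ZWJ for this, or on some other control, is exactly the
open question.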

-- Philippe.
Received on Wed Aug 31 2011 - 12:05:00 CDT
