Weighting ae and oe (was the unbelievably wandering Re: Decimal separator...)

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon May 19 2003 - 21:01:00 EDT


    Jim Allan summed up:

    > Accordingly, it would be reasonable that _ae_ and _oe_
    [I've substituted schematic names for the UTF-8, to avoid
     possible Latin-1/UTF-8 mailer trashing. --Ken]
    > be classed as the
    > same kind of thing in Unicode, whatever that thing might be. It would be
    > reasonable that the same collating rules be applied to both as to
    > primary or secondary differences from _a_ and _o_ respectively.

    I agree, but others have insisted that _oe_ default to
    its ligature treatment in the table, i.e., weighting as
    an <o,e> sequence.
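
To make the two treatments concrete, here is a minimal sketch in Python. It assumes toy primary weights for a-z only (not values from allkeys.txt), and the function names `key_as_sequence` and `key_as_letter` are hypothetical. It only illustrates how weighting the oe ligature as an <o,e> sequence, versus giving it a primary difference from o, moves a word in the sorted order.

```python
# Toy primary weights for a-z only; these are NOT the values in allkeys.txt.
PRIMARY = {c: i + 1 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}

OE = "\u0153"  # LATIN SMALL LIGATURE OE


def key_as_sequence(word):
    """Ligature treatment: weight the oe character as the sequence <o, e>."""
    weights = []
    for ch in word:
        if ch == OE:
            weights.extend([PRIMARY["o"], PRIMARY["e"]])
        else:
            weights.append(PRIMARY[ch])
    return tuple(weights)


def key_as_letter(word):
    """Separate-letter treatment (assumed for illustration): give the oe
    character its own primary weight, placed between o and p."""
    weights = []
    for ch in word:
        if ch == OE:
            weights.append(PRIMARY["o"] + 0.5)
        else:
            weights.append(PRIMARY[ch])
    return tuple(weights)


words = ["oz", OE + "uf", "of", "ode"]
by_sequence = sorted(words, key=key_as_sequence)
by_letter = sorted(words, key=key_as_letter)
print(by_sequence)
print(by_letter)
```

Under the sequence treatment the ligature word interleaves with the other o-words; under the separate-letter treatment it sorts after all of them. A real table also has to settle the secondary and tertiary levels, which this sketch leaves out.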

    The difficulty for _ae_, which many people who opine about this
    issue tend to overlook, is that the Unicode Standard also
    includes, from Nordic standards, a number of accented _ae_
    characters as precomposed characters. These make the table
    considerably more complicated if the default treatment for
    _ae_ is to weight it as an <a,e> sequence, since you then
    have to figure out what to do with the accented forms, for which
    you have just drained the base character weighting.
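
The accented forms in question can be listed with Python's standard `unicodedata` module. This is only a sketch for inspection: the helper name `canonical_parts` is hypothetical, and the output depends on the Unicode version bundled with the Python build. It shows that each precomposed accented _ae_ decomposes canonically to the _ae_ character plus a combining mark, while _ae_ itself has no decomposition, so the accented forms depend on whatever base weighting _ae_ receives.

```python
import unicodedata

# Precomposed accented ae characters encoded in the Unicode Standard.
ACCENTED_AE = ["\u01E2", "\u01E3", "\u01FC", "\u01FD"]


def canonical_parts(ch):
    """Return the canonical decomposition of ch as a list of code points.

    Returns an empty list when ch has no decomposition or only a
    compatibility one (those are tagged with a leading '<...>')."""
    decomp = unicodedata.decomposition(ch)
    if not decomp or decomp.startswith("<"):
        return []
    return [int(cp, 16) for cp in decomp.split()]


for ch in ACCENTED_AE + ["\u00E6", "\u0153"]:
    parts = canonical_parts(ch)
    shown = " ".join("U+%04X" % p for p in parts) or "(no decomposition)"
    print("U+%04X %s -> %s" % (ord(ch), unicodedata.name(ch), shown))
```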

    In any case, inconsistent as it is for these two characters,
    the allkeys.txt table was constructed as it is for a reason
    (or several reasons, actually),
    and I'm disinclined to suggest that its handling of _ae_
    and _oe_ should be restructured, since that ripples out to
    cause further destabilization of tailorings based on the
    current values in the table.

