Re: Folding algorithm and canonical equivalence

From: Asmus Freytag (
Date: Sun Jul 18 2004 - 02:52:37 CDT


    At 11:15 PM 7/17/2004, John Cowan wrote:
    >I agree that in the TR#30 context, the Right Thing is to remove the
    >character pair mappings altogether, and all of the single-character
    >mappings that have canonical decompositions

    In other words, in your opinion the reasonable thing to do would be for
    someone to apply the AccentFolding as defined in the TR, and then apply a
    DiacriticFolding to fold the cases where, even in NFD, the accents don't
    exist as separate characters.
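    A minimal sketch of such a chain, using Python's unicodedata module. Both
    foldings here are stand-ins, not the TR's definitions: accent_fold strips
    only the Combining Diacritical Marks block (U+0300..U+036F) after NFD, and
    DIACRITIC_FOLDING is a hypothetical table covering a few letters (stroked
    O, D, L) that have no canonical decomposition.

```python
import unicodedata

# Hypothetical DiacriticFolding entries: letters whose diacritic is not a
# separate combining mark even in NFD (illustrative, not the actual data file).
DIACRITIC_FOLDING = {
    "\u00D8": "O", "\u00F8": "o",  # O WITH STROKE
    "\u0110": "D", "\u0111": "d",  # D WITH STROKE
    "\u0141": "L", "\u0142": "l",  # L WITH STROKE
}

def accent_fold(text: str) -> str:
    """Stand-in for AccentFolding: NFD, drop marks in U+0300..U+036F, recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if not 0x0300 <= ord(ch) <= 0x036F)
    return unicodedata.normalize("NFC", kept)

def diacritic_fold(text: str) -> str:
    """Map each character through the hypothetical DiacriticFolding table."""
    return "".join(DIACRITIC_FOLDING.get(ch, ch) for ch in text)

def chain(text: str, *foldings) -> str:
    """Apply foldings left to right, each to the output of the previous one."""
    for fold in foldings:
        text = fold(text)
    return text

print(accent_fold("Łódź"))                       # Łodz (the stroke survives NFD)
print(chain("Łódź", accent_fold, diacritic_fold))  # Lodz
```

    AccentFolding alone leaves the L WITH STROKE untouched because NFD has no
    separate mark to remove; the second folding in the chain picks it up.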

    That's certainly reasonable, and it's not the only case where it's
    interesting to have chained foldings.

    Jony is arguing for extending AccentFolding to Hebrew (folding to unpointed
    text). His suggestion is to fold *all* combining marks used with Hebrew in
    that case. I want to double-check whether he really means all combining
    marks in the Hebrew block, or just some of them.

    AccentFolding can't just fold all gc=Mn, since that would include quite a
    few marks that are script-specific, as well as the marks for Symbols, for
    which different folding rules might need to apply in some contexts. So I
    think I'll use as the set of accents to remove all the ones that show up as
    part of canonical decompositions, plus as many of the Hebrew accents as
    Jony can confirm.
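    A minimal sketch of deriving that accent set from the Unicode Character
    Database via Python's unicodedata module (so the result depends on the
    Unicode version bundled with the interpreter). It assumes "show up as part
    of decompositions" means: gc=Mn characters that appear in some canonical,
    i.e. untagged, decomposition. Compatibility decompositions (those starting
    with a "<tag>") are skipped. The function name is hypothetical.

```python
import sys
import unicodedata

def decomposition_marks() -> set[str]:
    """Collect gc=Mn characters occurring in any canonical decomposition."""
    marks = set()
    for cp in range(sys.maxunicode + 1):
        decomp = unicodedata.decomposition(chr(cp))
        if not decomp or decomp.startswith("<"):
            continue  # no decomposition, or a compatibility-only one
        for hex_cp in decomp.split():
            ch = chr(int(hex_cp, 16))
            if unicodedata.category(ch) == "Mn":
                marks.add(ch)
    return marks

marks = decomposition_marks()
print(len(marks))
```

    One level of decomposition per character is enough: any mark in a full
    decomposition (e.g. the cedilla inside U+1E08) also appears directly in the
    decomposition of some intermediate character (here U+00C7). Note that the
    Hebrew points used by the precomposed presentation forms (such as HIRIQ in
    U+FB1D) are picked up this way, while the cantillation accents
    (U+0591..U+05AF) are not, which is why those would have to be added
    separately.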

    (Another alternative would be to make the Hebrew folding a separate
    definition, allowing people to apply one but not the other.)
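    A minimal sketch of Hebrew folding as its own, independently applicable
    step. It assumes Jony's broad reading, i.e. that every gc=Mn character in
    the Hebrew block (U+0590..U+05FF) is removed; hebrew_unpoint is a
    hypothetical name. Hebrew punctuation such as MAQAF (gc=Pd) is kept
    because only nonspacing marks are tested.

```python
import unicodedata

def hebrew_unpoint(text: str) -> str:
    """Hypothetical separate Hebrew folding: strip gc=Mn marks in U+0590..U+05FF."""
    # NFD also splits precomposed presentation forms such as U+FB2A.
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(
        ch for ch in decomposed
        if not (0x0590 <= ord(ch) <= 0x05FF and unicodedata.category(ch) == "Mn")
    )
    return unicodedata.normalize("NFC", kept)

# shin+qamats+shin dot, lamed, vav+holam, final mem
shalom = "\u05E9\u05B8\u05C1\u05DC\u05D5\u05B9\u05DD"
print(hebrew_unpoint(shalom) == "\u05E9\u05DC\u05D5\u05DD")  # True
print(hebrew_unpoint("e\u0301") == "\u00E9")                 # True: non-Hebrew marks untouched
```

    Because it only touches the Hebrew block, this function can be applied
    before or after a general accent folding, or skipped entirely.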

    I'll make another Draft of DiacriticFolding.txt with the mappings derivable
    from canonical decompositions removed.
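    A minimal sketch of that pruning, in the spirit of John's proposal quoted
    above. It assumes "character pair mappings" means entries whose source is
    a sequence of more than one character, and that a canonical decomposition
    is one without a "<tag>" in the UCD. The draft table and function names
    are hypothetical.

```python
import unicodedata

def has_canonical_decomposition(ch: str) -> bool:
    """True if the UCD gives ch a canonical (untagged) decomposition."""
    decomp = unicodedata.decomposition(ch)
    return bool(decomp) and not decomp.startswith("<")

def prune(table: dict[str, str]) -> dict[str, str]:
    """Drop multi-character sources and single characters with canonical decompositions."""
    return {
        src: dst
        for src, dst in table.items()
        if len(src) == 1 and not has_canonical_decomposition(src)
    }

# Hypothetical draft entries: e-acute (precomposed and decomposed), o-stroke, l-stroke.
draft = {"\u00E9": "e", "e\u0301": "e", "\u00F8": "o", "\u0142": "l"}
print(prune(draft))  # {'ø': 'o', 'ł': 'l'}
```

    What remains are exactly the letters an NFD-based AccentFolding cannot
    reach on its own.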
