Re: Question on Canonical equivalence

From: Antoine Leca
Date: Wed Nov 24 2004 - 13:19:03 CST

    On Wednesday, November 24th, 2004 16:26Z Tim Greenwood wrote:

    > All of the spacing combining marks (general category Mc) except
    > musical symbols have a canonical combining class of 0.
    > Why is this?

    As for the Indic vowel signs, I assume they are class 0 so that they are
    not reordered (in weird ways), particularly when there are multi-piece
    vowels.
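
To make this concrete, here is a minimal sketch that reads the general category and canonical combining class of a few spacing marks. It assumes Python's standard unicodedata module, which reports the UCD version bundled with the interpreter (not the 2004 data), and the sample characters are only illustrative picks.

```python
import unicodedata

# Illustrative sample only: three Indic vowel signs and one musical symbol.
SAMPLES = ["\u093E", "\u09C7", "\u09BE", "\U0001D165"]

def describe(ch):
    """Return (general category, canonical combining class) for one character."""
    return unicodedata.category(ch), unicodedata.combining(ch)

for ch in SAMPLES:
    cat, ccc = describe(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: gc={cat} ccc={ccc}")

# A multi-piece vowel: BENGALI VOWEL SIGN O decomposes canonically
# into two class-0 pieces, so their relative order is never disturbed.
pieces = unicodedata.normalize("NFD", "\u09CB")
print([f"U+{ord(c):04X}" for c in pieces])
```

Because both pieces of the decomposed vowel are class 0, canonical reordering cannot swap them or move another mark across them.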

    > The Canonical
    > Combining Class Values in UCD.html has entries and values for left
    > attached and right attached - but no characters have these values.

    They (the Indic vowel signs) happened to have a combining class greater
    than 0 before v.2.1.8 (1998).
    I believe UCD.html still reflects this past state.

    For example, the accompanying README tells us:

      Note that as of the 2.1.8 update of the Unicode Character Database,
      the decompositions in the UnicodeData.txt file can be used to recursively
      derive the full decomposition in canonical order, without the need
      to separately apply canonical reordering. However, canonical reordering
      of combining character sequences must still be applied in decomposition
      when normalizing source text which contains any combining marks.
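
The reordering step the README mentions can be sketched as follows. This is not the reference implementation: the helper name canonical_reorder is hypothetical, and it assumes its input is already fully decomposed. It stable-sorts each run of non-starters (class greater than 0) by combining class and leaves class-0 characters where they are.

```python
import unicodedata

def canonical_reorder(text):
    """Stable-sort each run of non-starters (ccc > 0) by combining class.

    Hypothetical helper for illustration; expects already-decomposed text.
    """
    out, run = [], []
    for ch in text:
        if unicodedata.combining(ch) == 0:
            # A class-0 character ends the current run and is never moved.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)

# Acute (class 230) typed before dot below (class 220): the two are swapped.
latin = "a\u0301\u0323"
# KA, VOWEL SIGN AA (class 0), NUKTA (class 7): the class-0 vowel sign
# blocks the nukta from moving in front of it.
indic = "\u0915\u093E\u093C"

print(canonical_reorder(latin) == unicodedata.normalize("NFD", latin))
print(canonical_reorder(indic) == indic)
```

The second case shows the effect discussed above: a mark with class 0 is a barrier, so nothing is reordered across it.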

    I assume it has to do with the work on TR15, which you might consult
    for enlightenment.

