RE: Question on Canonical equivalence

From: Kenneth Whistler
Date: Wed Nov 24 2004 - 13:51:13 CST

    Tim Greenwood asked:

    > > All of the spacing combining marks (general category Mc) except
    > > musical symbols have a canonical combining class of 0. So, for example
    > >
    > > 0B95 (TAMIL LETTER KA) 0BC7 (TAMIL VOWEL SIGN EE - stands to the left
    > > of the consonant) 0BBE (TAMIL VOWEL SIGN AA - on the right) is
    > > canonically distinct from 0B95 0BBE 0BC7 - even though I presume that
    > > they would generate an identical glyph. Why is this?

    Because <0BBE, 0BC6> is not canonically equivalent to U+0BCA, which
    is the preferred representation for this vowel, anyway.

    Making Indic spacing vowel matras have non-zero combining classes,
    and forcing them to start reordering under normalization would
    have introduced even greater complications. As it is, <0BBE, 0BC6>
    should simply be treated as a misspelling of Tamil.
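The equivalence claims above can be checked with a minimal sketch, assuming Python's standard `unicodedata` module (whose results depend on the Unicode version bundled with the interpreter). The helper `canonically_equivalent` is a hypothetical name introduced for illustration; it compares the NFD forms of two strings, which is one common way to test canonical equivalence.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: two strings are treated as canonically
    equivalent when their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

KA = "\u0b95"  # TAMIL LETTER KA
AA = "\u0bbe"  # TAMIL VOWEL SIGN AA (right side)
E = "\u0bc6"   # TAMIL VOWEL SIGN E (left side)
O = "\u0bca"   # TAMIL VOWEL SIGN O (precomposed two-part vowel)

# Both spacing matras are gc=Mc with ccc=0, so normalization never reorders them.
for ch in (AA, E):
    print(f"U+{ord(ch):04X} gc={unicodedata.category(ch)} "
          f"ccc={unicodedata.combining(ch)}")

# <0BC6, 0BBE> is the canonical decomposition of U+0BCA ...
print(canonically_equivalent(KA + E + AA, KA + O))   # True
# ... but the swapped order <0BBE, 0BC6> is a different sequence.
print(canonically_equivalent(KA + AA + E, KA + O))   # False
# NFC composes the correctly ordered sequence into U+0BCA.
print(unicodedata.normalize("NFC", KA + E + AA) == KA + O)  # True
```

Because both marks have combining class 0, the misordered sequence survives normalization unchanged, which is why it has to be treated as a spelling error and not as an alternate encoding.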

    And Peter Constable continued:

    > The question that comes to my mind isn't why some Mc marks don't have
    > non-zero classes (for example, a Right Attached class), but rather why any Mc marks *do*
    > have non-zero classes.
    > There are 352 marks with a canonical combining class > 0. Only 8 of
    > these, all musical symbols, are Mc.

    Because it seemed like a good idea at the time, because nobody
    objected, and because we are stuck with it now, inconsistent
    or not. [gc=Mc] ==> [ccc=0] is *not* an invariant we have ever
    tried to maintain in UnicodeData, by the way.
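That [gc=Mc] ==> [ccc=0] is not an invariant can be checked with a rough sketch, again assuming Python's standard `unicodedata` module. The function name `mc_with_nonzero_ccc` is hypothetical, and the exact list it returns depends on the Unicode version the interpreter ships with, so it may contain more than the 8 musical symbols counted in 2004.

```python
import sys
import unicodedata

def mc_with_nonzero_ccc():
    """Hypothetical helper: list (code point, ccc) for every character
    whose general category is Mc but whose combining class is not 0."""
    found = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Mc" and unicodedata.combining(ch) != 0:
            found.append((cp, unicodedata.combining(ch)))
    return found

# Print each exception to the would-be invariant with its name.
for cp, ccc in mc_with_nonzero_ccc():
    print(f"U+{cp:04X} ccc={ccc} {unicodedata.name(chr(cp), '<unnamed>')}")
```

The scan walks the whole code space, so it takes a moment to run, but it avoids hard-coding any block ranges.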

    Also, any musical scoring program that is actually making use
    of the various note flags and stems involved in this to construct
    rendered musical scores is going to be a *very* special case
    program, anyway. These particular 8 items can*NOT* be rendered
    correctly in context by out-of-the-box generic text rendering


    This archive was generated by hypermail 2.1.5 : Wed Nov 24 2004 - 13:54:52 CST