Re: Still can't work out what's a "canonical decomp" vs a "compatibility decomp"

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed May 07 2003 - 14:00:54 EDT

    Theodore Smith asked:

    > I am having trouble understanding Unicode's documentation. I tried
    > looking through the glossary for an explanation of a term I see
    > a lot, "canonical equivalence". This led me back to the original
    > document that had been using it a lot, but that still didn't help me
    > find out what it meant.

    The place to look, now that the Conformance chapter for Unicode
    4.0 has been posted, is:

    http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf

    See Section 3.7 Decomposition, which defines 'decomposition',
    'compatibility decomposition', 'canonical decomposition',
    'compatibility equivalent', 'canonical equivalent', and so on.

    That is as close to the horse's mouth as you are going to get.

    >
    > On spending more effort trying to understand the document's terse
    > format, I found out that it is telling me I have to read some kind of
    > table listing.

    The decomposition mappings used in the definitions are in
    Section 16.1 Character Names List (not online yet) -- but that
    is simply the code charts and character names. You can find
    comparable listings just by opening up Unicode 3.0 and looking
    at the comparable Section 14.1 Character Names List. The
    conventions for the mappings are explained in great detail
    on pp. 333-334 there.

    >
    > So to work out what this word canonical means, I have to remember one
    > table, and to remember the word compatibility I have to remember
    > another table.

    No, you have to understand the distinction between two different
    types of decomposition mappings in *one* table.
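
    A minimal sketch of that distinction, assuming Python's standard
    unicodedata module (which exposes the decomposition mapping field
    of the Unicode Character Database as a string). A mapping with no
    tag is canonical; a mapping that starts with a tag in angle
    brackets, such as <compat>, is a compatibility mapping. The helper
    name classify_decomposition is hypothetical, chosen for this
    illustration only.

```python
import unicodedata


def classify_decomposition(ch):
    """Return (kind, mapping) for a single character.

    kind is "none", "canonical", or "compatibility"; mapping is the raw
    decomposition mapping string as reported by unicodedata.
    """
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        # No decomposition mapping is listed for this character.
        return ("none", "")
    if mapping.startswith("<"):
        # A leading <tag> marks a compatibility decomposition.
        return ("compatibility", mapping)
    # An untagged mapping is a canonical decomposition.
    return ("canonical", mapping)


if __name__ == "__main__":
    # U+00C5 A WITH RING ABOVE, U+FB01 LIGATURE FI, and plain "A".
    for ch in ("\u00C5", "\uFB01", "A"):
        kind, mapping = classify_decomposition(ch)
        print(f"U+{ord(ch):04X} {kind:13} {mapping}")
```

    Both kinds of mapping come from the same field of the same table;
    only the presence or absence of the tag tells them apart.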

    >
    > I haven't found those tables yet,

    The printed form is in the code charts and character names list
    in the standard. (I presume you *have* found those. ;-) )

    The ultimate, definitive, and normative source of the decomposition
    mappings is the data file, UnicodeData.txt, which is also online.

    http://www.unicode.org/Public/UNIDATA/UnicodeData.txt

    The decomposition mapping field in that data file is used,
    programmatically, to generate the decomposition mappings which
    are printed in the code charts.
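
    As a rough sketch of reading that field directly, the following
    assumes the semicolon-separated layout of UnicodeData.txt, with the
    decomposition mapping in the sixth field (index 5). The three
    sample lines are embedded for illustration rather than fetched
    from the URL above, and the names SAMPLE_LINES and
    parse_decomposition are hypothetical.

```python
# Illustrative lines in the UnicodeData.txt format (fields separated by ';').
SAMPLE_LINES = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "00C5;LATIN CAPITAL LETTER A WITH RING ABOVE;Lu;0;L;0041 030A;;;;N;LATIN CAPITAL LETTER A RING;;;00E5;",
    "FB01;LATIN SMALL LIGATURE FI;Ll;0;L;<compat> 0066 0069;;;;N;;;;;",
]


def parse_decomposition(line):
    """Return (code_point, kind, tag, targets) for one data line.

    kind is None (no mapping), "canonical", or "compatibility".
    tag is the compatibility tag without angle brackets, or None.
    targets is the list of code points the character maps to.
    """
    fields = line.strip().split(";")
    code_point = int(fields[0], 16)
    raw = fields[5]  # decomposition mapping field
    if not raw:
        return (code_point, None, None, [])
    parts = raw.split()
    if parts[0].startswith("<"):
        # Tagged mapping: compatibility decomposition.
        tag = parts[0].strip("<>")
        targets = [int(p, 16) for p in parts[1:]]
        return (code_point, "compatibility", tag, targets)
    # Untagged mapping: canonical decomposition.
    return (code_point, "canonical", None, [int(p, 16) for p in parts])


if __name__ == "__main__":
    for line in SAMPLE_LINES:
        print(parse_decomposition(line))
```

    Only the sixth field is examined here; the other fields are
    described in the UCD documentation.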

    Read:

    http://www.unicode.org/Public/UNIDATA/UCD.html

    to get information about UnicodeData.txt and any other of the
    data files in the Unicode Character Database.

    > and I'm still wondering why I'm being
    > expected to take so many steps just to understand a term that is used
    > all over the place.
    >
    > I'm sure there is a much simpler explanation of canonical and
    > compatibility?

    Nope. A simpler one would likely not be a correct one. See
    Section 3.7 of Chapter 3 of Unicode 4.0 (cited above) to get it
    correct.

    Although perhaps John Cowan might be persuaded to come up with
    the pocket edition explanation, comparable to his famous
    list of Unicode conformance requirements:
    http://www.unicode.org/faq/basic_q.html#15

    :-)

    --Ken

    > Or else why didn't they just use the terms "table1
    > decomposition" and "table2 decomposition"?
    >
    > I'm guessing those words have a meaning, somehow, but I can't say what
    > interpretation of those words has been used. Can anyone explain it
    > for me, perhaps?


