Re: compatibility characters (in XML context)

From: Philippe Verdy (
Date: Fri Nov 14 2003 - 06:28:56 EST


    ----- Original Message -----
    From: "Alexandre Arcouteil" <>
    To: <>
    Sent: Friday, November 14, 2003 10:41 AM
    Subject: compatibility characters (in XML context)

    > This is a beginner question:
    > In the XML 1.1 Proposed Recommendation of 05 November 2003
    > (, it is said that "Document authors are
    > encouraged to avoid "compatibility characters", as defined in section
    > 6.8 of [Unicode]", i.e. with reference to Unicode 2.0.
    > I can't find any online documentation giving an explicit definition of
    > "compatibility characters" in Unicode 2.0.

    Compatibility characters can be defined as the characters whose canonical
    decomposition mapping is either:

        (1) a singleton (for example the Angstrom sign U+212B, canonically mapped
    to U+00C5 A with ring above, or the CJK compatibility ideographs, which map
    to unified Han ideographs and were included only for compatibility with
    legacy charsets or because of assignment errors in Unicode 1.0); such
    characters are implicitly excluded from recomposition in all NF* forms; or

        (2) a two-code _canonical_ decomposition mapping that is explicitly
    excluded from canonical composition (for example U+FB2A, the Hebrew letter
    shin with shin dot).
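    The two cases above can be tested mechanically. What follows is a minimal
    Python sketch, assuming the standard unicodedata module (whose data follows
    the Unicode version bundled with the interpreter, not Unicode 2.0); the
    function name is hypothetical. It treats a character as a "compatibility
    character" in the sense of this post when it has an untagged (canonical)
    decomposition yet does not come back unchanged from NFC, which covers both
    singletons and explicit composition exclusions. It will also flag the few
    non-starter decompositions such as U+0344, which the definition above does
    not mention.

```python
import unicodedata


def is_compat_char_per_post(ch: str) -> bool:
    """Hypothetical helper: True if `ch` has a canonical decomposition
    but is never reproduced by NFC (a singleton or a composition exclusion)."""
    decomp = unicodedata.decomposition(ch)  # e.g. "0065 0301" or "<compat> 0066 0066"
    if not decomp or decomp.startswith("<"):
        # No decomposition at all, or only a tagged compatibility decomposition.
        return False
    # NFC decomposes and then recomposes, so a canonically decomposable
    # character comes back unchanged unless recomposition to it is blocked.
    return unicodedata.normalize("NFC", ch) != ch


if __name__ == "__main__":
    samples = {
        "\u212B": "ANGSTROM SIGN (singleton)",
        "\uF900": "CJK COMPATIBILITY IDEOGRAPH-F900 (singleton)",
        "\uFB2A": "HEBREW LETTER SHIN WITH SHIN DOT (excluded)",
        "\u00E9": "LATIN SMALL LETTER E WITH ACUTE",
    }
    for ch, label in samples.items():
        print(f"U+{ord(ch):04X} {label}: {is_compat_char_per_post(ch)}")
```

    The first three samples report True and the last one reports False.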

    These characters will never be part of any string in a normalized form (NFC,
    NFD, NFKC or NFKD).
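    One way to observe this, again as a Python sketch built on the standard
    unicodedata module (the helper name is made up for illustration), is to
    normalize each character on its own and check which of the four forms
    still contain it:

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def surviving_forms(ch: str) -> list:
    """Return the normalization forms whose output still contains `ch`."""
    return [form for form in FORMS if ch in unicodedata.normalize(form, ch)]


if __name__ == "__main__":
    print(surviving_forms("\u212B"))  # ANGSTROM SIGN: []
    print(surviving_forms("\uFB2A"))  # HEBREW LETTER SHIN WITH SHIN DOT: []
    print(surviving_forms("\u00E9"))  # e with acute: ['NFC', 'NFKC']
```

    By contrast, an ordinary precomposed letter such as U+00E9 is kept by the
    composed forms and only disappears in the decomposed ones.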

    > At least I'd like to know whether characters like "é", "ç" or "œ" are
    > concerned.

    No: "é" and "ç" have canonical decompositions, but they are not excluded
    from canonical composition, so NFC keeps them precomposed. And the "oe
    ligature" (U+0153) has no decomposition mapping at all, so it is not a
    compatibility character either.
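    The decomposition data behind this answer can be inspected directly. A
    small Python sketch, assuming the standard unicodedata module (the
    describe() helper is hypothetical), reports each character's decomposition
    field, whether that mapping is canonical or tagged as a compatibility
    mapping, and whether NFC leaves the character unchanged. U+FB00 LATIN SMALL
    LIGATURE FF is added as an example of a character that does carry a
    compatibility decomposition.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Summarize the UnicodeData decomposition of a single character."""
    decomp = unicodedata.decomposition(ch)
    if not decomp:
        kind = "none"
    elif decomp.startswith("<"):
        kind = "compatibility"  # tagged mapping, e.g. "<compat> 0066 0066"
    else:
        kind = "canonical"
    return {
        "code_point": f"U+{ord(ch):04X}",
        "decomposition": decomp,
        "kind": kind,
        "stable_under_nfc": unicodedata.normalize("NFC", ch) == ch,
    }


if __name__ == "__main__":
    # e with acute, c with cedilla, oe ligature, ff ligature
    for ch in ("\u00E9", "\u00E7", "\u0153", "\uFB00"):
        print(ch, describe(ch))
```

    With current Unicode data, "é" and "ç" report a canonical decomposition
    and are stable under NFC, "œ" reports no decomposition, and U+FB00 reports
    a compatibility decomposition (it is left alone by NFC but not by NFKC).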

    > Is there a complete chart of "compatibility characters" somewhere?

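    Look at the Unicode data file CompositionExclusions.txt, which lists the
    composition exclusions; the singletons can also be derived from
    UnicodeData.txt as the canonical decompositions made of a single code point.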
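    If no ready-made chart is at hand, one can also be derived from the
    character database shipped with a programming language. The sketch below
    assumes Python's unicodedata module, which follows the Unicode version
    bundled with the interpreter rather than Unicode 2.0, so the list will
    include characters added since then; it also picks up non-starter
    decompositions such as U+0344, which are excluded from composition too.
    The function name is hypothetical.

```python
import sys
import unicodedata


def excluded_decomposables() -> list:
    """List code points that have a canonical decomposition but never
    survive NFC (singletons, composition exclusions, non-starter decompositions)."""
    result = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        decomp = unicodedata.decomposition(ch)
        # Skip characters with no mapping or only a tagged compatibility mapping.
        if decomp and not decomp.startswith("<"):
            if unicodedata.normalize("NFC", ch) != ch:
                result.append(cp)
    return result


if __name__ == "__main__":
    for cp in excluded_decomposables():
        print(f"U+{cp:04X}  {unicodedata.name(chr(cp), '<unnamed>')}")
```

    Scanning every code point takes a moment in pure Python, but it only needs
    to be done once to produce the chart.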
