Re: compatibility characters (in XML context)

From: Mark Davis (mark.davis@jtcsv.com)
Date: Fri Nov 14 2003 - 10:57:48 EST


    Philippe, instead of trying to sound authoritative by making up a whole-cloth
    definition -- one that is completely and utterly wrong -- and thereby confuse
    and mislead a beginner, you should either be silent or simply point the person
    to the Unicode glossary:

    http://www.unicode.org/glossary/#compatibility_character

    Mark
    __________________________________
    http://www.macchiato.com
    ► शिष्यादिच्छेत्पराजयम् ◄

    ----- Original Message -----
    From: "Philippe Verdy" <verdy_p@wanadoo.fr>
    To: "Alexandre Arcouteil" <lex@free.fr>
    Cc: <unicode@unicode.org>
    Sent: Fri, 2003 Nov 14 03:28
    Subject: Re: compatibility characters (in XML context)

    > ----- Original Message -----
    > From: "Alexandre Arcouteil" <lex@free.fr>
    > To: <unicode@unicode.org>
    > Sent: Friday, November 14, 2003 10:41 AM
    > Subject: compatibility characters (in XML context)
    >
    >
    > > This is a beginner question :
    > >
    > > In the XML 1.1 Proposed Recommendation 05 November 2003
    > > (http://www.w3.org/TR/xml11), it is said that "Document authors are
    > > encouraged to avoid "compatibility characters", as defined in section
    > > 6.8 of [Unicode]" so relating to Unicode 2.0.
    > >
    > > I don't see any online documentation about explicit definition of
    > > "compatibility characters" according to 2.0.
    >
    > Compatibility characters can be defined as the characters whose canonical
    > decomposition mapping is either:
    >
    > (1) a singleton (example the Ångström symbol, canonically mapped to A
    > with ring above, or the list of unified Han ideographs, only included for
    > compatibility with legacy charsets or because of assignment errors in
    > Unicode 1.0) and that are implicitly restricted from being recomposed in all
    > NF* forms, or
    >
    > (2) two-code _canonical_ decomposition mapping, but are excluded from
    > canonical composition (example the Hebrew shin letter with shin dot).
    >
    > These characters will never be part of any string in a normalized form (NFC,
    > NFD, NFKC, NFKD).
    >
    > > At least I'd like to know if characters like "é" "ç" or "œ" are
    > > concerned.
    >
    > No: "é" and "ç" have canonical decompositions, but are not excluded from
    > recomposition.
    > And the "oe ligature" has only a compatibility decomposition, and then is not
    > a compatibility character.
    >
    > > Is there a complete chart of "compatibility characters" somewhere?
    >
    >
    > Look at the Unicode data file which lists composition exclusions...
    >
    >
    >
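
The decomposition claims quoted above can be checked directly against the Unicode Character Database. The following is a minimal sketch, assuming Python's standard `unicodedata` module; the helper name `describe` and the choice of sample characters are illustrative only and do not come from the thread. It reports whether a character's decomposition mapping is canonical, compatibility (tagged with `<...>` in the data), or absent, and whether the character survives NFC unchanged. It does not by itself decide what counts as a "compatibility character"; the glossary entry cited above remains the reference for that.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Summarize the decomposition data for a single character (hypothetical helper)."""
    # unicodedata.decomposition returns "" when there is no mapping,
    # a string starting with a "<tag>" for compatibility mappings,
    # and a bare list of code points for canonical mappings.
    raw = unicodedata.decomposition(ch)
    if not raw:
        kind = "none"
    elif raw.startswith("<"):
        kind = "compatibility"
    else:
        kind = "canonical"
    return {
        "code": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "decomposition": kind,
        "mapping": raw,
        # True if NFC leaves the character as it is.
        "stable_under_nfc": unicodedata.normalize("NFC", ch) == ch,
    }


# Characters mentioned in the thread: e-acute, c-cedilla, oe ligature,
# the Angstrom sign, and Hebrew shin with shin dot.
for ch in ("\u00E9", "\u00E7", "\u0153", "\u212B", "\uFB2A"):
    print(describe(ch))
```

Running this against the characters discussed in the thread gives a quick way to compare each statement with the actual data rather than relying on any one poster's summary. The results depend on the Unicode version bundled with the Python build, which `unicodedata.unidata_version` reports.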


