Re: compatibility characters (in XML context)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Nov 14 2003 - 06:28:56 EST

Next message: Kent Karlsson: "RE: compatibility characters (in XML context)"

Previous message: jon@hackcraft.net: "Re: Definitions"
In reply to: Alexandre Arcouteil: "compatibility characters (in XML context)"
Next in thread: Kent Karlsson: "RE: compatibility characters (in XML context)"
Reply: Kent Karlsson: "RE: compatibility characters (in XML context)"
Reply: Mark Davis: "Re: compatibility characters (in XML context)"
Reply: Doug Ewell: "Re: compatibility characters (in XML context)"

----- Original Message -----
From: "Alexandre Arcouteil" <lex@free.fr>
To: <unicode@unicode.org>
Sent: Friday, November 14, 2003 10:41 AM
Subject: compatibility characters (in XML context)

> This is a beginner question :
>
> In the XML 1.1 Proposed Recommendation 05 November 2003
> (http://www.w3.org/TR/xml11), it is said that "Document authors are
> encouraged to avoid "compatibility characters", as defined in section
> 6.8 of [Unicode]" so relating to Unicode 2.0.
>
> I don't see any online documentation about explicit definition of
> "compatibility characters" according to 2.0.

Compatibility characters can be defined as the characters whose canonical
decomposition mapping is either:

(1) a singleton (for example the Ångström sign, U+212B, canonically mapped
to A with ring above, U+00C5, or the CJK compatibility ideographs, which
were only included for compatibility with legacy charsets or because of
assignment errors in Unicode 1.0) and that are implicitly excluded from
recomposition in all NF* forms, or

(2) a two-code-point _canonical_ decomposition mapping, but are excluded
from canonical composition (for example the Hebrew letter shin with shin
dot, U+FB2A).

These characters will never be part of any string in a normalized form (NFC,
NFD, NFKC, NFKD).
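
The two cases above can be checked with a short Python sketch. It relies only
on the standard unicodedata module; the helper names (canonical_mapping,
is_compat_per_message, survives_any_form) are made up for this illustration,
and the "excluded from recomposition" test is approximated by asking whether
NFC gives the character back. Results depend on the Unicode version bundled
with the interpreter.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def canonical_mapping(ch):
    """Canonical decomposition mapping of ch as a string, or "" if none."""
    fields = unicodedata.decomposition(ch).split()
    # A leading <tag> such as <compat> marks a compatibility mapping.
    if not fields or fields[0].startswith("<"):
        return ""
    return "".join(chr(int(f, 16)) for f in fields)

def is_compat_per_message(ch):
    """True if ch has a canonical mapping that NFC does not recompose to ch.

    Assumption: this round-trip test stands in for cases (1) and (2) above.
    """
    return bool(canonical_mapping(ch)) and unicodedata.normalize("NFC", ch) != ch

def survives_any_form(ch):
    """True if ch still appears after normalizing it to any of the four forms."""
    return any(ch in unicodedata.normalize(form, ch) for form in FORMS)

# Angstrom sign, Hebrew shin with shin dot, one CJK compatibility ideograph.
for ch in ("\u212b", "\ufb2a", "\uf900"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          is_compat_per_message(ch), survives_any_form(ch))
```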

> At least I'd like to know if characters like "é" "ç" or "œ" are
> concerned.

No: "é" and "ç" have canonical decompositions, but are not excluded from
recomposition.
And the "oe ligature" (U+0153) has no decomposition mapping at all, neither
canonical nor compatibility, so it is not a compatibility character either.
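
For these three characters, a small Python sketch (standard unicodedata
module; the report helper is a name invented here) prints whatever
decomposition mapping the interpreter's bundled Unicode database records and
whether NFC leaves the character unchanged:

```python
import unicodedata

def report(ch):
    """Return (raw decomposition mapping, whether NFC leaves ch unchanged)."""
    mapping = unicodedata.decomposition(ch)
    unchanged = unicodedata.normalize("NFC", ch) == ch
    return mapping, unchanged

for ch in ("\u00e9", "\u00e7", "\u0153"):  # é, ç, œ
    mapping, unchanged = report(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"mapping={mapping!r}, unchanged by NFC={unchanged}")
```

A character that NFC leaves unchanged cannot fall under either case of the
definition given earlier in this message.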

> Is somewhere a complete chart of "compatibility characters" ?

Look at the Unicode data file that lists composition exclusions
(CompositionExclusions.txt in the Unicode Character Database).
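
As a rough alternative to reading the data file, a chart can be approximated
from Python's own Unicode database. The sketch below (the chart function is a
hypothetical name) walks every code point and keeps those that have a
canonical decomposition mapping but are not given back by NFC. This is an
assumption-laden stand-in for the definition in this message, not an official
list, and the result varies with the Unicode version shipped with the
interpreter.

```python
import sys
import unicodedata

def chart():
    """Collect characters with a canonical mapping that NFC does not restore."""
    found = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        mapping = unicodedata.decomposition(ch)
        # Skip characters with no mapping or with a tagged (compatibility) one.
        if not mapping or mapping.startswith("<"):
            continue
        if unicodedata.normalize("NFC", ch) != ch:
            found.append(ch)
    return found

chars = chart()
print("Unicode", unicodedata.unidata_version, "->", len(chars), "characters")
print("first few:", [f"U+{ord(c):04X}" for c in chars[:5]])
```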


This archive was generated by hypermail 2.1.5 : Fri Nov 14 2003 - 07:22:06 EST