From: Mark Davis (email@example.com)
Date: Fri Nov 14 2003 - 10:57:48 EST
Philippe, instead of trying to sound authoritative by making up a whole-cloth
definition -- one that is completely and utterly wrong -- and thereby confuse
and mislead a beginner, you should either be silent or simply point the person
to the Unicode glossary:
► शिष्यादिच्छेत्पराजयम् ◄
----- Original Message -----
From: "Philippe Verdy" <firstname.lastname@example.org>
To: "Alexandre Arcouteil" <email@example.com>
Sent: Fri, 2003 Nov 14 03:28
Subject: Re: compatibility characters (in XML context)
> ----- Original Message -----
> From: "Alexandre Arcouteil" <firstname.lastname@example.org>
> To: <email@example.com>
> Sent: Friday, November 14, 2003 10:41 AM
> Subject: compatibility characters (in XML context)
> > This is a beginner question:
> > In the XML 1.1 Proposed Recommendation of 05 November 2003
> > (http://www.w3.org/TR/xml11), it is said that "Document authors are
> > encouraged to avoid "compatibility characters", as defined in section
> > 6.8 of [Unicode]", which refers to Unicode 2.0.
> > I can't find any online documentation giving an explicit definition of
> > "compatibility characters" according to 2.0.
> Compatibility characters can be defined as the characters whose canonical
> decomposition mapping is either:
> (1) a singleton (for example the Angström sign, canonically mapped to A
> with ring above, or the CJK compatibility ideographs, only included for
> compatibility with legacy charsets or because of assignment errors in
> Unicode 1.0) and that are implicitly restricted from being recomposed in all
> NF* forms, or
> (2) a two-code _canonical_ decomposition mapping, but that are excluded from
> canonical composition (for example the Hebrew letter shin with shin dot).
> These characters will never be part of any string in a normalized form (NFC,
> NFD, NFKC, NFKD).
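The two examples named in the quoted text can be checked against the Unicode data that ships with Python. This is only a minimal sketch: `survives_any_form` is a hypothetical helper name, the results depend on the Unicode version bundled with the interpreter, and it says nothing about whether the quoted definition of "compatibility characters" is correct.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def survives_any_form(ch):
    """Return True if the single character is unchanged by at least one form."""
    return any(unicodedata.normalize(form, ch) == ch for form in FORMS)

# U+212B ANGSTROM SIGN has a one-code (singleton) canonical decomposition.
# U+FB2A HEBREW LETTER SHIN WITH SHIN DOT has a two-code canonical
# decomposition and is not recomposed by NFC.
for ch in ("\u212B", "\uFB2A"):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        "decomposition:", unicodedata.decomposition(ch),
        "survives a normalization form:", survives_any_form(ch),
    )
```

Both characters are replaced by other code points in every one of the four forms, which is the behaviour the quoted paragraph describes.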
> > At least I'd like to know whether characters like "é", "ç" or "œ" are
> > affected.
> No: "é" and "ç" have canonical decompositions, but are not excluded from
> canonical composition.
> And the "oe ligature" has only a compatibility decomposition, and so is not
> a compatibility character.
> > Is there a complete chart of "compatibility characters" somewhere?
> Look at the Unicode data file which lists composition exclusions...
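For readers who want to test individual characters such as "é", "ç" or "œ" themselves, the following Python sketch may help. The function names are hypothetical, the output depends on the Unicode version bundled with Python, and the scan lists code points that NFC rewrites, which is the set the quoted reply describes and not necessarily the Unicode glossary's notion of a compatibility character.

```python
import unicodedata

def decomposition_kind(ch):
    """Classify the decomposition mapping as 'none', 'canonical' or 'compatibility'."""
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        return "none"
    # Compatibility mappings carry a tag such as <compat> or <font>.
    return "compatibility" if mapping.startswith("<") else "canonical"

def nfc_unstable_code_points():
    """Return the code points whose NFC form differs from the character itself."""
    changed = []
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:  # skip surrogate code points
            continue
        ch = chr(cp)
        if unicodedata.normalize("NFC", ch) != ch:
            changed.append(cp)
    return changed

for ch in ("\u00E9", "\u00E7", "\u0153", "\uFB01"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), decomposition_kind(ch))

unstable = nfc_unstable_code_points()
print(len(unstable), "code points are rewritten by NFC in this Unicode version")
```

With the data bundled in Python, "é" and "ç" report canonical mappings and are left alone by NFC, while "œ" (U+0153) reports no decomposition mapping at all; U+FB01 (the "fi" ligature) is included as an example of a tagged compatibility mapping.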