From: Philippe Verdy (firstname.lastname@example.org)
Date: Fri Nov 14 2003 - 15:32:22 EST
From: "Alexandre Arcouteil" <email@example.com>
> Philippe Verdy wrote:
> > From: "Kent Karlsson" <firstname.lastname@example.org>
> >>Philippe Verdy wrote:
> >>> (1) a singleton (example the Angström symbol, canonically
> >>>mapped to A with diaeresis,
> >>The Ångström (note spelling) sign is canonically mapped to
> >>capital a with ring.
> Thanks for all explanations,
> Keeping the A with ring example, does it mean that compatibility
> characters can be identified according to the Unicode charts?
> For example, in the case of \u212B ANGSTROM SIGN, it is documented:
> "preferred representation is 00C5 Å latin capital letter a with ring".
> Is that a clear indication that \u212B is actually a compatibility
> character and then should be, according to the XML 1.1 recommendation,
> replaced by the \u00C5 character?
You must not replace any character directly within an intermediate XML
processing engine, unless this is clearly documented in its interface.
Generally, XML-based interfaces will normalize their input strings (to NFC
or NFD), but they are not required to do so. Normalizing does, however, allow
the engine to guarantee that its outputs from canonically equivalent strings
will also be canonically equivalent (because normalizing on input guarantees
that canonically equivalent inputs become identical before processing).
Unicode conformance for an algorithm P that processes a string and returns a
string just means that,
for every two inputs A and B:
if A and B are canonically equivalent, then P(A) and P(B) are canonically
equivalent.
A conforming algorithm is not required to produce normalized output.
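
The condition can be tried out with a rough Python sketch. It assumes the
property meant here is "canonically equivalent inputs give canonically
equivalent outputs", and it uses the standard unicodedata module; the three
processes (identity, naive_count, conforming_count) and the helper names are
invented for illustration only.

```python
import unicodedata

def canonically_equivalent(a, b):
    """Two strings are canonically equivalent iff their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def identity(text):
    """Hypothetical process: returns its input untouched (output may be unnormalized)."""
    return text

def naive_count(text):
    """Hypothetical process: counts code points without normalizing first."""
    return str(len(text))

def conforming_count(text):
    """Same count, but normalizing on input makes equivalent inputs identical."""
    return str(len(unicodedata.normalize("NFC", text)))

def preserves_equivalence(p, a, b):
    """Check the condition for process p on a single pair of inputs (a, b)."""
    if not canonically_equivalent(a, b):
        return True  # the condition says nothing about non-equivalent inputs
    return canonically_equivalent(p(a), p(b))

sign = "\u212B"        # ANGSTROM SIGN
composed = "\u00C5"    # LATIN CAPITAL LETTER A WITH RING ABOVE
decomposed = "A\u030A" # A followed by COMBINING RING ABOVE

print(preserves_equivalence(identity, sign, composed))
print(preserves_equivalence(naive_count, composed, decomposed))
print(preserves_equivalence(conforming_count, composed, decomposed))
```

Note that identity passes the check even though it returns U+212B as is, which
is not in NFC: the condition constrains equivalence of outputs, not their
normalization form. Checking a few pairs like this is only a spot test, not a
proof of conformance.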
These constraints do not apply to XML conformance (normalization to NFC is
recommended, but not required).
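
As a small illustration, an intermediate stage can check whether text is
already in NFC and report it, without replacing anything itself. This is only
a sketch using Python's unicodedata module; is_nfc and pass_through are
hypothetical names, not part of any XML API.

```python
import unicodedata

def is_nfc(text):
    """True if NFC normalization would leave the text unchanged."""
    return unicodedata.normalize("NFC", text) == text

def pass_through(text):
    """Hypothetical intermediate stage: return the input untouched, plus a flag
    telling the caller whether it was already in NFC."""
    return text, is_nfc(text)

doc = "<unit>\u212B</unit>"  # contains ANGSTROM SIGN
out, already_nfc = pass_through(doc)
print(out == doc)    # the stage did not modify the document
print(already_nfc)   # False here, since NFC would map U+212B to U+00C5
```

Whether to normalize, reject, or just warn is then left to whatever the
interface documents.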
So for XML, if you choose to apply or require NFC or NFD normalization, the
only compatibility characters affected will be those Unicode characters that
are canonically mapped to a singleton, and those that are canonically mapped
to a pair but are excluded from recomposition.
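
Both cases can be observed with Python's unicodedata module. The sketch below
takes U+212B as the singleton example from this thread and, as an assumed
example of a composition exclusion, U+0958 DEVANAGARI LETTER QA; the helper
names are made up for illustration.

```python
import unicodedata

def nfc(text):
    return unicodedata.normalize("NFC", text)

def canonical_decomposition(ch):
    """Return the canonical decomposition as a list of code points (hex strings),
    or an empty list if the character has none or only a compatibility one."""
    d = unicodedata.decomposition(ch)
    if not d or d.startswith("<"):  # "<compat>", "<font>", ... mark compatibility mappings
        return []
    return d.split()

def is_singleton(ch):
    """A singleton maps canonically to exactly one other character."""
    return len(canonical_decomposition(ch)) == 1

def is_excluded_pair(ch):
    """Maps canonically to a pair, but NFC does not recompose it."""
    return len(canonical_decomposition(ch)) == 2 and nfc(ch) != ch

ANGSTROM_SIGN = "\u212B"
A_WITH_RING = "\u00C5"
DEVANAGARI_QA = "\u0958"

for ch in (ANGSTROM_SIGN, A_WITH_RING, DEVANAGARI_QA):
    print("U+%04X" % ord(ch), canonical_decomposition(ch),
          is_singleton(ch), is_excluded_pair(ch), nfc(ch) == ch)
```

Only the first and third characters are changed by NFC; U+00C5 also has a
canonical decomposition to a pair, but NFC recomposes it, so it survives.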