Re: Normalisation in XML, HTML etc

From: John Cowan (cowan@mercury.ccil.org)
Date: Wed Oct 15 2003 - 05:34:20 CST


Peter Kirk scripsit:

> I have heard it mentioned in general terms that W3C has specified that
> text should be normalised according to NFC. What actually is the scope
> of this specification? Does it apply to all XML, HTML etc? Is it
> mandatory or just a recommendation?

It is not mandatory. It is a SHOULD in the RFC 2119 sense, which sits
between MUST (mandatory) and MAY (permissive); it means that "there may
exist valid reasons in particular circumstances to ignore a particular
item, but the full implications must be understood and carefully weighed
before choosing a different course."

XML 1.0 is silent on the subject. XML 1.1 (not yet finalized) says
that XML parsers SHOULD (in the sense above) verify that their input is
normalized, and explains exactly what "normalized" means in connection
with various XML constructs; for example, the character just after a
start-tag SHOULD NOT be a combining character.
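
As a rough illustration of what such a check might look like, here is a
minimal Python sketch. The function names are made up for this example,
and treating "combining character" as "any character in a Unicode Mark
category" is an assumption; the XML 1.1 draft has its own precise
definition, which may well differ.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if text is already in Normalization Form C."""
    # Normalizing and comparing works on any Python 3 version.
    return unicodedata.normalize("NFC", text) == text


def starts_with_combining(text: str) -> bool:
    """Return True if the first character is a combining mark.

    Assumption: "combining" is approximated by the Unicode general
    categories Mn, Mc and Me (all start with "M").
    """
    return bool(text) and unicodedata.category(text[0]).startswith("M")


# Content that follows a start-tag: the first string is fine, the second
# begins with U+0301 COMBINING ACUTE ACCENT and would be flagged.
for content in ("caf\u00e9", "\u0301abc"):
    print(repr(content), is_nfc(content), starts_with_combining(content))
```

Note that the check only reports a problem; it does not change the text.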

> I would also like to know if this is actually applied or enforced by
> products such as OpenOffice and Microsoft Office 2003 which use XML as
> one of their native document formats. Will text saved in these formats
> be normalised to NFC? Should it be?

Output SHOULD be normalized; input SHOULD be verified as normalized,
but not forcibly normalized (doing so is a security hole). Whether
any particular product does this is up to the people who make the
product, and I have no information on either of those.
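
The asymmetry above (normalize what you write, verify but do not rewrite
what you read) could be sketched in Python roughly as follows. This is
only an illustration under my own assumptions: the names check_input,
prepare_output and NotNormalizedError are hypothetical, and raising an
error is just one possible reaction to unnormalized input; since this is
a SHOULD, a warning would be equally defensible.

```python
import unicodedata


class NotNormalizedError(ValueError):
    """Hypothetical error for input that is not in NFC."""


def check_input(text: str) -> str:
    """Verify that incoming text is NFC; never rewrite it silently."""
    if unicodedata.normalize("NFC", text) != text:
        raise NotNormalizedError("input is not in Normalization Form C")
    return text  # returned unchanged


def prepare_output(text: str) -> str:
    """Normalize text to NFC before it is written out."""
    return unicodedata.normalize("NFC", text)


# "e" followed by U+0301 is composed to U+00E9 on output...
saved = prepare_output("cafe\u0301")
# ...so reading it back passes verification.
check_input(saved)
```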

-- 
One art / There is                      John Cowan <jcowan@reutershealth.com>
No less / No more                       http://www.reutershealth.com
All things / To do                      http://www.ccil.org/~cowan
With sparks / Galore                     -- Douglas Hofstadter

