Re: Questions on ZWNBS - for line initial holam plus alef

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Aug 13 2003 - 14:09:04 EDT

Next message: Philippe Verdy: "Re: Compatibility decompositions"

Previous message: Rick McGowan: "Last Call: UTS #18, Regular Expressions"
In reply to: Peter Kirk: "Re: Questions on ZWNBS - for line initial holam plus alef"
Next in thread: Peter Kirk: "Re: Questions on ZWNBS - for line initial holam plus alef"
Reply: Peter Kirk: "Re: Questions on ZWNBS - for line initial holam plus alef"
Reply: Jon Hanna: "RE: Questions on ZWNBS - for line initial holam plus alef"

From: "Peter Kirk" <peter.r.kirk@ntlworld.com>

> There is some potential for real trouble here, if one process outputs an
> NMTOKEN starting with a combining character preceded by a separating
> space, or something else which is changed into a space, and another
> process takes the new space plus combining character as a unit and so
> doesn't recognise the separation. Any hackers and virus programmers
> reading this will soon start flooding the Internet with tokens beginning
> with combining characters in the hope of crashing implementations or
> finding back doors. Of course this wouldn't have been a problem if
> Unicode had never defined space plus combining character as legal and
> meaningful. But this is not my problem!

I agree. An XML document could be required to use a given attribute or
element at some place. If the attribute name follows the element name
after a line break, which is changed into a space during parsing, then
forcing XML parsers to treat SPACE+combining as an unbreakable grapheme
cluster acting like a letter would create a new element name, which may
violate the element name identity. Now suppose that the attribute name
contains a colon: you have created a custom namespace name, under which
you can add any element you like, even if this was forbidden by the
content model of the reference schema.
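
A minimal sketch of this effect, in Python. Both splitter functions and
the sample tag body are hypothetical illustrations, not the behaviour of
any real XML parser: the first separates names at every space, the second
assumes a parser that treats SPACE+combining as one unbreakable unit.

```python
import unicodedata


def split_on_space(tag_body):
    """Split a tag body at every space, so a space always separates names."""
    return tag_body.split(" ")


def split_keeping_space_clusters(tag_body):
    """Hypothetical splitter: a space followed by a combining mark
    is kept inside the current token instead of ending it."""
    tokens = [""]
    for i, ch in enumerate(tag_body):
        nxt = tag_body[i + 1] if i + 1 < len(tag_body) else ""
        next_is_mark = bool(nxt) and unicodedata.category(nxt).startswith("M")
        if ch == " " and not next_is_mark:
            tokens.append("")      # ordinary space: start a new token
        else:
            tokens[-1] += ch       # letter, mark, or SPACE before a mark
    return tokens


# Assumed input: element name "item", then an attribute name that begins
# with U+0301 COMBINING ACUTE ACCENT and contains a colon.
tag_body = "item \u0301ns:attr"

plain = split_on_space(tag_body)
clustered = split_keeping_space_clusters(tag_body)
```

With the plain splitter the element name stays "item" and the attribute
name is a separate token. With the cluster-aware splitter the whole
string becomes a single name, so the element name is no longer "item".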

So this would invalidate existing documents, or create holes allowing
the insertion of arbitrary XML content, unless the XML application
validates element names (the namespace+name pair) extremely strictly
and completely excludes from processing any unrecognized element
(including all its content and attributes). This would be a breach in
a content model which may have been validated and tested for security
in another layer of the document encoding process (notably when XML
documents are created from templates, as with XSL processors or custom
C source using simple template substitution).

So for me the sequence SPACE+combining should not be acceptable
as a valid grapheme cluster within element names or attribute names,
and thus would need to be excluded from NMTOKEN. The correct
way to do it is to consider it NOT A LETTER, but a symbol (Sk),
exactly like other spacing diacritics, which are already invalid in
NMTOKEN.
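
A small sketch of that classification, in Python, using the standard
unicodedata module. The two helper functions are hypothetical, and the
acceptance rule is a deliberately simplified stand-in for the XML
NameChar production (letters, marks, decimal digits and ".-_:"), not the
real definition from the XML specification.

```python
import unicodedata

# Punctuation accepted by this simplified name check (an assumption).
NAME_PUNCTUATION = set(".-_:")


def cluster_class(cluster):
    """Return a general category for a one- or two-character unit.
    SPACE followed by a combining mark is classified as Sk, like the
    existing spacing diacritics; otherwise use the first character."""
    if (len(cluster) == 2 and cluster[0] == " "
            and unicodedata.category(cluster[1]).startswith("M")):
        return "Sk"
    return unicodedata.category(cluster[0])


def allowed_in_simplified_nmtoken(cluster):
    """Accept letters, marks, decimal digits and a few punctuation marks."""
    if cluster in NAME_PUNCTUATION:
        return True
    category = cluster_class(cluster)
    return category[0] in ("L", "M") or category == "Nd"


spacing_acute = "\u00B4"         # ACUTE ACCENT, a spacing diacritic
space_plus_combining = " \u0301"  # SPACE + COMBINING ACUTE ACCENT
```

Under this rule the SPACE+combining pair lands in the same class as
U+00B4 and is rejected in the same way, while a combining mark that
follows a real base letter is still accepted.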

There still remains the unresolved question of grapheme clusters
that could span the starting "<" or ending ">" or "/>" of tags, or
the leading "&" of an entity reference. For this reason, defective
combining sequences (combining characters without a leading base
character) should be forbidden (invalid for XML).
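
Such a check could look roughly like the Python sketch below. The
function name and the delimiter set are assumptions made for
illustration; this is a plain string scan, not a real XML parser, and it
reports only the first mark of each defective sequence.

```python
import unicodedata

# Characters treated as markup delimiters in this sketch (an assumption).
MARKUP_DELIMITERS = set("<>&;/=\"'")


def find_defective_positions(text):
    """Return the indexes of combining marks that have no base character
    before them: at the start of the text or right after a delimiter."""
    positions = []
    for i, ch in enumerate(text):
        if not unicodedata.category(ch).startswith("M"):
            continue               # not a combining mark
        if i == 0 or text[i - 1] in MARKUP_DELIMITERS:
            positions.append(i)    # the mark would attach to markup
    return positions


defective = find_defective_positions("<a>\u0301x</a>")
well_formed = find_defective_positions("<a>e\u0301</a>")
```

In the first sample the combining acute accent directly follows ">", so
a cluster-based process could see it as part of the tag. In the second
it follows the base letter "e" and nothing is reported.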

So an unsolved conflict remains here: defective combining sequences
cause security or validity problems in XML documents, and a
non-defective SPACE+combining sequence also causes security
problems. There's no secure choice to represent spacing diacritics
which are not already encoded in a precomposed form...


This archive was generated by hypermail 2.1.5 : Wed Aug 13 2003 - 14:43:47 EDT