From: John Cowan (cowan@mercury.ccil.org)
Date: Wed Aug 13 2003 - 09:11:30 EDT
Peter Kirk scripsit:
> Sorry, I'm confused. Are you saying that the input processing will
> translate line breaks into spaces within attribute values, unless
> inserted as &#10;? Well, I suppose this is fair enough as it is up to
> the user not to enter garbage.
Yes, that is how attribute values work. The idea is that when you have
a long string in an attribute value, you can introduce a line break for
readability without its having any effect on processing, thus:
<quotation reference="Nicholas, Nick, and John Cowan. What is Lojban? /
la lojban. mo Fairfax, VA: Logical Language Group (2003)">
The line break gets turned into a space before the application sees it.
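As a rough illustration of that normalization, here is a minimal Python sketch using the standard-library xml.etree.ElementTree parser. The element content is a shortened, hypothetical variant of the example above, and the variable names are illustrative only. It also shows the contrasting case in which the line break is written as a character reference and so reaches the application intact.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the attribute value contains a literal line break.
doc = '<quotation reference="What is Lojban? /\nla lojban. mo"/>'
reference = ET.fromstring(doc).get("reference")
print(repr(reference))  # the line break has become a space

# The same value with the line break written as a character reference.
doc_ref = '<quotation reference="What is Lojban? /&#10;la lojban. mo"/>'
reference_ref = ET.fromstring(doc_ref).get("reference")
print(repr(reference_ref))  # the line feed is preserved
```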
Additionally, if you have a long list of tokens in an attribute value,
thus:
<person specialties="cook captain-bold maid-of-the-Nancy-brig bosun-tight
midshipmite crew-of-the-captains-gig" />
the application does not have to deal with either the line break or the
tab character specially, but sees simply a list of tokens separated by
a single space.
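A small sketch of the token-list case, again in Python and under stated assumptions: no DTD is supplied here, so the parser treats the attribute as CDATA and only replaces each line break or tab with a space. The further collapse to single spaces applies when the attribute is declared with a tokenized type such as NMTOKENS, so the sketch performs that last step by hand. The shortened attribute value is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a line break followed by a tab inside the value.
doc = ('<person specialties="cook captain-bold\n'
       '\tmidshipmite crew-of-the-captains-gig"/>')

# With no DTD, each of the two whitespace characters becomes one space.
raw = ET.fromstring(doc).get("specialties")

# Emulate the extra normalization for tokenized types: split on spaces,
# drop empty pieces, and rejoin with exactly one space between tokens.
tokens = [t for t in raw.split(" ") if t]
normalized = " ".join(tokens)
print(tokens)
```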
> OK if this is clearly illegal, but this might restrict use of some
> languages in NMTOKEN. Would NBSP + combining be allowed?
No, it isn't. As I say, attribute values aren't meant to handle
arbitrary natural-language human-readable text.
> There is some potential for real trouble here, if one process outputs an
> NMTOKEN starting with a combining character preceded by a separating
> space, or something else which is changed into a space, and another
> process takes the new space plus combining character as a unit and so
> doesn't recognise the separation.
If the second processor is XML-compliant, it will treat the space as a
token separator, not as part of the token (as I say, spacing diacritics
aren't allowed in tokens). If the XML document is printed or displayed
in its raw form (that is, treating it as plain rather than structured
text), you may see something a bit strange, but that will not affect
the processing model.
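To make the separation rule concrete, here is a hedged Python sketch that splits a value on the four XML white space characters only (space, tab, carriage return, line feed). The sample strings are invented for illustration. The space always ends a token, even when the next token starts with a combining character, while a no-break space is not a separator at all.

```python
import re
import unicodedata

XML_SPACE = " \t\r\n"  # the only characters XML treats as white space

def split_tokens(value):
    """Split an attribute value on XML white space only."""
    stripped = value.strip(XML_SPACE)
    return re.split("[" + XML_SPACE + "]+", stripped) if stripped else []

# Invented value: the second token begins with U+0301 COMBINING ACUTE ACCENT.
tokens = split_tokens("abc \u0301def")
starts_with_combining = unicodedata.combining(tokens[1][0]) != 0

# U+00A0 NO-BREAK SPACE does not separate tokens.
nbsp_pieces = split_tokens("abc\u00a0def")
```

A renderer may draw the accent over the preceding space when the raw text is displayed, but the split above does not depend on how the text is drawn.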
> reading this will soon start flooding the Internet with tokens beginning
> with combining characters in the hope of crashing implementations or
> finding back doors.
Very, very unlikely.
-- 
Winter:  MIT,                   John Cowan
Keio, INRIA,                    jcowan@reutershealth.com
Issue lots of Drafts.           http://www.ccil.org/~cowan
So much more to understand!     http://www.reutershealth.com
Might simplicity return?        (A "tanka", or extended haiku)