Re: Questions on ZWNBS - for line initial holam plus alef

From: John Cowan (cowan@mercury.ccil.org)
Date: Wed Aug 13 2003 - 09:11:30 EDT

Next message: Mark Davis: "Re: Questions on ZWNBS - for line initial holam plus alef"

Previous message: Peter Kirk: "Re: Questions on ZWNBS - for line initial holam plus alef"
In reply to: Peter Kirk: "Re: Questions on ZWNBS - for line initial holam plus alef"
Next in thread: Mark Davis: "Re: Questions on ZWNBS - for line initial holam plus alef"

Peter Kirk scripsit:

> Sorry, I'm confused. Are you saying that the input processing will
> translate line breaks into spaces within attribute values, unless
> inserted as &#10;? Well, I suppose this is fair enough as it is up to
> the user not to enter garbage.

Yes, that is how attribute values work. The idea is that when you have
a long string in an attribute value, you can introduce a line break for
readability without its having any effect on processing, thus:

The line break gets turned into a space before the application sees it.
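
A minimal sketch of this, using Python's standard-library
`xml.etree.ElementTree` (backed by expat) as a stand-in for "the
application". The element and attribute names `doc` and `title` are made
up for illustration. With no DTD, the attribute is treated as CDATA, so
each literal line break (and each tab or space) becomes one space, while
a character reference such as `&#10;` survives as a real line feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the attribute value is wrapped onto a second,
# indented line purely for readability.
source = '<doc title="A rather long title that we\n    wrap for readability"/>'
title = ET.fromstring(source).get("title")
# The newline became a space; the four indent spaces are kept as-is,
# because CDATA normalization does not collapse runs of spaces.
print(repr(title))

# The same idea written with a character reference for the line feed:
# the application really does see a newline here.
escaped = '<doc title="first line&#10;second line"/>'
print(repr(ET.fromstring(escaped).get("title")))
```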

Additionally, if you have a long list of tokens in an attribute value,
thus:

the application does not have to deal with either the line break or the
tab character specially, but sees simply a list of tokens separated by
a single space.
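
For attributes declared with a tokenized type such as NMTOKENS, XML 1.0
(section 3.3.3) adds a second step: after every whitespace character has
become a space, leading and trailing spaces are dropped and runs of
spaces collapse to one. The sketch below applies that step by hand,
assuming no DTD is present, so the parser itself only performs the
first (CDATA) step. `normalize_tokenized` and the attribute name `codes`
are hypothetical.

```python
import xml.etree.ElementTree as ET

def normalize_tokenized(value: str) -> str:
    """Extra normalization for non-CDATA attributes (hypothetical helper).

    Assumes tab, CR and LF have already been turned into U+0020 by the
    parser. Only U+0020 is treated as a separator; other characters,
    such as U+00A0 NO-BREAK SPACE, are left untouched.
    """
    return " ".join(part for part in value.split(" ") if part)

# Hypothetical token list, wrapped and indented with a tab.
source = '<doc codes="alpha beta gamma\n\tdelta epsilon"/>'
raw = ET.fromstring(source).get("codes")   # after the parser's CDATA step
tokens = normalize_tokenized(raw).split(" ")
print(repr(raw))
print(tokens)
```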

> OK if this is clearly illegal, but this might restrict use of some
> languages in NMTOKEN. Would NBSP + combining be allowed?

No, it isn't. As I say, attribute values aren't meant to handle
arbitrary natural-language human-readable text.

> There is some potential for real trouble here, if one process outputs an
> NMTOKEN starting with a combining character preceded by a separating
> space, or something else which is changed into a space, and another
> process takes the new space plus combining character as a unit and so
> doesn't recognise the separation.

If the second processor is XML-compliant, it will treat the space as a
token separator, not as part of the token (as I say, spacing diacritics
aren't allowed in tokens). If the XML document is printed or displayed
in its raw form (that is, treating it as plain rather than structured
text), you may see something a bit strange, but that will not affect
the processing model.
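
To make this concrete, here is a small Python sketch of how a consumer
might split an already-normalized NMTOKENS value. Only U+0020 separates
tokens, so a combining mark that follows a space (U+05B9 HEBREW POINT
HOLAM, chosen to match the thread's subject) starts the next token
rather than attaching to the space, and U+00A0 NO-BREAK SPACE is not a
separator at all. The helper `split_nmtokens` is hypothetical and does
not check whether each token is a legal NMTOKEN. The last line shows why
Python's `str.split()` with no argument is the wrong tool: it treats
NBSP as whitespace.

```python
from __future__ import annotations

def split_nmtokens(normalized: str) -> list[str]:
    """Split a normalized NMTOKENS value on U+0020 only (hypothetical)."""
    return normalized.split(" ")

HOLAM = "\u05B9"  # HEBREW POINT HOLAM, a combining mark
NBSP = "\u00A0"   # NO-BREAK SPACE: not an XML whitespace character

# The space separates tokens; the holam begins the second token.
print(split_nmtokens("abc " + HOLAM + "def"))

# NBSP separates nothing, so this stays one token (and, since NBSP is
# not a name character, not a legal NMTOKEN).
print(split_nmtokens("abc" + NBSP + HOLAM + "def"))

# Python's generic whitespace split would wrongly break at the NBSP.
print(("abc" + NBSP + HOLAM + "def").split())
```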

> reading this will soon start flooding the Internet with tokens beginning
> with combining characters in the hope of crashing implementations or
> finding back doors.

Very, very unlikely.

-- 
Winter:  MIT,                                   John Cowan
Keio, INRIA,                                    jcowan@reutershealth.com
Issue lots of Drafts.                           http://www.ccil.org/~cowan
So much more to understand!                     http://www.reutershealth.com
Might simplicity return?                        (A "tanka", or extended haiku)


This archive was generated by hypermail 2.1.5 : Wed Aug 13 2003 - 10:12:51 EDT