Re: Questions on ZWNBS - for line initial holam plus alef

From: John Cowan
Date: Wed Aug 13 2003 - 09:11:30 EDT

    Peter Kirk scripsit:

    > Sorry, I'm confused. Are you saying that the input processing will
    > translate line breaks into spaces within attribute values, unless
    > inserted as &#xA;? Well, I suppose this is fair enough as it is up to
    > the user not to enter garbage.

    Yes, that is how attribute values work. The idea is that when you have
    a long string in an attribute value, you can introduce a line break for
    readability without its having any effect on processing, thus:

    <quotation reference="Nicholas, Nick, and John Cowan. What is Lojban? /
    la lojban. mo Fairfax, VA: Logical Language Group (2003)">

    The line break gets turned into a space before the application sees it.
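
    As a minimal sketch of that behaviour, assuming Python's standard
    xml.etree.ElementTree module (not something discussed in this thread)
    and a made-up two-attribute element, one can compare a literal line
    break with the same break written as a character reference:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "literal" contains a real line break inside the
# attribute value; "escaped" contains the same break written as &#xA;.
DOC = (
    '<quotation literal="What is Lojban? /\nla lojban. mo" '
    'escaped="What is Lojban? /&#xA;la lojban. mo"/>'
)


def attribute_values(xml_text):
    """Return the attributes as the application sees them after parsing."""
    return dict(ET.fromstring(xml_text).attrib)


values = attribute_values(DOC)
print(repr(values["literal"]))  # the literal break arrives as a space
print(repr(values["escaped"]))  # the character reference survives as a newline
```

    Only the character reference reaches the application as a line feed;
    the literal break is normalized away by the parser.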

    Additionally, if you have a long list of tokens in an attribute value,

    <person specialties="cook captain-bold maid-of-the-Nancy-brig bosun-tight
            midshipmite crew-of-the-captains-gig" />

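    the application does not have to deal with either the line break or the
    tab character specially: each is turned into a space, and if the
    attribute is declared as a tokenized type such as NMTOKENS, the
    application sees simply a list of tokens separated by single spaces.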
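
    A rough sketch of the token-list case, again assuming Python's
    xml.etree.ElementTree and a hypothetical internal DTD subset that
    declares the attribute as NMTOKENS (the declaration is my assumption;
    the example above does not show one):

```python
import xml.etree.ElementTree as ET

# Hypothetical document. The ATTLIST declaration marks "specialties" as
# NMTOKENS so the parser applies the normalization for tokenized types.
# The value contains a literal line break followed by a tab (\t).
DOC = """<!DOCTYPE person [
  <!ATTLIST person specialties NMTOKENS #IMPLIED>
]>
<person specialties="cook captain-bold maid-of-the-Nancy-brig bosun-tight
\tmidshipmite crew-of-the-captains-gig" />"""


def specialties(xml_text):
    """Return the normalized attribute value delivered by the parser."""
    return ET.fromstring(xml_text).get("specialties")


value = specialties(DOC)
tokens = value.split(" ")  # a plain split on one space is enough
print(tokens)
```

    Without the declaration the attribute is treated as CDATA, so the line
    break and the tab would each become a space and the application would
    see two adjacent spaces between "bosun-tight" and "midshipmite".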

    > OK if this is clearly illegal, but this might restrict use of some
    > languages in NMTOKEN. Would NBSP + combining be allowed?

    No, it would not be. As I say, attribute values aren't meant to handle
    arbitrary natural-language human-readable text.

    > There is some potential for real trouble here, if one process outputs an
    > NMTOKEN starting with a combining character preceded by a separating
    > space, or something else which is changed into a space, and another
    > process takes the new space plus combining character as a unit and so
    > doesn't recognise the separation.

    If the second processor is XML-compliant, it will treat the space as a
    token separator, not as part of the token (as I say, spacing diacritics
    aren't allowed in tokens). If the XML document is printed or displayed
    in its raw form (that is, treating it as plain rather than structured
    text), you may see something a bit strange, but that will not affect
    the processing model.
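
    To illustrate the point, here is a small sketch, assuming Python's
    xml.etree.ElementTree and unicodedata modules and a made-up attribute
    whose second token begins with U+05B9 HEBREW POINT HOLAM. The space
    still separates the tokens, even though a display engine might draw
    the space and the holam as one unit:

```python
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical attribute value: alef + bet, a space, then holam + alef.
# Character references are used only to keep the source legible.
DOC = '<word forms="&#x05D0;&#x05D1; &#x05B9;&#x05D0;"/>'


def split_tokens(xml_text, attribute):
    """Split a parsed attribute value on spaces, code point by code point."""
    return ET.fromstring(xml_text).get(attribute).split(" ")


tokens = split_tokens(DOC, "forms")
starts_with_combining = unicodedata.combining(tokens[1][0]) != 0
print(len(tokens), starts_with_combining)
```

    The split works on code points, not on displayed clusters, so the
    combining character ends up at the start of the second token rather
    than attached to the separator.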

    > reading this will soon start flooding the Internet with tokens beginning
    > with combining characters in the hope of crashing implementations or
    > finding back doors.

    Very, very unlikely.

    Winter:  MIT,                                   John Cowan
    Keio, INRIA,                          
    Issue lots of Drafts.                 
    So much more to understand!           
    Might simplicity return?                        (A "tanka", or extended haiku)
