Re: Questions on ZWNBS - for line initial holam plus alef

From: Peter Kirk (peter.r.kirk@ntlworld.com)
Date: Tue Aug 12 2003 - 12:27:57 EDT

    On 12/08/2003 07:05, John Cowan wrote:

    >Very true. But what is this whitespace normalization?
    >
    >1) Throughout the document, line-end characters and sequences are normalized
    > to LF. Not relevant here.
    >
    >2) In attribute values, LF, CR, and TAB characters are normalized to spaces.
    > Not relevant here.
    >
    >
    This would be relevant if a combining mark may legally follow LF, CR,
    or TAB. Is that legal? If so, what was previously a defective (but
    legal) combining sequence would turn into a non-defective one, with
    the mark applied to the new space, and the intended whitespace would
    be lost.
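
    To make the concern concrete, here is a minimal sketch in Python. It
    assumes a simplified normalization that only maps each TAB, LF, or CR
    to U+0020, and it treats any character in a Unicode "M" general
    category as a combining mark. The function names are hypothetical and
    are not taken from any XML processor.

```python
import unicodedata

HOLAM = "\u05B9"  # HEBREW POINT HOLAM, a nonspacing combining mark
ALEF = "\u05D0"   # HEBREW LETTER ALEF, a base letter


def normalize_attribute_whitespace(value: str) -> str:
    """Map each TAB, LF, or CR to a single space (simplified sketch)."""
    return value.translate({0x09: " ", 0x0A: " ", 0x0D: " "})


def is_combining_mark(ch: str) -> bool:
    """True for general categories Mn, Mc, and Me."""
    return unicodedata.category(ch).startswith("M")


def marks_after_space(value: str) -> list:
    """Indexes of combining marks that directly follow U+0020 SPACE."""
    return [
        i
        for i in range(1, len(value))
        if value[i - 1] == " " and is_combining_mark(value[i])
    ]


# Before normalization the holam follows a TAB; afterwards it follows a
# space, so the two can be read as one combining sequence.
raw = "a\t" + HOLAM + ALEF
normalized = normalize_attribute_whitespace(raw)
print(marks_after_space(raw), marks_after_space(normalized))
```

    In this sketch the mark is only flagged after normalization, which is
    exactly the point at which the space stops looking like plain
    whitespace.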

    >3) In attribute values that have a declared type other than CDATA, multiple
    > spaces are compressed to a single space, and leading and trailing spaces
    > are removed. After this is done, there can be no spaces in attributes
    > of type ID, IDREF, ENTITY, NMTOKEN, NOTATION, or enumerated types.
    > In the types IDREFS and ENTITIES, spaces are used to separate
    > individual tokens, none of which may begin with a combining character.
    > In the remaining type, NMTOKENS, individual tokens may begin
    > with a combining character, so it is possible that such a token, if
    > not the first in the attribute, will be rendered in a peculiar way,
    > with the combining character placed over the separating space.
    > But that is a mere rendering glitch and in no way affects anything.
    >
    >
    Not just a rendering glitch, I suspect. If the combining character
    combines with the separating space, the space loses many of its
    separating functions, and perhaps keeps a confusing subset of them,
    with all sorts of possibilities for error. At best, tokens beginning
    with combining characters will be unusable. At worst, they will crash
    the implementation (and count on someone deliberately trying to do
    that!). The only safe thing to do is to specify that a space followed
    by a combining mark is NEVER considered to be a space and that this
    combination is NEVER generated.
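
    A rough Python sketch of the difference follows. It is only one
    possible reading of the rule above, not the behaviour of any XML
    specification or parser. The function names are hypothetical, and a
    "combining mark" is assumed to mean any character in a Unicode "M"
    general category.

```python
import unicodedata


def is_combining_mark(ch: str) -> bool:
    """True for general categories Mn, Mc, and Me."""
    return unicodedata.category(ch).startswith("M")


def split_tokens_naive(value: str) -> list:
    """Treat every space as a separator and drop the empty pieces."""
    return [token for token in value.split(" ") if token]


def tokens_starting_with_mark(value: str) -> list:
    """Tokens whose first character would attach to the separating space."""
    return [t for t in split_tokens_naive(value) if is_combining_mark(t[0])]


def split_tokens_guarded(value: str) -> list:
    """Treat a space as a separator only if no combining mark follows it."""
    tokens, current = [], ""
    for i, ch in enumerate(value):
        next_is_mark = i + 1 < len(value) and is_combining_mark(value[i + 1])
        if ch == " " and not next_is_mark:
            if current:
                tokens.append(current)
            current = ""
        else:
            # A space followed by a mark stays inside the current token.
            current += ch
    if current:
        tokens.append(current)
    return tokens


value = "ab \u05B9cd ef"  # the second token starts with HEBREW POINT HOLAM
print(split_tokens_naive(value))
print(tokens_starting_with_mark(value))
print(split_tokens_guarded(value))
```

    The naive splitter yields three tokens, one of which begins with a
    combining mark. The guarded splitter yields two, because the space
    before the mark is no longer counted as a separator.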

    -- 
    Peter Kirk
    peter@qaya.org (personal)
    peterkirk@qaya.org (work)
    http://www.qaya.org/
    

