From: Jon Hanna (jon@spin.ie)
Date: Wed Aug 13 2003 - 07:44:55 EDT
> > 3) In attribute values that have a declared type other than CDATA,
> > multiple spaces are compressed to a single space, and leading and
> > trailing spaces are removed. After this is done, there can be no
> > spaces in attributes
> > of type ID, IDREF, ENTITY, NMTOKEN, NOTATION, or enumerated types.
> > In the types IDREFS and ENTITIES, spaces are used to separate
> > individual tokens, none of which may begin with a combining character.
> > In the remaining type, NMTOKENS, individual tokens may begin
> > with a combining character, so it is possible that such a token, if
> > not the first in the attribute, will be rendered in a peculiar way,
> > with the combining character placed over the separating space.
> > But that is a mere rendering glitch and in no way affects anything.
> >
> >
> Not just a rendering glitch, I suspect. If the combining character is
> combined with the separating space, the space loses many of its
> separating functions, and perhaps keeps a confusing subset of them with
> all sorts of possibilities of error. At best tokens beginning with
> combining characters will be unusable. At worst they will crash the
> implementation (and count on someone trying deliberately to do that!).
> The only safe thing to do is to specify that space followed by a
> combining mark is NEVER considered to be a space and this combination is
> NEVER generated.
No, the safe thing to do (and the thing that is done) is to treat the space
as a space, ignoring the fact that the NMTOKEN contains a combining
character. This is even safer than your suggestion, since it can't
misidentify the combining properties of a character.
This effectively bans space+combining (and, for that matter, NBSP+combining,
since NBSP isn't allowed in NMTOKENs) within an NMTOKEN, and means that if
you attempt to begin an NMTOKEN with space+combining, it will be treated as
beginning with the combining character.
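
To make that concrete, here is a rough Python sketch of the behaviour
described above. It is not taken from any real parser; the function names
are made up for illustration, and it assumes tabs and line breaks have
already been turned into U+0020 by the earlier normalization step. It also
uses unicodedata.combining() (non-zero canonical combining class) as a
stand-in for XML's own CombiningChar production, which is only an
approximation.

```python
import unicodedata
from typing import List


def normalize_tokenized(value: str) -> str:
    """Collapse runs of U+0020 to one space and drop leading/trailing ones.

    Hypothetical helper: mirrors the extra normalization applied to
    attributes whose declared type is not CDATA.
    """
    return " ".join(part for part in value.split(" ") if part)


def split_nmtokens(value: str) -> List[str]:
    """Split an NMTOKENS value on spaces only.

    The character after a space is never inspected, so a space followed by
    a combining mark is still just a separator.
    """
    normalized = normalize_tokenized(value)
    return normalized.split(" ") if normalized else []


def starts_with_combining(token: str) -> bool:
    """Approximate check: non-zero canonical combining class on first char."""
    return unicodedata.combining(token[0]) != 0


# U+0301 COMBINING ACUTE ACCENT directly follows the separating spaces.
raw = "  abc   \u0301xyz "
tokens = split_nmtokens(raw)

for token in tokens:
    print(ascii(token), starts_with_combining(token))
```

The second token comes out beginning with the combining character itself;
the space before it is consumed as a separator and never becomes part of
any token.
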
The resulting loss of expressive power from banning this is negligible.
It means that you can't use what is quite a linguistic oddity
(space+combining is mainly used in meta-discussion of combining marks, as was
mentioned earlier) in a context where the text is (hopefully) human-readable
but not fully general. NMTOKENs should only be given "raw" to a user by
relatively low-level tools (i.e. general-purpose XML tools for developers).
In other contexts they should be represented by a more user-friendly and
application-appropriate indicator (perhaps text, perhaps not), so the
inability to use space+combining won't apply at that level.