RE: Questions on ZWNBS - for line initial holam plus alef

From: Jon Hanna
Date: Mon Aug 11 2003 - 09:59:36 EDT

    > solution with
    > SPACE is really tricky due to the special treatment of SPACE notably
    > in HTML, SGML, XML

    I disagree. There are a few different things that happen with whitespace in
    such technologies. Some of these only apply to elements that do not allow
    any character data apart from whitespace to appear directly within them, and
    hence are not an issue here. Some happen at a relatively high level of
    processing, e.g. rendering (not parsing) of HTML, and as such should
    correctly process spaces combined with combining characters.

    There are only two theoretical problems that I can see here. The first is
    that a whitespace character other than space gets converted to space by
    attribute value normalisation, and that this changes the meaning of the text
    in some way. This could only occur if the combining character were the first
    character in a line of text, which is quite a nonsensical construct to begin
    with.
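
    As a rough illustration of the attribute value normalisation case, here is
    a minimal sketch in Python using the standard library's ElementTree parser.
    The element name "w" and attribute name "form" are made up for the example,
    and the Hebrew text is only a stand-in. A literal line break in front of
    the combining mark is turned into a space, while the same line break
    written as a character reference survives.

```python
import xml.etree.ElementTree as ET

HOLAM = "\u05B9"  # HEBREW POINT HOLAM, a combining mark
ALEF = "\u05D0"   # HEBREW LETTER ALEF

# Hypothetical element and attribute names, chosen only for this sketch.
# Here the attribute value contains a literal line break before the mark.
literal = ET.fromstring('<w form="a\n' + HOLAM + ALEF + '"/>')

# Same text, but the line break is written as a character reference.
escaped = ET.fromstring('<w form="a&#10;' + HOLAM + ALEF + '"/>')


def code_points(value):
    """Render a string as a list of U+XXXX code points for inspection."""
    return " ".join(f"U+{ord(ch):04X}" for ch in value)


literal_value = literal.get("form")
escaped_value = escaped.get("form")

print(code_points(literal_value))
print(code_points(escaped_value))
```

    In the first case the combining mark ends up following U+0020 rather than
    a line break; whether that changes the meaning of the text is the question
    raised above.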

    The other would be with names, qnames, nmtokens and such. These are not
    normal textual content; they are human-readable constructs that are based on
    normal text because that makes it easier for some developers to work at a
    plain-text level (if they speak the natural language that the human-readable
    constructs were based on). Support for the linguistic oddity of a diacritic
    divorced from the context in which it would normally exist would have little
    justification in this place except for fulfilling the general goal of
    "completeness". Completeness is a laudable aim of course, but extreme
    edge-cases need only be brought in if they are both safe and cheap. Anyone
    designing an XML application who frequently considers isolated diacritics as
    the most natural choice in part of such tokens probably needs to take a
    couple of weeks' holiday before continuing the design. Of course, some of the
    characters that could be considered to be precomposed isolated diacritics
    are banned from use in nmtokens anyway.
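
    The last point can be checked with a small hand-written sketch. It is an
    assumption on my part to use the NameChar production as published in the
    later Fifth Edition of XML 1.0, which is a plain list of code point ranges;
    earlier editions define names through character-class tables and differ in
    detail. The function names are invented for the example.

```python
# NameStartChar ranges, XML 1.0 Fifth Edition (assumed edition, see above).
NAME_START_RANGES = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Additional code points allowed by NameChar: "-", ".", digits, U+00B7,
# the combining diacritical marks block, and U+203F..U+2040.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def is_name_char(ch):
    """True if ch matches the NameChar production."""
    cp = ord(ch)
    return any(lo <= cp <= hi
               for lo, hi in NAME_START_RANGES + NAME_EXTRA_RANGES)


def is_nmtoken(text):
    """True if text is a non-empty sequence of NameChar code points."""
    return len(text) > 0 and all(is_name_char(ch) for ch in text)


# Spacing (non-combining) diacritics: acute, diaeresis, circumflex, grave.
spacing = {name: is_nmtoken(ch) for name, ch in [
    ("U+00B4", "\u00B4"), ("U+00A8", "\u00A8"),
    ("U+005E", "^"), ("U+0060", "`"),
]}
combining_acute = is_nmtoken("\u0301")

print(spacing)
print(combining_acute)
```

    Under these ranges the spacing accents are rejected outright, while a lone
    combining acute accent is accepted as an nmtoken.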

