Re: Questions on ZWNBS - for line initial holam plus alef

From: Philippe Verdy
Date: Wed Aug 13 2003 - 10:50:26 EDT

    ----- Original Message -----
    From: "Peter Kirk"
    To: "Jon Hanna"
    Sent: Wednesday, August 13, 2003 3:05 PM
    Subject: Re: Questions on ZWNBS - for line initial holam plus alef

    > On 13/08/2003 04:44, Jon Hanna wrote:
    > >No, the safe thing to do (and the thing that is done) is to treat the
    > >as a space ignoring the fact that the NMTOKEN contains a combining
    > >character, this is even safer than your suggestion since it can't
    > >mis-identify the combining properties of a character.
    > >
    > >
    > OK, it's safe, but it is a misuse of Unicode. As space plus combining
    > character is a unit in Unicode, it should be treated as a unit by higher
    > level protocols. If higher level protocols are allowed to do arbitrary
    > things within Unicode units, there is no end to the possible
    > See for example, from Unicode 4.0 chapter 3:
    > C7 A process shall interpret a coded character representation
    > according to the character semantics established by this standard,
    > if that process does interpret that coded character representation.

    OK, but XML inherits its behavior from SGML and you won't change it.
    The only way to bypass this would be to use entity references to encode
    the base space needed by the Unicode convention, so this falls under
    what Unicode defines as a higher level protocol, needed here to bypass
    the limitations of basic text. However, it still creates a problem within
    CDATA sections, which are not supposed to contain entity references.
    One then needs to combine the XML CDATA escaping mechanism with
    another escaping system specific to CDATA sections (which are
    formally anonymous text elements and equivalent to them).
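
    To illustrate the CDATA point, here is a minimal Python sketch using
    the standard xml.etree.ElementTree parser. The element name "w" and
    the sample strings are made up for illustration, and the sketch only
    covers element content, not NMTOKEN attribute values. It encodes
    SPACE + HOLAM + ALEF with numeric character references, once in
    ordinary content and once inside a CDATA section:

```python
import xml.etree.ElementTree as ET

# U+0020 SPACE, U+05B9 HEBREW POINT HOLAM, U+05D0 HEBREW LETTER ALEF,
# written as numeric character references in ordinary element content.
# The element name "w" is hypothetical.
WITH_REFERENCES = "<w>&#x20;&#x5B9;&#x5D0;</w>"

# The same references placed inside a CDATA section.
WITH_CDATA = "<w><![CDATA[&#x20;&#x5B9;&#x5D0;]]></w>"


def element_text(xml_source):
    """Parse a small XML document and return the root element's text."""
    return ET.fromstring(xml_source).text


refs_text = element_text(WITH_REFERENCES)
cdata_text = element_text(WITH_CDATA)

# In ordinary content the parser expands the references into characters;
# inside CDATA the same markup is kept as literal text.
print(ascii(refs_text))
print(ascii(cdata_text))
```

    In ordinary content the references come back as the three intended
    characters, while inside the CDATA section they stay as literal
    "&#x...;" text, which is why a separate escaping convention would be
    needed there.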
