Re: Questions on ZWNBS - for line initial holam plus alef

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Aug 11 2003 - 15:13:59 EDT

    Peter Kirk asked:

    > Thanks for the clarification. I probably misunderstood Jon's intention.
    > But is there a problem if, for example, an application sees the string
    > <space, space, combining mark> and regularises it (wrongly!) to <space,
    > combining mark>?

    Then you have a problem, of course.

    What the Unicode Standard says about the application of nonspacing
    combining marks to SPACE seems clear to me.

    What other standards say about space folding is clear in their
    own contexts.

    If someone is implementing the Unicode Standard together with
    such a standard, then they have to be careful about how the
    two sets of requirements interact.

    In Unicode terms, a space folding is an example of a "knowing
    modification" of the content of the text. It is perfectly o.k.
    to modify Unicode text, of course, *as long as you know what
    you are doing* -- i.e., you aren't converting valid text to
    bit hash because you aren't conforming to the meaning of
    the characters or to their encoding forms.

    Now if a process is doing a space folding, but is applying
    it to Unicode text as a "semi-ignorant modification", i.e.,
    without being aware of the fact that nonspacing combining
    marks can apply to SPACE characters (and that such sequences
    are valid combining character sequences and should be treated
    analogously with other grapheme clusters; see UAX #29), then
    it is modifying the text away from its intended content without
    *knowing* what it is actually doing. Such mistakes are
    programming errors in application of the relevant standards.
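
    As a rough illustration only, a space folding that is aware of
    this could look like the Python sketch below. The function names
    and the choice to treat general categories Mn and Me as the
    nonspacing marks are assumptions made for the example, not
    something any of the standards under discussion prescribes. A run
    of spaces is folded to one space, but a SPACE that serves as the
    base of a following mark is kept as a separate unit.

```python
import unicodedata


def is_nonspacing_mark(ch):
    # Assumption for this sketch: nonspacing marks are the characters
    # with general category Mn (nonspacing) or Me (enclosing).
    return unicodedata.category(ch) in ("Mn", "Me")


def fold_spaces(text):
    """Fold runs of U+0020 to one space without merging a SPACE that
    carries a combining mark into the spaces before it."""
    out = []
    i, n = 0, len(text)
    while i < n:
        if text[i] != " ":
            out.append(text[i])
            i += 1
            continue
        # Find the end of the current run of spaces.
        j = i
        while j < n and text[j] == " ":
            j += 1
        run = j - i
        if j < n and is_nonspacing_mark(text[j]):
            # The last space is the base of <SPACE, mark>. Keep it, and
            # fold any plain spaces before it down to a single space.
            out.append(" " if run == 1 else "  ")
        else:
            out.append(" ")
        i = j
    return "".join(out)
```

    With this sketch, <space, space, holam, alef> comes back unchanged,
    whereas a naive "replace every run of spaces with one space" would
    turn it into <space, holam, alef> and so change the content.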

    Of course a standard which mandates space folding is also
    within its rights to mandate, for example, the non-use of
    nonspacing marks applied to SPACE characters. It can simply
    rule out such sequences as valid for its context, in which
    case the problem goes away.
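
    A check of that kind is easy to sketch. The Python below is only
    an assumed example (the function name find_marked_spaces is made
    up here, and Mn/Me again stand in for "nonspacing marks"): it
    reports the positions of any SPACE that is immediately followed
    by such a mark, so that a process bound by the stricter standard
    can reject the text instead of silently altering it.

```python
import unicodedata


def find_marked_spaces(text):
    """Return the indices of every U+0020 that is immediately followed
    by a nonspacing mark (general category Mn or Me, by assumption)."""
    hits = []
    for i in range(len(text) - 1):
        if text[i] == " " and unicodedata.category(text[i + 1]) in ("Mn", "Me"):
            hits.append(i)
    return hits
```

    An empty result means the text contains no such sequences and can
    be space-folded in the ordinary way.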

    The important thing here is to know what you are doing when
    you modify text, and, as far as possible, to accomplish
    such modifications in ways that are the same as other
    processes which also know what they are doing. That is the
    basis for interoperability of textual data.

    --Ken


