Re: Questions on ZWNBS - for line initial holam plus alef

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Aug 06 2003 - 19:13:21 EDT


    Philippe Verdy said:

    > > The same thing can be said about any inserted invisible character,
    > > combining or not.
    > >
    > > How is: <a, ring above, null, dot below> supposed to be different from
    > > <a, dot below, null, ring above>
    > >
    > > How is: <a, ring above, LRM, dot below> supposed to be different from
    > > <a, dot below, LRM, ring above>
    > >
    > > In display, they might not be distinct, unless you were doing some
    > > kind of show-hidden display. Yet these sequences are not canonically
    > > equivalent, and the presence of an embedded control character or an
    > > embedded format control character would block canonical reordering.
    >
    >
    > I disagree with you, using a LRM mark in the middle of a combining
    > sequence is conforming to canonicalization rules but is clearly
    > ill-formed,

    It is not. TUS 4.0, p. 71:

    D17a Defective combining character sequence: A combining character
         sequence that does not start with a base character.
         
         * Defective combining character sequences occur when a sequence
           of combining characters appears at the start of a string or
           follows a control or format character. Such sequences are
           defective from the point of view of handling of combining
           marks, but are not ill-formed.
                  ^^^^^^^^^^^^^^^^^^^^^^
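(Editorial illustration, not part of the original message.) The definition quoted above can be approximated in a few lines of Python. This is only a sketch under stated assumptions: "combining character" is taken to mean general category M*, "control or format character" to mean categories Cc and Cf, and the helper name `defective_sequence_starts` is hypothetical. It reports positions; it does not reject anything, since such sequences are not ill-formed.

```python
import unicodedata


def is_combining_mark(ch):
    # Assumption: treat any character with general category Mn, Mc or Me
    # as a combining character.
    return unicodedata.category(ch).startswith("M")


def defective_sequence_starts(text):
    """Return the indexes at which a defective combining character
    sequence begins: a run of combining marks at the start of the string
    or directly after a control (Cc) or format (Cf) character."""
    starts = []
    for i, ch in enumerate(text):
        if not is_combining_mark(ch):
            continue
        if i == 0:
            starts.append(i)  # marks at the very start of the string
            continue
        prev = text[i - 1]
        if is_combining_mark(prev):
            continue  # still inside a run that was already classified
        if unicodedata.category(prev) in ("Cc", "Cf"):
            starts.append(i)  # marks following a control/format character
    return starts


# <a, ring above, LRM, dot below>: the dot below has no base character.
lrm_case = defective_sequence_starts("a\u030a\u200e\u0323")
# <a, ring above, NULL, dot below>: same situation with a control character.
null_case = defective_sequence_starts("a\u030a\u0000\u0323")
# <a, ring above, CGJ, dot below>: CGJ is itself a combining mark (Mn),
# so the sequence is not interrupted.
cgj_case = defective_sequence_starts("a\u030a\u034f\u0323")
```

Under these assumptions the LRM and NULL cases each report one defective sequence, while the CGJ case reports none.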

    > as well as using a NULL control in the middle, which
    > breaks the combining sequence.

    I'm not claiming it doesn't break the combining sequence. Of
    course it does. It creates a defective combining character
    sequence, and that poses a challenge for rendering, since it
    departs from the usual expectations for normal combining
    character sequences. The renderer has to split hairs between
    the fact that it is dealing with a defective combining
    character sequence and the fact that it is dealing with a
    default ignorable character which is supposed to be ignored
    for text processes it is not immediately applicable to.

    But I challenge you to find anything in the standard that
    *prohibits* such sequences from occurring.

    And *if* they occur, they are not canonically equivalent, which
    was the point I was making to Kent.
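(Editorial illustration, not part of the original message.) The non-equivalence claim can be checked with Python's standard `unicodedata` module by comparing NFD forms. The helper name `canonically_equivalent` is hypothetical, and the result reflects whatever Unicode version the Python build ships; the combining classes involved (ring above 230, dot below 220, NULL and LRM 0) are stable.

```python
import unicodedata

RING_ABOVE = "\u030a"  # COMBINING RING ABOVE, combining class 230
DOT_BELOW = "\u0323"   # COMBINING DOT BELOW, combining class 220
NULL = "\u0000"        # control character, combining class 0
LRM = "\u200e"         # LEFT-TO-RIGHT MARK, combining class 0


def canonically_equivalent(s, t):
    # Two strings are canonically equivalent when their NFD forms match.
    return unicodedata.normalize("NFD", s) == unicodedata.normalize("NFD", t)


# Adjacent marks with different non-zero classes are reordered,
# so both orders normalize to the same string.
plain = canonically_equivalent(
    "a" + RING_ABOVE + DOT_BELOW,
    "a" + DOT_BELOW + RING_ABOVE,
)

# A class-0 character between the marks blocks the reordering.
with_null = canonically_equivalent(
    "a" + RING_ABOVE + NULL + DOT_BELOW,
    "a" + DOT_BELOW + NULL + RING_ABOVE,
)
with_lrm = canonically_equivalent(
    "a" + RING_ABOVE + LRM + DOT_BELOW,
    "a" + DOT_BELOW + LRM + RING_ABOVE,
)

print(plain, with_null, with_lrm)
```

Only the pair without an intervening character compares as equivalent.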

    > The proposal to use CGJ however is legal: it does not break the
    > combining sequences and grapheme clusters, and thus the whole
    > encoded sequence encoded with CGJ will be considered by
    > rendering engines, where CGJ is a no-op for rendering but not for
    > the canonical ordering ...

    Well, yes, which is why I have been advocating it as the
    solution to the Biblical Hebrew text representation problem.
    I agree with you about that. But it need not be characterized
    as "legal" in opposition to the other examples I cited above.
    All of these sequences are "legal" and allowed by the
    standard.
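(Editorial illustration, not part of the original message.) A sketch of why CGJ suits the Hebrew case: it has combining class 0, so it blocks canonical reordering, yet it is itself a nonspacing mark, so the combining sequence is not interrupted. The lamed + patah + hiriq sequence below is an example chosen for this note, not one given in the message.

```python
import unicodedata

LAMED = "\u05dc"   # HEBREW LETTER LAMED
PATAH = "\u05b7"   # HEBREW POINT PATAH, combining class 17
HIRIQ = "\u05b4"   # HEBREW POINT HIRIQ, combining class 14
CGJ = "\u034f"     # COMBINING GRAPHEME JOINER

# CGJ is a combining mark whose combining class is 0.
cgj_category = unicodedata.category(CGJ)
cgj_class = unicodedata.combining(CGJ)

# Without CGJ, normalization sorts the points by combining class,
# so the intended patah-before-hiriq order is lost.
without_cgj = unicodedata.normalize("NFD", LAMED + PATAH + HIRIQ)

# With CGJ between the points, the original order survives.
with_cgj = unicodedata.normalize("NFD", LAMED + PATAH + CGJ + HIRIQ)

print(without_cgj == LAMED + HIRIQ + PATAH)
print(with_cgj == LAMED + PATAH + CGJ + HIRIQ)
```

Whether a given renderer then treats CGJ as a no-op is a separate question that this check does not cover.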

    --Ken


