Re: Questions on ZWNBS - for line initial holam plus alef

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Aug 06 2003 - 18:22:23 EDT


    On Wednesday, August 06, 2003 10:19 PM, Kenneth Whistler <kenw@sybase.com> wrote:

    > Kent Karlsson responded:
    >
    > > > > I see no particular *technical* problem with using WJ, though.
    > > > > In contrast to the suggestion of using CGJ (re. another problem)
    > > > > anywhere else but at the end of a combining sequence. CGJ has
    > > > > combining class 0, despite being invisible and not ("visually")
    > > > > interfering with any other combining mark. Using CGJ at a
    > > > > non-final position in a combining sequence puts in doubt the
    > > > > entire idea with combining classes and normal forms.
    > > >
    > > > Why?
    > >
    > > See above (I DID write the motivation!).
    >
    > I guess that I did not (and still do not) see the motivation for
    > your final statement.
    >
    > > Combining classes are generally
    > > assigned according to "typographic placement". Combining characters
    > > (except those that are really letters) that have the "same"
    > > placement, and "interfere typographically" are assigned the same
    > > combining class, while those that don't get different classes, and
    > > the relative order is then considered unimportant (canonically
    > > equivalent). How is then,
    > > e.g. <a, ring above, cgj, dot below> supposed to be different from
    > > <a, dot below, cgj, ring above> (supposing all involved characters
    > > are fully supported), when <a, ring above, dot below> is NOT
    > > supposed to be much different from <a, dot below, ring above>
    > > (them being canonically equivalent)? An invisible combining
    > > character does not interfere typographically with anything, it
    > > being invisible!
    >
    > The same thing can be said about any inserted invisible character,
    > combining or not.
    >
    > How is: <a, ring above, null, dot below> supposed to be different from
    > <a, dot below, null, ring above>
    >
    > How is: <a, ring above, LRM, dot below> supposed to be different from
    > <a, dot below, LRM, ring above>
    >
    > In display, they might not be distinct, unless you were doing some
    > kind of show-hidden display. Yet these sequences are not canonically
    > equivalent, and the presence of an embedded control character or an
    > embedded format control character would block canonical reordering.

    I disagree with you. Using an LRM in the middle of a combining
    sequence conforms to the canonicalization rules but is clearly
    ill-formed, as is using a NULL control in the middle: either one
    breaks the combining sequence.

    So in your two examples above, inserting the LRM or NULL splits
    one combining sequence into three, each with its own properties,
    and the last one is ill-formed because its combining character
    follows a control rather than a base or combining character.

    The proposal to use CGJ, however, is legal: it does not break
    combining sequences or grapheme clusters, so rendering engines
    will consider the whole sequence encoded with CGJ. CGJ is a no-op
    for rendering but not for canonical ordering. I see only two
    well-formed uses for it: as a canonical ordering fix for the NF*
    normalization forms, or before a base character to extend the
    combining sequences used by renderers, character parsers and
    breakers.
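
    The property difference behind this argument can be checked with a
    minimal Python sketch. It assumes the standard `unicodedata` module
    reflects the character properties under discussion; the helper name
    `describe` is made up for illustration.

```python
import unicodedata

CGJ = "\u034F"   # COMBINING GRAPHEME JOINER
LRM = "\u200E"   # LEFT-TO-RIGHT MARK
NUL = "\u0000"   # NULL

def describe(ch):
    """Return (general category, canonical combining class) for one character."""
    return unicodedata.category(ch), unicodedata.combining(ch)

cgj_props = describe(CGJ)   # a nonspacing mark with combining class 0
lrm_props = describe(LRM)   # a format control with combining class 0
nul_props = describe(NUL)   # a C0 control with combining class 0

# CGJ between two marks keeps them from being reordered under NFD,
# because canonical reordering never moves a mark across a class-0 character.
with_cgj = "a\u030A" + CGJ + "\u0323"   # <a, ring above, CGJ, dot below>
cgj_blocks_reordering = unicodedata.normalize("NFD", with_cgj) == with_cgj
```

    All three characters have combining class 0, so each blocks
    reordering; only CGJ is itself a combining mark, which is the
    distinction drawn above.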

    So your example with:
       <a, dot below, LRM, ring above>
    would in fact be rendered and parsed as three combining sequences:
       <a, dot below>, <LRM>, <ring above>
    i.e. a well-formed <a with dot below>, a control (normally invisible,
    but an editor may show it with a visible glyph such as the dotted
    square used in the Unicode charts), and an ill-formed isolated
    <ring above> (most probably rendered on a dotted circle).
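
    The three-way split described above can be sketched in Python. This
    is a deliberately simplified model, not the UAX #29 segmentation
    algorithm: it assumes a mark (general category M*) attaches to the
    preceding run unless that run starts with a control or format
    character. The names `combining_sequences` and `is_defective` are
    hypothetical.

```python
import unicodedata

def combining_sequences(text):
    """Split text into runs of one starting character plus following marks.

    Simplified model (an assumption, not UAX #29): a character whose general
    category starts with "M" attaches to the preceding run unless that run
    starts with a control or format character (category Cc or Cf).
    """
    runs = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        can_attach = bool(runs) and unicodedata.category(runs[-1][0]) not in ("Cc", "Cf")
        if is_mark and can_attach:
            runs[-1] += ch
        else:
            runs.append(ch)
    return runs

def is_defective(run):
    """A run is treated as defective when it starts with a combining mark."""
    return unicodedata.category(run[0]).startswith("M")

# <a, dot below, LRM, ring above> splits into three runs.
lrm_runs = combining_sequences("a\u0323\u200E\u030A")
lrm_defective = [is_defective(r) for r in lrm_runs]

# <a, dot below, CGJ, ring above> stays a single run under this model.
cgj_runs = combining_sequences("a\u0323\u034F\u030A")
```

    Under these assumptions the LRM case yields <a, dot below>, <LRM> and
    an isolated <ring above>, while the CGJ case remains one sequence.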

    So it cannot be thought of as equivalent to, nor even rendered
    equivalently to:
       <a, dot below, ring above>
    or its canonical equivalents (not in normalized order but still
    conforming and well-formed, and handled equivalently):
       <a, ring above, dot below>
       <a with ring above, dot below>
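
    The equivalences listed above can be verified with a short Python
    sketch, assuming the standard `unicodedata` module; the helper
    `canonically_equivalent` is a made-up name that simply compares NFD
    forms.

```python
import unicodedata

RING, DOT, LRM = "\u030A", "\u0323", "\u200E"

forms = [
    "a" + DOT + RING,     # <a, dot below, ring above>
    "a" + RING + DOT,     # <a, ring above, dot below>
    "\u00E5" + DOT,       # <a with ring above, dot below>
]

def canonically_equivalent(s, t):
    """Two strings are canonically equivalent when their NFD forms match."""
    return unicodedata.normalize("NFD", s) == unicodedata.normalize("NFD", t)

# The three spellings above all normalize to the same sequence.
all_equivalent = all(canonically_equivalent(forms[0], f) for f in forms)

# With an LRM between the marks, the sequence is no longer equivalent.
lrm_variant = "a" + DOT + LRM + RING   # <a, dot below, LRM, ring above>
lrm_equivalent = canonically_equivalent(forms[0], lrm_variant)
```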

    -- 
    Philippe.
    Spam not tolerated: any unsolicited message will be
    reported to your Internet service providers.
    

