Re: Questions on ZWNBS - for line initial holam plus alef

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Aug 08 2003 - 11:54:36 EDT


    On Tuesday, August 05, 2003 1:52 AM, Kenneth Whistler <kenw@sybase.com> wrote:

    > Peter,
    >
    > > > The carrier for a combining mark that is to display in isolation
    > > > without a base character is U+0020 SPACE. If you want to also
    > > > indicate the absence of a line break opportunity, then the
    > > > carrier is U+00A0 NO-BREAK SPACE (NBSP).
    > > >
    > > Neither of these is appropriate to the case I have in mind
    > > (described in greater detail below) as they are not zero width and
    > > therefore give an unwanted indent at the start of a line.
    >
    > Of course, because the whole point of this convention is to display
    > a non-spacing mark in isolation, not applied to a base character.
    >
    > > U+200B ZERO WIDTH SPACE might be
    > > appropriate, but this has the problem that it is a break
    > > opportunity, which is not always appropriate.
    >
    > U+200B ZERO WIDTH SPACE is not appropriate, for the same reason
    > the U+FEFF (or U+2060) is not appropriate: The Standard does
    > not specify the display of non-spacing marks on it as a means
    > of showing the marks without base characters. And, as you indicate,
    > U+200B (but also U+FEFF and U+2060) are implicated in the control
    > of line break opportunities. They are certainly not defined
    > as glyph display anchors or some such.

    Here I disagree: ZWS is a whitespace character, not a format control, and
    thus it has a glyphic and semantic identity of its own (unlike ZWNBSP or WJ).
    So ZWS clearly qualifies as a base character, and is certainly better
    (conceptually and in its line-breaking properties) than the standard ASCII
    space, which has an implied minimum width (one that may be too large
    for a holder of a tiny diacritic such as a dot above, or even an
    acute accent).

    200B;ZERO WIDTH SPACE;Zs;0;BN;;;;;N;;;;;
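
The line above is a record in the semicolon-separated UnicodeData.txt format. A minimal Python sketch that splits such a record into named fields makes the three properties at stake visible: general category Zs, combining class 0 and bidirectional class BN. The field names are descriptive labels chosen for this sketch, and the record is parsed exactly as quoted rather than looked up in a live database, since later versions of the Unicode data may list different values.

```python
# Descriptive labels for the 15 semicolon-separated fields of a
# UnicodeData.txt record (labels chosen for this sketch).
FIELDS = (
    "code_point", "name", "general_category", "canonical_combining_class",
    "bidi_class", "decomposition", "decimal", "digit", "numeric",
    "bidi_mirrored", "unicode_1_name", "iso_comment",
    "uppercase", "lowercase", "titlecase",
)

def parse_record(line: str) -> dict:
    """Split one UnicodeData.txt line into a label -> value dict."""
    values = line.strip().split(";")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))

# The ZERO WIDTH SPACE record exactly as quoted in the message.
zws = parse_record("200B;ZERO WIDTH SPACE;Zs;0;BN;;;;;N;;;;;")
for key in ("general_category", "canonical_combining_class", "bidi_class"):
    print(key, "=", zws[key])
```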

    Combining sequences are already expected to expand the width or
    height of the base character to which the marks apply, so ZWS,
    despite being zero-width itself, does not make the combining
    sequence that includes it zero-width as well.
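
A combining sequence here means a base character followed by the combining marks applied to it. A rough Python sketch of that grouping, assuming the simplified rule that any character of general category M* attaches to the preceding character, shows U+200B acting as the base for U+0301. This is not full grapheme clustering, and it says nothing about rendered width, which depends on the font and layout engine.

```python
import unicodedata

def combining_sequences(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it.

    Simplification assumed by this sketch: a character whose general
    category starts with "M" joins the preceding sequence; anything else
    starts a new one. Real grapheme clustering has more rules.
    """
    sequences: list[str] = []
    for ch in text:
        if sequences and unicodedata.category(ch).startswith("M"):
            sequences[-1] += ch  # mark attaches to the current base
        else:
            sequences.append(ch)  # new base (or a mark with no base at all)
    return sequences

# ZERO WIDTH SPACE carrying COMBINING ACUTE ACCENT, followed by "a".
for seq in combining_sequences("\u200b\u0301a"):
    print([f"U+{ord(c):04X}" for c in seq])
```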

    For me, the two best candidates for holders of isolated diacritics
    are ZWS (if a line break is allowed before and after the combining
    sequence), or WJ (if no break is allowed, because the isolated diacritic
    must be used within a word that has no internal break opportunity).
    However, WJ is a control and does not fit this second usage well.
    Could another code point be assigned with these properties:

    20CF;ZERO WIDTH SYMBOL;Sk;0;ON;<compat> 0020;;;;N;;;;;

    i.e. a character considered symbolic rather than whitespace, with
    combining class 0 (not combining), and used as an explicit base
    for an isolated spacing diacritic that is never shown with a
    dotted circle? (U+20CF is only a suggestion: it fits at the end of
    the Currency Symbols block, just before the Combining Diacritical
    Marks for Symbols block, and the U+02XX block where the other "Sk"
    spacing diacritics are defined is full.)
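
The requested properties can be checked mechanically against the proposed record. The following Python sketch is illustrative only: U+20CF and its record are the hypothetical proposal from this message, the check names are labels invented for the sketch, and the dotted-circle behaviour is a rendering matter that a UnicodeData record cannot express. The quoted ZERO WIDTH SPACE record is run through the same checks for contrast.

```python
# 0-based field positions in a UnicodeData.txt record.
GC, CCC, DECOMP = 2, 3, 5

def check_proposal(line: str) -> dict[str, bool]:
    """Check a record against the properties requested in the message."""
    fields = line.split(";")
    return {
        "symbolic category": fields[GC].startswith("S"),
        "combining class 0": fields[CCC] == "0",
        "compat decomposition to SPACE": fields[DECOMP] == "<compat> 0020",
    }

# Hypothetical record proposed in the message (U+20CF is only a suggestion).
proposed = "20CF;ZERO WIDTH SYMBOL;Sk;0;ON;<compat> 0020;;;;N;;;;;"
# ZERO WIDTH SPACE as quoted earlier in the message, for contrast.
zws = "200B;ZERO WIDTH SPACE;Zs;0;BN;;;;;N;;;;;"

for label, record in (("proposed U+20CF", proposed), ("quoted U+200B", zws)):
    print(label, check_proposal(record))
```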

    The compatibility decomposition to SPACE keeps it consistent with
    the other spacing diacritics that have compatibility decompositions.
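
The consistency argument can be illustrated with NFKD. Python's unicodedata has no mapping for the proposed U+20CF, so this sketch applies the hypothetical U+20CF to SPACE mapping by hand before normalizing (an approximation; a real assignment would live in the Unicode data itself). Under that assumption, the proposed sequence U+20CF U+0301 and the existing U+00B4 ACUTE ACCENT reach the same compatibility decomposition, SPACE followed by U+0301.

```python
import unicodedata

# Hypothetical: the compatibility mapping proposed in the message for U+20CF.
# Python's unicodedata knows nothing about it, so it is applied by hand.
PROPOSED_COMPAT = {"\u20cf": "\u0020"}

def proposed_nfkd(text: str) -> str:
    """NFKD after first applying the hypothetical U+20CF -> SPACE mapping."""
    mapped = "".join(PROPOSED_COMPAT.get(ch, ch) for ch in text)
    return unicodedata.normalize("NFKD", mapped)

isolated_acute = "\u20cf\u0301"  # proposed ZERO WIDTH SYMBOL + COMBINING ACUTE ACCENT
spacing_acute = "\u00b4"         # existing ACUTE ACCENT

print(proposed_nfkd(isolated_acute) == unicodedata.normalize("NFKD", spacing_acute))
```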

    The new character would make it possible to represent diacritics that
    currently have no spacing counterpart, and to use them as if they were
    letter-like. Consider a similar diacritic that already has a
    "precomposed" spacing version:

    00B4;ACUTE ACCENT;Sk;0;ON;<compat> 0020 0301;;;;N;SPACING ACUTE;;;;
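
Since U+00B4 is the model for the proposal, its quoted record can be compared with the Unicode database bundled with Python through the standard unicodedata module. This sketch prints each quoted field next to the interpreter's value rather than assuming they match, since the bundled Unicode version varies between Python releases.

```python
import unicodedata

record = "00B4;ACUTE ACCENT;Sk;0;ON;<compat> 0020 0301;;;;N;SPACING ACUTE;;;;"
f = record.split(";")
ch = chr(int(f[0], 16))

# Pair each quoted field with the value from Python's bundled database.
checks = {
    "name": (f[1], unicodedata.name(ch)),
    "general category": (f[2], unicodedata.category(ch)),
    "combining class": (int(f[3]), unicodedata.combining(ch)),
    "bidi class": (f[4], unicodedata.bidirectional(ch)),
    "decomposition": (f[5], unicodedata.decomposition(ch)),
}
for label, (quoted, current) in checks.items():
    print(f"{label}: quoted={quoted!r} current={current!r}")
```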



    This archive was generated by hypermail 2.1.5 : Fri Aug 08 2003 - 12:31:14 EDT