Re: Display of Isolated Nonspacing Marks (was Re: Questions on ZWNBS...)

From: Philippe Verdy
Date: Wed Aug 06 2003 - 18:47:57 EDT


    On Wednesday, August 06, 2003 11:48 PM, Peter Kirk wrote:

    > OK, what kind of markup should I use, in any well-known markup
    > language, to ensure that an isolated diacritic is centred in the
    > space between the words before and after it?

    In plain text, I think that this encoding:
        ...endOfWord1, SPACE, SPACE, diacritic, SPACE,
    is what you need, as it creates the following combining sequences:
        <...endOfWord1>, <SPACE>, <SPACE, diacritic>, <SPACE>,
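
    A minimal Python sketch of this grouping, assuming the usual Unicode
    convention that a combining sequence is a base character followed by
    any marks (general categories Mn, Mc, Me). The helper names
    combining_sequences and show are hypothetical, chosen for illustration,
    and U+0301 stands in for "diacritic":

```python
import unicodedata

def combining_sequences(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it.
    A mark with no preceding character is kept alone (a defective sequence)."""
    seqs: list[str] = []
    for ch in text:
        if seqs and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            seqs[-1] += ch  # attach the mark to the preceding base
        else:
            seqs.append(ch)  # anything else starts a new sequence
    return seqs

def show(seqs: list[str]) -> list[str]:
    # Spell out code points so invisible spaces and marks are readable.
    return [" ".join(f"U+{ord(c):04X}" for c in s) for s in seqs]

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT
print(show(combining_sequences("a" + "  " + ACUTE + " " + "b")))
# ['U+0061', 'U+0020', 'U+0020 U+0301', 'U+0020', 'U+0062']
```

    The middle <SPACE, diacritic> pair comes out as one unit, while the two
    bare SPACEs around it remain ordinary separators.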

    If you don't want any space around a diacritic that must be displayed
    in isolation but in the middle of a word, the following would work:
        ...endOfWord1, SPACE, diacritic, startOfWord2...
    Here the SPACE is not a break opportunity, but just the base character
    for the inserted diacritic. What the standard lacks is a definition of
    the properties of such a SPACE+diacritic sequence: normally a combining
    sequence inherits the properties of its base character, and the
    properties of the diacritics are ignored.
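
    The inheritance rule can be sketched as follows, assuming (as described
    above) that a combining sequence simply takes the properties of its base
    character; only the general category is modelled here, and
    sequence_category and naive_words are hypothetical helpers. A word
    splitter that treats separator categories (Z*) as boundaries then cuts
    ...endOfWord1, SPACE, diacritic, startOfWord2... in two and discards the
    diacritic along with the SPACE, which is the missing property at issue:

```python
import unicodedata

def combining_sequences(text: str) -> list[str]:
    # Group each base character with the combining marks (Mn, Mc, Me) after it.
    seqs: list[str] = []
    for ch in text:
        if seqs and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            seqs[-1] += ch
        else:
            seqs.append(ch)
    return seqs

def sequence_category(seq: str) -> str:
    # Assumed rule: the sequence inherits the base character's property;
    # the marks' own properties are ignored.
    return unicodedata.category(seq[0])

def naive_words(text: str) -> list[str]:
    # Split at any combining sequence whose inherited category is a separator.
    words: list[str] = []
    current = ""
    for seq in combining_sequences(text):
        if sequence_category(seq).startswith("Z"):
            if current:
                words.append(current)
            current = ""
        else:
            current += seq
    if current:
        words.append(current)
    return words

text = "ab" + " " + "\u0301" + "cd"  # endOfWord1, SPACE, diacritic, startOfWord2
print([sequence_category(s) for s in combining_sequences(text)])
# ['Ll', 'Ll', 'Zs', 'Ll', 'Ll']
print(naive_words(text))
# ['ab', 'cd']
```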

    But when the base character is SPACE or NBSP, new properties may
    be needed. If a break opportunity remains on the base SPACE of a
    combining sequence, it is not clear where the break occurs: before the
    SPACE (i.e. before the combining sequence), or after the diacritic (i.e.
    after the combining sequence)?

    I think that the second option applies here, i.e. the base SPACE would
    create a break opportunity at the end of the whole combining sequence
    made of the SPACE and the following combining characters (including
    CGJ if needed to fix canonical ordering).
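
    A toy sketch of this proposal: it illustrates the rule suggested here,
    not a rule taken from UAX #14, and break_opportunities is a hypothetical
    name. It returns code-point offsets and reports a break only after a
    whole SPACE-based combining sequence, never between the SPACE and its
    marks; all other line-breaking behaviour is deliberately left out. CGJ
    (U+034F) is itself a nonspacing mark, so it stays inside the sequence:

```python
import unicodedata

SPACE = "\u0020"

def combining_sequences(text: str) -> list[str]:
    # Group each base character with the combining marks (Mn, Mc, Me) after it.
    seqs: list[str] = []
    for ch in text:
        if seqs and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            seqs[-1] += ch
        else:
            seqs.append(ch)
    return seqs

def break_opportunities(text: str) -> list[int]:
    """Offsets where a line break may occur under the proposed rule:
    after the end of any SPACE-based combining sequence.
    The end of the text itself is not reported."""
    breaks: list[int] = []
    pos = 0
    for seq in combining_sequences(text):
        pos += len(seq)
        if seq[0] == SPACE and pos < len(text):
            breaks.append(pos)
    return breaks

print(break_opportunities("ab \u0301cd"))        # [4]: after the acute, not at offset 2
print(break_opportunities("ab \u0301\u034fcd"))  # [5]: the CGJ stays inside the sequence
```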

    Another similar case would be the use of an isolated nukta (which
    normally modifies a following base character): the sequence
    <nukta, SPACE> is a single combining sequence with a break
    opportunity. So a sequence like <nukta, SPACE, acute accent>
    would be unbreakable but would include a break opportunity at its
    end, unless it is followed by an NBSP.
    And the sequence <nukta, NBSP, acute accent> would also be
    unbreakable, with no break opportunity in the middle or at either end.
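
    Extending the same kind of toy sketch to the NBSP cases (again only an
    illustration of the proposal, with hypothetical names): an NBSP-based
    sequence never offers a break, and a break after a SPACE-based sequence
    is dropped when an NBSP-based sequence follows. Note that the helper
    follows the Unicode convention that marks attach to the preceding
    character, so a nukta at the start of the text becomes a defective
    sequence on its own rather than grouping with the following SPACE; for
    these three inputs the break positions still match the outcomes
    described above:

```python
import unicodedata

SPACE = "\u0020"
NBSP = "\u00a0"
NUKTA = "\u093c"  # DEVANAGARI SIGN NUKTA
ACUTE = "\u0301"  # COMBINING ACUTE ACCENT

def combining_sequences(text: str) -> list[str]:
    # Group each base character with the combining marks (Mn, Mc, Me) after it.
    seqs: list[str] = []
    for ch in text:
        if seqs and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            seqs[-1] += ch
        else:
            seqs.append(ch)
    return seqs

def break_opportunities(text: str) -> list[int]:
    """Break after a SPACE-based sequence, unless the next sequence is
    NBSP-based; NBSP-based sequences never offer a break themselves."""
    seqs = combining_sequences(text)
    breaks: list[int] = []
    pos = 0
    for i, seq in enumerate(seqs):
        pos += len(seq)
        nxt = seqs[i + 1] if i + 1 < len(seqs) else None
        if seq[0] == SPACE and nxt is not None and nxt[0] != NBSP:
            breaks.append(pos)
    return breaks

print(break_opportunities(NUKTA + SPACE + ACUTE + "cd"))         # [3]: break at the end only
print(break_opportunities(NUKTA + SPACE + ACUTE + NBSP + "cd"))  # []: the NBSP suppresses it
print(break_opportunities(NUKTA + NBSP + ACUTE + "cd"))          # []: no break anywhere
```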

    No spam tolerated: any unsolicited message will be
    reported to your Internet service providers.

    This archive was generated by hypermail 2.1.5 : Wed Aug 06 2003 - 19:29:22 EDT