Re: Display of Isolated Nonspacing Marks (was Re: Questions on ZWNBS...)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Aug 08 2003 - 10:46:03 EDT

    On 04/08/2003 17:36, Kenneth Whistler wrote:
    > Peter Kirk asked:
    > > A similar issue which is not Hebrew related would be a (mythical)
    > > requirement to display a diacritic like 0315, 031B or 0322 in
    > > isolation. It would not always be appropriate to use a space or
    > > NBSP as a base character as this would indent the glyph from the
    > > beginning of a line in a way which might not be wanted. What
    > > would be the recommended encoding if one wanted to display one of
    > > these characters with no leading white space?
    > If you want to display some character like U+0315 COMBINING COMMA
    > ABOVE RIGHT *and* you want to do it in isolation *and* you want
    > it to occur at the beginning of a line *and* you want there to
    > be no display width between the margin and the left edge of the
    > display bits of the glyph, then you have stepped over the boundaries
    > of what is reasonable to expect plain text to convey. Feel free
    > to make use of the higher-level capabilities of your word
    > processor or page layout program to individually adjust the
    > positioning of particular glyphs displayed in particular fonts.

    That's true for such "defective" sequences that may be used temporarily
    during text handling operations (where the combining mark should be
    rendered in editors with the dotted circle glyph).

    But one can still represent an isolated combining character in a
    non-defective way by putting it after a Zero-Width Space, without
    creating any margin. This works because of the Zs category of this
    character, which qualifies it the same way as an ASCII SPACE would:

    0020;SPACE;Zs;0;WS;;;;;N;;;;;
    200B;ZERO WIDTH SPACE;Zs;0;BN;;;;;N;;;;;

    In fact, using ZWS may even be more accurate than using SPACE
    in bidirectional contexts, as it is bidirectionally neutral and does not
    break directionality clusters for display reordering (so such an encoded
    isolated diacritic can appear even in an RTL sequence, as if it were a
    single character with the current directionality).
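
    As a minimal sketch, the two properties discussed above can be looked
    up programmatically. Python and its unicodedata module are my
    assumption here, and the helper name properties() is hypothetical.
    The values come from the Unicode data bundled with the interpreter,
    so they may differ from the UnicodeData.txt lines quoted above (later
    versions of the standard reclassified U+200B as Cf).

```python
import unicodedata

# Candidate base characters mentioned in this thread; labels are for printing only.
CANDIDATES = {
    "SPACE": "\u0020",
    "NO-BREAK SPACE": "\u00A0",
    "ZERO WIDTH SPACE": "\u200B",
}


def properties(ch):
    """Return (General_Category, Bidi_Class) from the interpreter's own UCD copy."""
    return unicodedata.category(ch), unicodedata.bidirectional(ch)


for label, ch in CANDIDATES.items():
    gc, bidi = properties(ch)
    print(f"U+{ord(ch):04X} {label}: gc={gc} bidi={bidi}")
```

    Comparing the printed Bidi_Class values shows whether a candidate
    base behaves like whitespace or like a boundary-neutral character
    during reordering.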

    I just wonder what the width of the combination of ZWS plus a
    diacritic would be: logically the ZWS has width 0, but diacritics are
    supposed to expand, if needed, the width of the base character, unless
    kerning is used to reduce the interletter spacing. But I doubt that any
    font would define a kerning pair for a preceding grapheme cluster plus
    this isolated diacritic (ZWS+combining), or for that isolated diacritic
    and the next grapheme cluster, so in the absence of such a kerning
    pair, most programs will just use the default combined width.

    I just tried to see how Windows XP renders the sequences:
    <A, SPACE, ZWS, COMBINING MACRON, SPACE, B>
    <A, ZWS, COMBINING MACRON, B>
    And it shows the spaces correctly even in HTML with IE6, with
    Arial, Arial Unicode MS, Times New Roman, Courier New...

    By contrast, the sequence <SPACE, COMBINING MACRON>
    is incorrectly rendered with a too large width (larger than a single
    space or a single non-combining macron).
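
    For anyone wanting to repeat the test, here is a rough sketch that
    builds the three sequences and wraps them in a small HTML page to open
    in a browser. The names SEQUENCES, describe() and to_html(), and the
    choice of Arial in the style attribute, are assumptions made for
    illustration; this is not the page I actually used.

```python
import unicodedata

SPACE, ZWS, MACRON = "\u0020", "\u200B", "\u0304"

# The three sequences compared above.
SEQUENCES = {
    "spaced": "A" + SPACE + ZWS + MACRON + SPACE + "B",
    "tight": "A" + ZWS + MACRON + "B",
    "space_base": SPACE + MACRON,
}


def describe(text):
    """List each code point with its Unicode name, to verify the encoding."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def to_html(sequences):
    """Wrap each sequence in brackets so its rendered width is visible."""
    rows = "\n".join(
        f"<p>{name}: [{text}]</p>" for name, text in sequences.items()
    )
    return (
        '<!DOCTYPE html>\n<meta charset="utf-8">\n'
        f'<body style="font-family: Arial">\n{rows}\n</body>\n'
    )


for name, text in SEQUENCES.items():
    print(name, describe(text))
```

    Writing the result of to_html(SEQUENCES) to a UTF-8 file and changing
    the font name makes it easy to compare how different fonts space the
    isolated macron.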

    Could ZWS + combining diacritic be the best solution for
    isolated diacritics in text?



    This archive was generated by hypermail 2.1.5 : Fri Aug 08 2003 - 11:51:03 EDT