Re: Questions on ZWNBS - for line initial holam plus alef

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Aug 09 2003 - 17:49:52 EDT

    On Saturday, August 09, 2003 3:11 PM, Kent Karlsson <kentk@cs.chalmers.se> wrote:

    > Michael wrote:
    > > The Name Police reject this utterly. ZERO WIDTH cannot have an
    > > expanding dynamic width.
    >
    > Then what about ZERO WIDTH SPACE, which, according to TUS3, p. 238,
    > "can grow to have a visible width when justified"? And it has the
    > NamesList comment:
    > * nominally zero width, but may expand in justification
    >
    > (But U+0082, BREAK PERMITTED HERE, which otherwise is very similar
    > to ZWSP according to 6429, does apparently not allow such
    > stretching...)
    >
    > /kent k

    - ZERO WIDTH SPACE would be suitable only if it did not have the "Zs"
    general category, which qualifies it as whitespace and as a word
    breaker. In fact, the same problem occurs with the general category
    of SPACE and NBSP, which is a good reason to criticize them as base
    characters for word-like sequences: even with a NBSP, there is still a
    word boundary, which may matter for orthographic and grammatical
    analysis, since the main difference between SPACE and NBSP lies in
    the line-breaking behavior, not the word-breaking behavior.

    - BREAK PERMITTED HERE is a control and does not qualify as a base
    character.
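
    To make the property argument concrete, here is a minimal sketch that
    looks up the general category and bidi class of the characters discussed
    above with Python's standard unicodedata module. The CANDIDATES table and
    the describe() helper are names made up for this illustration. Note that
    the module reflects the Unicode version bundled with the interpreter:
    ZWSP was "Zs" when this was written, but later Unicode versions
    reclassified it as "Cf", so a current interpreter reports "Cf".

```python
import unicodedata

# Characters discussed above (hypothetical table, for illustration only).
CANDIDATES = {
    "SPACE": "\u0020",
    "NO-BREAK SPACE": "\u00a0",
    "ZERO WIDTH SPACE": "\u200b",
    "BREAK PERMITTED HERE": "\u0082",
}


def describe(ch):
    """Return (general category, bidi class) for one character."""
    return unicodedata.category(ch), unicodedata.bidirectional(ch)


report = {name: describe(ch) for name, ch in CANDIDATES.items()}

for name, (gc, bidi) in report.items():
    print(f"{name:22} gc={gc} bidi={bidi}")
```

    SPACE and NBSP share the same general category, and BREAK PERMITTED HERE
    is reported as a control, which is the distinction the two points above
    rely on.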

    In fact, the gaps to fill depend on the usage:

    1) when the isolated diacritic is to be used as a spacing symbol that
    should not be forcibly glued to the surrounding characters, the NBSP
    base character is a problem; it also has the wrong character
    properties, since the whole combining sequence should normally inherit
    the properties of its first base character.
    For this usage, we need something like an "INVISIBLE SYMBOL"
    base character (with gc=Sk, as for other existing spacing diacritics,
    and probably with neutral directionality). The combining sequence
    will have its width adjusted to the largest diacritic(s) applied to that
    "INVISIBLE SYMBOL" base character. The nearest existing character
    fitting this function is ZWSP, but it is whitespace, not a symbol.

    2) when the isolated diacritic is to be used as a regular letter within
    words (e.g. in Traditional Hebrew), we need something like an "INVISIBLE
    LETTER" base character (with gc=Lo and neutral directionality), whose
    width is not necessarily adjusted to the diacritic but may vary
    depending on the left or right context (in rendering engines). One could
    then use an isolated circumflex between the two characters of the pair
    "oo", with the diacritic centered on the touching edges of the
    surrounding spacing base characters, or with a sufficient margin created
    on either side to make the isolated diacritic fit. The resulting
    combining sequence of the INVISIBLE LETTER and its non-spacing diacritics
    would be mostly non-spacing.
    But this rendering may be tricky to implement in many cases, and the
    renderer should be allowed to render it as a spacing diacritic, as for the
    invisible symbol, except that it would not be a symbol but really a letter
    that can fit within a word (with applications for elided letters in the
    middle of an unbreakable word). This function is partially implementable
    with CGJ, but only if there's a preceding combining sequence or base
    letter, or with WJ (Word Joiner), but that is a format control and not
    usable as a base character.
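
    The two cases above can be checked against the existing characters with
    a small sketch, again using only Python's unicodedata module. The BASES
    table, the WANTED set and the leading_category() helper are hypothetical
    names for this illustration; "Sk" and "Lo" are the general categories
    proposed above for the INVISIBLE SYMBOL and the INVISIBLE LETTER. The
    sketch assumes, as stated in 1), that a combining sequence takes its
    properties from its first character.

```python
import unicodedata

CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT

# Existing characters considered above as possible bases (illustrative table).
BASES = {
    "NBSP": "\u00a0",
    "ZWSP": "\u200b",
    "CGJ": "\u034f",
    "WJ": "\u2060",
    "DOTTED CIRCLE": "\u25cc",
}

# General categories proposed above: Sk (invisible symbol), Lo (invisible letter).
WANTED = {"Sk", "Lo"}


def leading_category(sequence):
    """General category of the first code point of a combining sequence."""
    return unicodedata.category(sequence[0])


results = {name: leading_category(base + CIRCUMFLEX) for name, base in BASES.items()}
suitable = [name for name, gc in results.items() if gc in WANTED]

for name, gc in results.items():
    print(f"{name:14} gc={gc}")
print("bases with gc in Sk/Lo:", suitable)
```

    None of the listed characters has either of the two wanted categories,
    so the suitable list comes out empty; an existing spacing diacritic such
    as U+005E CIRCUMFLEX ACCENT does have gc=Sk.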

    For texts that want to present the isolated diacritic in its normal
    function as a diacritic, the current best solution is to use the existing
    (spacing) dotted circle symbol as the base character. However, this usage
    is quite technical and too Unicode-specific, and is not appropriate
    for all usages, since the dotted circle base character may conflict
    with other uses of this symbol in a document. Some documents
    prefer to use, for such presentation forms, a gray-coloured Latin small
    letter o in rich text like HTML or RTF, but this still has the problem
    that a rich-text format like HTML breaks the plain text into separate
    sequences, and the non-grayed diacritic must still be rendered on top
    of this separate sequence. Which base character can be used in that
    case? There's currently none, except trying ZWSP (which does not always
    work); it should rather be a non-spacing INVISIBLE LETTER than a
    spacing INVISIBLE SYMBOL (which by itself has no defined width,
    just a minimum width of 0).
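
    As a side illustration of the two presentation bases mentioned above,
    the following sketch normalizes a circumflex on a dotted circle and on a
    Latin small letter o to NFC with Python's unicodedata module. The helper
    name nfc_code_points() is made up for this example, and the sketch
    assumes the base and the diacritic are stored together as plain text
    (in HTML the markup would sit between them, which is the problem
    described above).

```python
import unicodedata

CIRCUMFLEX = "\u0302"     # COMBINING CIRCUMFLEX ACCENT
DOTTED_CIRCLE = "\u25cc"  # DOTTED CIRCLE


def nfc_code_points(text):
    """Code points of the NFC-normalized text, formatted as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]


on_dotted_circle = nfc_code_points(DOTTED_CIRCLE + CIRCUMFLEX)
on_latin_o = nfc_code_points("o" + CIRCUMFLEX)

print(on_dotted_circle)  # ['U+25CC', 'U+0302']
print(on_latin_o)        # ['U+00F4']
```

    The dotted circle sequence stays a base plus a separate diacritic, while
    the sequence on "o" composes into the ordinary letter U+00F4, so in plain
    text it no longer presents an isolated diacritic at all.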

    -- 
    Philippe.
    Spam not tolerated: any unsolicited message will be
    reported to your Internet service providers.
    

