RE: Generic base characters (was: Hebrew generic base)

From: Philippe Verdy (
Date: Thu Jul 12 2007 - 20:46:27 CDT


    Kenneth Whistler wrote:
    > but if you are heading in that direction, why not at least investigate the
    > notion that the starting point should be more generically defined, at
    > least from the point of view of the Unicode Standard. What about just
    > looking at the generic problem as the sequence:
    > < [:Script=Common:] & [:Grapheme_Base=True:], [:gc=Mn:] >
    > That is, if you have a base character that is Common script, and you follow
    > it by a non-spacing mark, a layout engine ought to render it, even if
    > not necessarily very well, regardless of the script of the non-spacing
    > mark.

    What you are trying to define here is unnecessary: the existing standard
    already allows it explicitly. Such a definition will not help font
    designers and authors of renderers support the commonly used combinations,
    so it will not change anything in the current situation.

    On the contrary, a documenting statement in the chapter describing the
    scripts, or some data in the CLDR, would be helpful. We are not trying to
    define a new standard, but to convince these authors and designers to
    support AT LEAST the most common cases that users expect, without
    excluding the MANY other combinations that the existing standard ALREADY
    allows but that are widely unsupported, or not supported in an
    interoperable way.

    Your definition would likely influence only the authors of font renderers,
    who would provide a correct fallback behaviour (including for the reordered
    multipart glyphs of Indic combining vowels), but not font designers, who
    would still not tune the placement of the diacritics. Imagine the effect of
    a dagesh point over a black square: shouldn't the dagesh at least appear in
    negative (white on black)? Or consider the position of a diacritic above a
    symbol that is already taller than the x-height...
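
    As an illustration of the pattern under discussion, here is a minimal
    Python sketch that scans a string for <base, non-spacing mark> pairs such
    as the black square followed by a dagesh. It is only an approximation:
    Python's standard library exposes the General_Category but not the Script
    or Grapheme_Base properties, so the sketch ASSUMES that symbols,
    punctuation, numbers and spaces stand in for
    "[:Script=Common:] & [:Grapheme_Base=True:]". The function names are
    hypothetical and come from no existing renderer.

```python
import unicodedata

# ASSUMPTION: stand-in for [:Script=Common:] & [:Grapheme_Base=True:].
# The stdlib has no Script property, so major classes S (symbol),
# P (punctuation), N (number) plus Zs (space) are used as a rough proxy.
COMMON_LIKE_MAJOR_CLASSES = {"S", "P", "N"}


def is_common_like_base(ch):
    """Approximate test for a Common-script grapheme base."""
    cat = unicodedata.category(ch)
    return cat[0] in COMMON_LIKE_MAJOR_CLASSES or cat == "Zs"


def is_nonspacing_mark(ch):
    """Exact test for [:gc=Mn:]."""
    return unicodedata.category(ch) == "Mn"


def find_generic_sequences(text):
    """Return (index, base + mark) for each approximated <base, Mn> pair."""
    hits = []
    for i in range(len(text) - 1):
        base, mark = text[i], text[i + 1]
        if is_common_like_base(base) and is_nonspacing_mark(mark):
            hits.append((i, base + mark))
    return hits


# BLACK SQUARE + DAGESH, a Latin letter + acute accent (not matched, since
# the base is a letter), and DOTTED CIRCLE + QAMATS.
sample = "\u25A0\u05BC x\u0301 \u25CC\u05B8"
for index, pair in find_generic_sequences(sample):
    names = " + ".join(unicodedata.name(c) for c in pair)
    print(index, names)
```

    A real implementation would take the Script and Grapheme_Base values from
    the Unicode Character Database instead of this proxy. Finding such a pair
    also says nothing about whether a font positions the mark well, which is
    the point made above.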

    There will be a great many symbols in your set, including some that may be
    mirrored, like arrows, or that contain letters, like the visible control
    symbols used on terminals. Even if we choose to ignore those pathological
    cases, which would likely not be used, there still remain too many symbols
    to support, so the renderer will have to find fallback symbols in other
    fonts, which may have different size constraints.
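
    To get a rough feel for the size of that set, the following Python sketch
    counts candidate bases and non-spacing marks in whatever Unicode version
    the interpreter ships with. It makes the same ASSUMPTION as any
    stdlib-only approach: general categories S*, P*, N* and Zs are used as a
    proxy for Common-script grapheme bases, so the totals are indicative only
    and will differ from a count made with the real Script property.

```python
import sys
import unicodedata
from collections import Counter


def count_by_category():
    """Count assigned code points per General_Category value."""
    counts = Counter()
    for cp in range(sys.maxunicode + 1):
        counts[unicodedata.category(chr(cp))] += 1
    return counts


counts = count_by_category()

# ASSUMPTION: symbols, punctuation, numbers and spaces approximate
# [:Script=Common:] & [:Grapheme_Base=True:].
total_bases = sum(
    n for cat, n in counts.items() if cat[0] in "SPN" or cat == "Zs"
)
total_marks = counts["Mn"]  # exact count for [:gc=Mn:]

print("Unicode version:", unicodedata.unidata_version)
print("approximate candidate bases:", total_bases)
print("non-spacing marks:", total_marks)
print("possible <base, mark> pairs:", total_bases * total_marks)
```

    The product on the last line is the number of two-character combinations
    alone; sequences with several marks per base would multiply it further.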

    No font will be able to cover all these cases, unless it is created
    specially to contain a very large set of "[:Script=Common:]" grapheme base
    symbols and punctuation. Now, if the font is only made to collaborate with
    a renderer that provides a default layout, and only tunes the placement of
    diacritics and any adjustment to the position of the base symbol (like
    the underscore), maybe it will work correctly. That is something to
    experiment with before it becomes normative.
