RE: Rendering of sequences containing double diacritic (was Re: Bantu click letters)

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Jun 11 2004 - 14:04:05 CDT


    > Peter Constable wrote,
    >
    > > Don't forget canonical equivalence (I forgot about this as well): the
    > > double-width diacritics have a combining class of 234 rather than 230.
    > > This means that 0251 0361 0302 028A is canonically equivalent to 0251
    > > 0302 0361 028A. Therefore, the first (for better or worse) should appear
    > > just the way Doulos SIL renders it.

    > Sure enough! Thanks. I didn't even think to check the combining class;
    > both were marks above.
    >
    > Doesn't this mean that it isn't possible to stack a combining circumflex
    > above a combining spanning inverted breve? Does this mean we'd need
    > double-wide clones of all the combining marks in order to support such
    > combos?

    Actually, no. The UTC has had discussions about this. The whole
    issue of how to display accents *above* a combining double diacritic
    (or for that matter *below* a combining double diacritic below)
    was debated at some length on the list last year; a search of the
    archives should turn it up.

    In any case, the addition of U+034F COMBINING GRAPHEME JOINER,
    together with the recent refinement of the definition of a combining
    character sequence to explicitly allow ZWJ and ZWNJ, gives you a
    textual mechanism for blocking what would otherwise be a canonical
    reordering of such sequences. (CGJ, ZWJ, and ZWNJ all have combining
    class 0, so canonical reordering can never move a mark across them.)

    Thus:

    <0251, 0361, 0302, 028A>

    is canonically equivalent to:

    <0251, 0302, 0361, 028A>

    and both should result in the same display, with the circumflex
    over the "a" and the ligature tie spanning both base characters,
    *over* the circumflex.
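    A quick way to check this equivalence, as a minimal sketch assuming
    Python 3's standard unicodedata module. canonical_order is a
    hypothetical helper (not part of any library) that applies only the
    reordering step of normalization: it swaps adjacent marks whenever the
    first has a higher nonzero combining class than the second. None of
    these four characters has a decomposition, so its result should match
    NFD here.

```python
import unicodedata


def canonical_order(text):
    """Hypothetical helper: the canonical reordering step of NFD, without decomposition.

    An adjacent pair (a, b) is swapped while ccc(a) > ccc(b) > 0. Characters
    with combining class 0 are never moved, so they bound each run of marks.
    """
    chars = list(text)
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(chars) - 1):
            first = unicodedata.combining(chars[i])
            second = unicodedata.combining(chars[i + 1])
            if first > second > 0:
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
                swapped = True
    return "".join(chars)


def hexes(s):
    # Render a string as a list of 4-digit code point labels.
    return [f"{ord(c):04X}" for c in s]


tie_first = "\u0251\u0361\u0302\u028A"         # alpha, tie (ccc 234), circumflex (ccc 230), upsilon
circumflex_first = "\u0251\u0302\u0361\u028A"  # alpha, circumflex, tie, upsilon

print(hexes(canonical_order(tie_first)))  # ['0251', '0302', '0361', '028A']
print(unicodedata.normalize("NFD", tie_first)
      == unicodedata.normalize("NFD", circumflex_first))  # True
```

    Because 234 > 230, the tie always ends up after the circumflex in
    normalized form, whichever order was typed.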

    But:

    <0251, 0361, 034F, 0302, 028A> or
    <0251, 0361, 200D, 0302, 028A>

    are *not* canonically equivalent to:

    <0251, 0302, 034F, 0361, 028A> or
    <0251, 0302, 200D, 0361, 028A>

    And they should, in principle at least, result in a display with
    the circumflex positioned *above* the ligature tie and relative
    to it, rather than above the "a" and relative to it.
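    The blocking effect can be checked the same way. This sketch again
    assumes Python 3's unicodedata module; with_joiner is a hypothetical
    helper that builds the two sequences above for a given joiner. Since
    CGJ and ZWJ both have combining class 0, NFD should leave every one of
    these sequences exactly as typed, so the two orders stay distinct.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, combining class 0
ZWJ = "\u200D"  # ZERO WIDTH JOINER, combining class 0


def with_joiner(joiner):
    """Hypothetical helper: both orders of tie and circumflex, separated by `joiner`."""
    tie_then_circumflex = "\u0251\u0361" + joiner + "\u0302\u028A"
    circumflex_then_tie = "\u0251\u0302" + joiner + "\u0361\u028A"
    return tie_then_circumflex, circumflex_then_tie


for joiner in (CGJ, ZWJ):
    x, y = with_joiner(joiner)
    nfd_x = unicodedata.normalize("NFD", x)
    nfd_y = unicodedata.normalize("NFD", y)
    # The class-0 joiner stops reordering, so NFD returns each input unchanged,
    # and the two sequences are therefore not canonically equivalent.
    print(f"U+{ord(joiner):04X}", nfd_x == x, nfd_y == y, nfd_x == nfd_y)
# U+034F True True False
# U+200D True True False
```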

    This is the same principle being used, for example, to preserve
    textual distinctions in certain combinations of Hebrew points and
    accents that any normalization process would otherwise reorder into
    undesirable orders.
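    For the Hebrew case, a sketch under the same assumptions. The letter
    and the two points below are chosen purely for illustration (no
    specific characters are named above): HEBREW POINT PATAH has combining
    class 17 and HEBREW POINT HIRIQ has class 14, so normalization puts
    HIRIQ first unless a CGJ sits between them.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED
PATAH = "\u05B7"  # HEBREW POINT PATAH, combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, combining class 14
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, combining class 0

plain = LAMED + PATAH + HIRIQ            # NFD swaps the points (17 > 14)
protected = LAMED + PATAH + CGJ + HIRIQ  # CGJ blocks the swap

for label, s in (("plain", plain), ("protected", protected)):
    print(label, [f"{ord(c):04X}" for c in unicodedata.normalize("NFD", s)])
# plain ['05DC', '05B4', '05B7']
# protected ['05DC', '05B7', '034F', '05B4']
```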

    Whether any existing rendering engine will do a decent job of
    implementing that, I don't actually know.

    --Ken



    This archive was generated by hypermail 2.1.5 : Fri Jun 11 2004 - 14:04:43 CDT