Re: Script of U+0951 .. U+0954

From: Antoine LECA (
Date: Thu Dec 05 2002 - 14:13:10 EST

    Peter Constable wrote:
    > There is a potential concern in Uniscribe/OpenType: substitution and
    > positioning rules in OT are organised hierarchically by script then by
    > individual writing system / typographic groups (the label used is
    > languages, but the intent is really groups of writing systems that share
    > common typographic behaviours). Thus, a rule that handles positioning of a
    > glyph for 0950 (or whatever) relative to some member of some class of
    > glyphs must be entered somewhere under some particular script. Now, there
    > is nothing that prohibits a font developer from creating multiple
    > positioning rules for 0950 with different classes of base glyphs and to
    > have a different one placed in the hierarchy under several different
    > scripts.

    Fully agreed so far.

    > But there may yet be an issue on the Uniscribe side: given a
    > string of characters, which it will begin by mapping into a string of
    > initial glyphs, it has to decide which script tag(s) to apply to portions
    > of the string. What I don't know is whether it generally assumes combining
    > marks belong to a specific script, or whether it allows combining marks to
    > inherit their script from the base characters with which they combine.

    Look: in current Uniscribe, a leading ZWJ or ZWNJ is discarded (e.g., with
    input U+200D U+093E, you still get the dotted circle meaning "incorrect
    combining"), even though this is perfectly correct Unicode as far as I
    understand.
    So clearly, they have a problem with "backtracking" when the script is
    not determined by the first character in the stream. I can understand that.
    OTOH, when ZWJ or ZWNJ come second or later in conjuncts, they are properly
    handled, in every script where they are relevant. What I would like to see
    is the Indic accents handled in the same way. And when I spoke about that
    with MS people (and not only me, but also Pothana's designer), MS answered
    that the Unicode standard seemed to imply that these accents apply to
    Devanagari script only.
    It looks to me as though this Scripts.txt just confirms the MS point of view.
    If this is as intended, that is fine, but it means that a bunch of new
    characters (with little or no added value) will have to be added in some
    future revision of Unicode.
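
To make the request concrete, here is a minimal sketch in Python of the behaviour I would like: an accent takes its script from the character before it instead of always counting as Devanagari. The table, the set of inheriting marks and the function names are all hypothetical and made up for illustration; this is not how Uniscribe works internally, and a real engine would use the full Scripts.txt data.

```python
# Hypothetical, hand-made table: a real engine would use the full Scripts.txt.
BLOCK_SCRIPTS = [
    (0x0900, 0x097F, "Devanagari"),
    (0x0C00, 0x0C7F, "Telugu"),
]

# Assumption: the accents U+0951 and U+0952 are treated as script-less marks
# that take their script from the preceding character.
INHERITING_MARKS = {0x0951, 0x0952}


def block_script(ch):
    """Return the script implied by the character's block, else 'Common'."""
    cp = ord(ch)
    for first, last, script in BLOCK_SCRIPTS:
        if first <= cp <= last:
            return script
    return "Common"


def resolve_scripts(text):
    """Assign a script to each character, letting the listed marks inherit."""
    scripts = []
    for ch in text:
        if ord(ch) in INHERITING_MARKS and scripts:
            # Inherit from the previous character in the run.
            scripts.append(scripts[-1])
        else:
            # Bases, and marks with nothing before them, fall back to the block.
            scripts.append(block_script(ch))
    return scripts


# Telugu KA followed by the udatta accent, then Devanagari KA with the same accent.
print(resolve_scripts("\u0C15\u0951"))
print(resolve_scripts("\u0915\u0951"))
```

With this rule the same U+0951 lands under the Telugu rules in the first string and under the Devanagari rules in the second, so no new per-script accent characters would be needed.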

    By the way, the situation is similar with the dandas (U+0964 and U+0965):
    they only appear in the Devanagari and Myanmar blocks, but are used for many
    other (all?) South-Asian scripts as well. Worse, they are often used, so
    there is already much material encoded with these codepoints.
    Luckily, dandas do not need special handling from complex script engines,
    so it does not matter whether Uniscribe decides they are Devanagari or
    script-less (except perhaps for the selection of the font).
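
As a quick check of why the dandas are harmless, the short Python snippet below looks them up in the character database shipped with Python (the standard `unicodedata` module). The helper name `describe` is just something made up for this example. Both characters are named as Devanagari but are plain punctuation with combining class 0, so they never attach to a base character.

```python
import unicodedata


def describe(cp):
    """Return (name, general category, combining class) for a code point."""
    ch = chr(cp)
    return (unicodedata.name(ch), unicodedata.category(ch), unicodedata.combining(ch))


for cp in (0x0964, 0x0965):
    # Category "Po" is "Punctuation, other"; class 0 means non-combining.
    print(f"U+{cp:04X}", describe(cp))
```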

