Re: glyph selection for Unicode in browsers

From: Peter_Constable@sil.org
Date: Wed Sep 25 2002 - 17:09:31 EDT

    On 09/25/2002 01:51:28 PM Tex Texin wrote:

    >a) Do Unicode fonts include the language-based glyph variants of
    >characters, so that a display system is capable of identifying or
    >hinting which glyph should be used in a particular scenario?

    They *can*, and some do. When this is the case, then there needs to be some
    mechanism to modify the relationship between sequences of characters and
    sequences of glyphs to arrive at the particular glyphs intended for the
    given language. In general terms, the same kinds of mechanisms that can be
    used for rendering complex scripts can also be used here -- it's a glyph
    substitution, comparable to substituting an initial or final form of an
    Arabic character. Of course, there is a different triggering condition
    involved in these situations than in the case of a complex script such as
    Arabic: in the complex-script situation, the triggers are the character
    context (e.g. preceded by non-word-forming character and followed by
    word-forming character), whereas here the trigger is a metadata tag.

    Let's consider how this would be dealt with in terms of implementation,
    using OpenType as an example. The OpenType font format provides means for
    storing different glyph-transformation rules according to "language". (1)
    The question is, then, what does it take for the rendering process to make
    use of one set of language-specific rules rather than another, or rather
    than a set of default rules (OT allows the font developer to specify a
    default). In OpenType, glyph-transformation rules are grouped by
    "features", and a set of rules will be applied when the associated feature
    has been activated. (Thus, in OT text layout, what's processed is a
    feature-marked-up string of characters.) This applies to the "language"
    distinctions as well: the desired "language" must be specified in the
    input, otherwise the default rules will apply. (2) The idea is that
    application software must determine what features are activated at what
    point.
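
    As a rough illustration of the selection logic described above, here is a
    toy model in Python. It is only a sketch: the nested dictionary is not the
    real OpenType table layout, and the "base" feature name, the "dflt" key and
    the glyph names "be" and "be.srb" are assumptions made up for the example.

```python
# Toy model of OpenType-style rule selection; not the real table layout.
# Rules are keyed by script, then "language", then feature.
# "dflt" stands in here for the font's default rules (an assumed key).
FONT_RULES = {
    "cyrl": {
        "dflt": {"base": {"be": "be"}},
        "SRB ": {"base": {"be": "be.srb"}},  # hypothetical Serbian variant
    },
}


def select_rules(script, language=None):
    """Return the feature table for a language, or the default if absent."""
    languages = FONT_RULES.get(script, {})
    if language in languages:
        return languages[language]
    return languages.get("dflt", {})


def shape(glyphs, script, language=None, features=("base",)):
    """Apply the substitutions of each activated feature, in order."""
    rules = select_rules(script, language)
    for feature in features:
        mapping = rules.get(feature, {})
        glyphs = [mapping.get(g, g) for g in glyphs]
    return glyphs
```

    Calling shape(["be"], "cyrl") uses the default rules, while
    shape(["be"], "cyrl", "SRB ") picks up the language-specific glyph. If the
    caller never passes a language, the default is all the user ever sees.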

    Now, hardly any software gets written to interact directly with the
    OpenType layout engine. Instead, higher-level text layout libraries have
    been written that wrap the OpenType functionality. Uniscribe is one
    example; indeed, in Win32 on Windows 2000 and later, there is even another
    layer, since the standard text-drawing functions (TextOut and ExtTextOut)
    wrap Uniscribe's functionality. Other examples of libraries that wrap up the
    OT interface and expose a higher-level interface include Adobe's CoolType
    engine (not a published interface, that I know of), ICU, Pango and Sun's
    recent Standard Type Services Framework project.

    So, at the OT interface, a "language" tag (3) has to be specified in order
    to get language-specific glyphs. But apps generally don't write to that
    interface (for good reason); they usually write to a higher interface. The
    crux of the issue is that none of the higher-level interfaces, that I know
    of, yet provide any mechanism for the app to specify a "language" tag. (4)
    Hence, the building blocks are there, but more infrastructure is still
    needed. Note that there's a bit more involved than simply re-writing
    higher-level APIs to expose a way to specify OT features. In particular, a
    critical issue has to do with the relationship between OpenType's
    "language" tags, and whatever system of "language" or "locale" tagging
    might be used elsewhere in a given platform.
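
    To make the tag-relationship problem concrete, the following Python sketch
    shows the kind of lookup a platform would have to supply. The table and the
    function name ot_language_tag are hypothetical; OpenType itself does not
    define any such mapping, and the three pairs are only examples.

```python
# Hypothetical mapping from RFC 3066 primary language subtags to OpenType
# "language" tags. OpenType does not define this relationship, so a platform
# would have to supply and maintain a table like this itself.
RFC3066_TO_OT = {
    "ro": "ROM ",
    "sr": "SRB ",
    "tr": "TRK ",
}


def ot_language_tag(rfc3066_tag):
    """Return an OpenType language tag, or None to request default rules."""
    primary = rfc3066_tag.split("-")[0].lower()
    return RFC3066_TO_OT.get(primary)
```

    A tag with no entry returns None, which a layout library would presumably
    treat as a request for the font's default rules. A real mapping would also
    have to decide what to do with region and script subtags, which this
    sketch simply discards.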

    I've described the situation in terms of OpenType. Neither AAT nor Graphite
    provides exactly the same kind of mechanism for providing different glyph
    transformations for different languages, though I believe some
    consideration has been given to possibilities for both technologies. Both
    use feature mechanisms, so can certainly do what you're looking for; but
    neither has defined features specifically related to
    "languages", let alone decided how these should be handled in terms of
    APIs. It would be possible to implement an AAT or Graphite font that used a
    feature to get at language-specific glyphs, and apps that exposed a
    user-interface for setting AAT or Graphite features (5) would offer the
    user a way to control this. But there would not be any automation whereby
    an app would specify this based on other "language" or "locale" tagging.

    Notes:

    (1) I put "language" in quotation marks since it has not really been
    adequately worked out what these distinctions are; I think these are
    probably groups of writing systems.

    (2) OpenType glyph-transformation rules are organised hierarchically, first
    by script, then by language, and then according to the other features they
    are associated with.

    (3) OpenType's "language" tags have no specified relationship with ISO 639,
    RFC 3066 or any other system of "language" tags.

    (4) The same issue applies to OpenType features that pertain to optional
    aspects of typography and rendering that are up to the user's discretion
    rather than being obligatory behaviour for a script. For instance, there is
    an OpenType feature for selecting small cap forms, which a font developer
    can use to provide support for small cap glyphs in the same font as regular
    glyphs. To make use of such advanced capabilities, the layout interface to
    which an app is written must provide a way to specify such features. Apart
    from Adobe's engine (used e.g. by InDesign, and which exposes interfaces
    for some OpenType features but, I think, not all), I don't know that any
    other layout library yet provides an interface that allows an app to
    specify discretionary OpenType features.

    (5) In both AAT and Graphite, features are used only for discretionary
    aspects of typography / rendering that are not obligatory for a script,
    whereas OpenType uses features for both optional and obligatory behaviours.
    Thus, for AAT and Graphite, the feature capabilities have always assumed
    that apps would provide a user interface whereby the user can set features.
    (In OpenType, this makes sense for some but not all features.)
    Language-specific typography represents something different from both
    obligatory script behaviour and user-preference typography: it would
    probably be suitable for automation (i.e. the app uses metadata to
    determine via an appropriate API language-specific glyph transformations)
    rather than controlling via a user interface. For that reason, it's not
    clear to me that this should be handled as just one more kind of feature in
    the AAT and Graphite models.

    >b) If the above is possible, then I assume the browsers have not
    >implemented language-based selection yet.

    Still possible basically only in theory (or else with a lot of work to also
    re-implement the capabilities of something like Uniscribe), so no browsers
    yet implement this.

    >Are any browsers moving to
    >using the appropriate glyphs based on language without depending on each
    >language being assigned a different font?

    Probably not yet.

    >c) If the above is not possible, then configuring browsers for Unicode
    >usage is greatly complicated by the need to have a lengthy list of fonts
    >assigned to different languages.

    Um hmm.

    >Is there an alternative approach that
    >can be used, so users can easily view Unicode text and get the correct
    >display while using a single "Unicode" font?

    This is another big question, and I've said lots already. I'll just mention
    techniques known as "font-fallback", "font fixup" or "font-linking" -- all
    variations on the idea that if the text is supposed to be rendered using
    font X, but that font doesn't have glyphs to support the characters in the
    string, then figure out what fonts *will* support those characters and use
    those, in spite of what the style properties specify. I don't know that
    this kind of thing has been used to provide language-specific glyphs;
    usually, it has been viewed as a way to keep the user from seeing boxes (or
    other comparable notdef glyphs).
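
    The general idea of font fallback can be sketched in a few lines of
    Python. This is a simplified illustration, not how any particular platform
    does it: the font names, the coverage sets and the function assign_fonts
    are all invented for the example.

```python
# Hypothetical fonts and the characters each one covers.
FONT_COVERAGE = {
    "FontX": set("abcdefghijklmnopqrstuvwxyz "),
    "GreekFont": set("αβγδε"),
}


def assign_fonts(text, requested, fallbacks):
    """Split text into (font, run) pairs, falling back per character.

    A font of None marks characters no listed font covers; a renderer
    would show a notdef glyph (typically a box) for those.
    """
    runs = []
    for ch in text:
        # First font in priority order whose coverage includes this character.
        font = next(
            (f for f in [requested, *fallbacks] if ch in FONT_COVERAGE.get(f, ())),
            None,
        )
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)  # extend the current run
        else:
            runs.append((font, ch))
    return runs
```

    Note that the choice here depends only on character coverage. Nothing in
    it looks at the language of the text, which is why fallback of this kind
    does not by itself yield language-specific glyphs.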

    You've certainly touched on topics that are both interesting and important.
    I'll leave the remaining questions for someone else.

    - Peter

    ---------------------------------------------------------------------------
    Peter Constable

    Non-Roman Script Initiative, SIL International
    7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
    Tel: +1 972 708 7485
    E-mail: <peter_constable@sil.org>


