Re: But E0000 Custom Language Tags Are Actually *Required* For Use By Unicode

From: John H. Jenkins
Date: Wed Mar 02 2005 - 12:44:24 CST

    On Mar 2, 2005, at 10:08 AM, wrote:

    > 10. *But* I have previously demonstrated, fairly obviously, that it is
    > hardly practical for Microsoft to add long lists of OpenType "language
    > tags" for something as obscure as extinct local variations of Greek
    > script. It is certainly not practical for Microsoft to add lists of
    > every possible local variation of every obscure script such as Berber.

    First of all, MS doesn't own OT. It's co-owned by MS and Adobe (slight
    nit). (MS *does* own the set of language tags OT uses, however, from
    what I understand.)

    Secondly, all that's required is for OT implementations to support
    user-defined language tags. Problem solved.

    FWIW, Apple's competing technology, AAT, *does* allow for user-defined
    font features. Thus, while AAT doesn't allow language-tagging per se,
    you can easily get the equivalent by defining your own "alternate-type
    X" feature.

    > 11. *Therefore*, some kind of "custom language tag" system is a
    > *requirement*, for Unicode to function as it is claimed it is
    > *intended* to function.
    > 12. This is not an obscure, personal desire of mine. It is an essential
    > and inherent component of the approach Unicode itself has created (but
    > perhaps failed to think through to its conclusion).
    > 13. Unicode has in fact created exactly this custom language tag system
    > with the E0000 block. [LANGUAGE][x][-][custom_language_name][END LANGUAGE].
    > But then this system has been "strongly disrecommended" and therefore is
    > not likely to be implemented by font technologies.

    Here's a point you seem to misunderstand. The U+E0000 block language
    tags were *never* intended to be implemented by font technologies, nor
    are they really good to use with font technologies because of their
    stateful nature.
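
    For readers unfamiliar with the mechanism under discussion, here is a
    minimal sketch of how a Plane 14 language tag is spelled out in text.
    It relies only on the published code points: U+E0001 LANGUAGE TAG,
    tag characters U+E0020..U+E007E (each is 0xE0000 plus an ASCII code),
    and U+E007F CANCEL TAG. The function names and the tag string
    "x-example-variant" are hypothetical, chosen for illustration.

```python
# Sketch only: helper names and the sample tag are illustrative assumptions.
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
CANCEL_TAG = "\U000E007F"    # U+E007F CANCEL TAG
TAG_BASE = 0xE0000           # tag character = TAG_BASE + ASCII code point


def encode_language_tag(lang: str) -> str:
    """Return U+E0001 followed by `lang` spelled in tag characters."""
    if not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("tag text must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_BASE + ord(c)) for c in lang)


def cancel_language_tag() -> str:
    """Return the sequence that ends the current language tag's scope."""
    return LANGUAGE_TAG + CANCEL_TAG


# The tag is invisible in-band state: everything up to the cancel sequence
# is implicitly marked with the (hypothetical) language "x-example-variant".
tagged = encode_language_tag("x-example-variant") + "some text" + cancel_language_tag()
```

    Note that nothing in "some text" itself records the language; it is
    known only from the characters that came before it.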

    E.g., with AAT (with which I am admittedly more familiar than OT), the
    context for a given feature never spans more than one line. If you're
    using AAT's state machine, therefore, to parse an array of glyphs to
    determine a context for a feature, that context must not cross line
    boundaries (soft or hard). I think that with OT, it's possible to have
    the context span a line break, but it's still not going to work.

    The bigger problem, after all, is that the rendering engine isn't going
    to convert the *entire* text stream to glyphs and run features on the
    *entire* resulting glyph array to determine what to do. If you're
    looking at page 999 of a 1000-page document, that would be a lot of
    overhead, and users wouldn't stand for it. If your language tags are
    embedded in the text itself, you run the risk that this state
    information would be lost.

    This is why Unicode avoids stateful features as much as possible. The
    U+E0xxx language tags were designed for use in a protocol specializing
    in short strings where the entire string is always present (or can be
    assumed to be entirely present), so this stateful feature isn't quite
    so disastrous. For large-scale documents, however, it would be.
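
    The state-loss problem can be sketched in a few lines. This is an
    illustration under simplifying assumptions, not a description of how
    any rendering engine works: `make_tag` and `language_at` are
    hypothetical helpers, and only language tags and CANCEL TAG are
    handled. The point is that finding the language in effect at an
    offset means scanning everything before it.

```python
# Sketch only: hypothetical helpers, simplified tag handling.
LANGUAGE_TAG = 0xE0001
CANCEL_TAG = 0xE007F
TAG_BASE = 0xE0000


def make_tag(lang: str) -> str:
    """Spell `lang` as U+E0001 followed by tag characters."""
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_BASE + ord(c)) for c in lang)


def language_at(text: str, offset: int):
    """Return the language tag in effect at `offset`, scanning from the start."""
    current = None
    i = 0
    while i < offset:
        cp = ord(text[i])
        i += 1
        if cp == CANCEL_TAG:
            current = None  # bare cancel, or the cancel after LANGUAGE TAG
        elif cp == LANGUAGE_TAG:
            chars = []
            # Collect the tag characters (U+E0020..U+E007E) that follow.
            while i < len(text) and 0xE0020 <= ord(text[i]) <= 0xE007E:
                chars.append(chr(ord(text[i]) - TAG_BASE))
                i += 1
            if chars:
                current = "".join(chars)
    return current


doc = make_tag("x-demo") + "alpha beta gamma"
window_start = len(doc) - 5                 # start of the last word, "gamma"

print(language_at(doc, window_start))       # x-demo (whole stream scanned)
print(language_at(doc[window_start:], 0))   # None (window alone: state lost)
```

    A renderer that is handed only the visible window sees the second
    case, which is why out-of-band markup is the safer carrier for this
    kind of information in long documents.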

    > 14. THEREFORE, in order to make it actually possible to use Unicode's
    > *own* stated and vigorously defended philosophy on the sole correct
    > means of accessing local script variants -- for local script variants
    > which are too obscure to receive official language tags -- Unicode must
    > do one of the following:
    > A. Recommend use of, and implementation by font technology of E0000
    > custom language tags (or better, add an E0000 custom script tag).
    > B. Make sure that some other higher-level "custom language tag" system
    > is going to actually exist, usable in all font technologies, before
    > shifting responsibility to it.
    > C. Make sure that a means of accessing generic "alternate selection"
    > features in all font technologies is actually going to exist, before
    > shifting responsibility to it.

    Here the cart is before the horse. Unicode has *always* made demands
    of the font technologies which support it. Many Unicode features which
    have been in the standard from the first (e.g., Indic reordering) were
    not available in widely-deployed and widely-used rendering engines at
    the time they were standardized. The UTC includes representatives of
    the companies developing the current crop of font and rendering
    technologies, and its actions are closely watched by font technology
    experts to make sure that as it grows it does so in a fashion
    compatible with the direction fonts are going in. Unicode aims to
    extend the standard so that the necessary font technology changes are
    *deployable*. It's up to the companies that develop the font
    technologies and rendering engines to actually deploy the changes.

    In this case, there are still two pieces missing for you to do what you
    want to do. One is that OpenType engines, specifically, need to have
    the ability that AAT already has—the ability to support user-defined
    tags, if only in a limited domain. The other is that the standards for
    rich-text interchange need to be extended to allow the specification of
    font features as well as fonts themselves. Neither of these is a
    Unicode issue per se.

    I realize this is frustrating for you because it sounds like everybody
    is shifting blame and responsibility elsewhere. But this really is not
    a Unicode issue.

    John H. Jenkins
