Re: But E0000 Custom Language Tags Are Actually Required For Use By Unicode

From: John H. Jenkins (jenkins@apple.com)
Date: Wed Mar 02 2005 - 12:44:24 CST

Next message: Gregg Reynolds: "teh marbuta"

Previous message: Peter Kirk: "Re: Unicode Stability"
In reply to: UList@dfa-mail.com: "But E0000 Custom Language Tags Are Actually *Required* For Use By Unicode"
Next in thread: Peter Constable: "RE: But E0000 Custom Language Tags Are Actually *Required* For Use By Unicode"

On Mar 2, 2005, at 10:08 AM, UList@dfa-mail.com wrote:

> 10. *But* I have previously demonstrated, fairly obviously, that it is
> hardly practical for Microsoft to add long lists of OpenType "language
> tags" for something as obscure as extinct local variations of Greek
> script. It is certainly not practical for Microsoft to add lists of
> every of possible local variation of every obscure script such as
> Berber.
>

First of all, MS doesn't own OT. It's co-owned by MS and Adobe (slight
nit). (MS *does* own the set of language tags OT uses, however, from
what I understand.)

Secondly, all that's required is for OT implementations to support
user-defined language tags. Problem solved.

FWIW, Apple's competing technology, AAT, *does* allow for user-defined
font features. Thus, while AAT doesn't allow language-tagging per se,
you can easily get the equivalent by defining your own "alternate-type
X" feature.

> 11. *Therefore*, some kind of "custom language tag" system is a
> *requirement*, for Unicode to function as it is claimed it is
> *intended* to function.
>
> 12. This is not an obscure, personal desire of mine. It is an essential
> and inherent component of the approach Unicode itself has created (but
> perhaps failed to think through to its conclusion).
>
> 13. Unicode has in fact created exactly this custom language tag system
> with the E0000 block.
> [LANGUAGE][x}[-][custom_language_name][END LANGUAGE]. But then this
> system has been "strongly disrecommended" and therefore is not likely
> to be implemented by font technologies.
>

Here's a point you seem to misunderstand. The U+E0000 block language
tags were *never* intended to be implemented by font technologies, nor
are they really good to use with font technologies because of their
stateful nature.

E.g., with AAT (with which I am admittedly more familiar than OT), the
context for a given feature never spans more than one line. If you're
using AAT's state machine, therefore, to parse an array of glyphs to
determine a context for a feature, that context must not cross line
boundaries (soft or hard). I think that with OT, it's possible to have
the context span a line break, but it's still not going to work.

The bigger problem, after all, is that the rendering engine isn't going
to convert the *entire* text stream to glyphs and run features on the
*entire* resulting glyph array to determine what to do. If you're
looking on page 999 of a 1000-page document, that would be a lot of
overhead, and users wouldn't stand for it. If your language tags are
embedded in the text itself, you run the risk that this state
information would be lost.

This is why Unicode avoids stateful features as much as possible. The
U+E0xxx language tags were designed for use in a protocol specializing
in short strings where the entire string is always present (or can be
assumed to be entirely present), so this stateful feature isn't quite
so disastrous. For large-scale documents, however, it would be.
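
To make the statefulness concrete, here is a minimal Python sketch. It
is only an illustration, not anything a rendering engine actually does:
the function names encode_language_tag and language_at are made up for
this example, and the parsing is simplified (it treats U+E007F CANCEL
TAG as ending whatever tag is in effect). The code points themselves
are the real ones: U+E0001 LANGUAGE TAG, tag characters U+E0020..U+E007E
mirroring ASCII, and U+E007F CANCEL TAG.

```python
LANGUAGE_TAG = 0xE0001   # U+E0001 LANGUAGE TAG
CANCEL_TAG = 0xE007F     # U+E007F CANCEL TAG
TAG_BASE = 0xE0000       # tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E


def encode_language_tag(language_id: str) -> str:
    """Spell an ASCII language id with Plane 14 tag characters (hypothetical helper)."""
    if not all(0x20 <= ord(ch) <= 0x7E for ch in language_id):
        raise ValueError("language id must be printable ASCII")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_BASE + ord(ch)) for ch in language_id)


def language_at(text: str, offset: int):
    """Return the language id in effect at text[offset], or None (hypothetical helper).

    The scan has to start at the beginning of the text, because the tag
    that governs this offset may be arbitrarily far back.
    """
    current = None
    reading = False          # True while consuming the tag characters after LANGUAGE TAG
    for ch in text[:offset]:
        cp = ord(ch)
        if cp == LANGUAGE_TAG:
            current = ""
            reading = True
        elif cp == CANCEL_TAG:
            current = None
            reading = False
        elif reading and 0xE0020 <= cp <= 0xE007E:
            current += chr(cp - TAG_BASE)
        else:
            reading = False  # ordinary character: the tag spelling is finished
    return current or None


# A long "document": one tag at the very start, then a lot of ordinary text.
document = encode_language_tag("x-custom") + "abc" * 1000 + chr(CANCEL_TAG) + "xyz"

# A "page" cut out of the middle, as an engine laying out page 999 might see it.
page = document[2000:2100]
```

With the whole document in hand, language_at(document, 3000) finds
"x-custom", but only by walking all the way back to the first
character. Handed just the slice, language_at(page, 50) returns None:
the tag is not in the slice, so the state is gone.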

>
> 14. THEREFORE, in order to make it actually possible to use Unicode's
> *own* stated and vigorously defended philosophy on the sole correct
> means of accessing local script variants -- for local script variants
> which are too obscure to receive official language tags -- Unicode must
> do one of the following:
>
> A. Recommend use of, and implementation by font technology of E0000
> custom language tags (or better, add an E0000 custom script tag).
>
> B. Make sure that some other higher-level "custom language tag" system
> is going to actually exist, usable in all font technologies, before
> shifting responsibility to it.
>
> C. Make sure that a means of accessing generic "alternate selection"
> features in all font technologies is actually going to exist, before
> shifting responsibility to it.
>

Here the cart is before the horse. Unicode has *always* made demands
of the font technologies which support it. Many Unicode features which
have been in the standard from the first (e.g., Indic reordering) were
not available in widely-deployed and widely-used rendering engines at
the time they were standardized. The UTC includes representatives of
the companies developing the current crop of font and rendering
technologies, and its actions are closely watched by font technology
experts to make sure that as it grows it does so in a fashion
compatible with the direction fonts are going in. Unicode aims to
extend the standard so that the necessary font technology changes are
*deployable*. It's up to the companies that develop the font
technologies and rendering engines to actually deploy the changes.

In this case, there are still two pieces missing for you to do what you
want to do. One is that OpenType engines, specifically, need to have
the ability that AAT already has—the ability to support user-defined
tags, if only in a limited domain. The other is that the standards for
rich-text interchange need to be extended to allow the specification of
font features as well as fonts themselves. Neither of these is a
Unicode issue per se.

I realize this is frustrating for you because it sounds like everybody
is shifting blame and responsibility elsewhere. But this really is not
a Unicode issue.

========
John H. Jenkins
jenkins@apple.com
jhjenkins@mac.com
http://homepage.mac.com/jhjenkins/


This archive was generated by hypermail 2.1.5 : Wed Mar 02 2005 - 12:46:17 CST
