From: John H. Jenkins (email@example.com)
Date: Thu Nov 14 2002 - 10:49:08 EST
On Wednesday, November 13, 2002, at 12:07 AM, George W Gerrity wrote:
> In an effort to unify all character and pictographs, the decision was
> made to unify CJK characters by suppressing most variant forms. That
> turns out to be the single greatest objection from users -- especially
> Japanese -- and somehow we need a low-level way of indicating the
> target language in the context of multilingual text.
> The plane 14 tags seem to be appropriate to do this, giving a hint to
> the font engine as to a good choice of alternate glyphs, where
A couple of points.
1) There are two kinds of variant problems coming out of Unihan. The
objections based on these variant problems are usually stated as:
Japanese readers will be forced to read Japanese text with Chinese
glyphs!
Mr. Watanabe won't be able to insert the variant glyph for his name
that he prefers into a document!
The first objection is, and always has been, a non-issue, and is the
only aspect of the problem that the Plane 14 tags could hope to deal
with. The issue is not a language one, but a locale one, to begin
with. Moreover, the typical practice in Japanese typography (at least)
is to use Japanese-preferred glyphs even when displaying Chinese text.
Japanese users do *not* expect the text to switch back-and-forth
between Chinese and Japanese glyphs as the language varies.
Given this, the best solution to the problem is to use fonts aimed at
the specific locale. This means that a Japanese user who goes to read
her email at an Internet café in Hong Kong may see unexpected glyphs,
true, but it really handles 99.99+% of the problem.
I should note that as Unicode-based systems are becoming more common in
Japan, such as Windows XP and Mac OS X, there is less concern being
expressed on this point.
The second objection could not be solved by the Plane 14 tags. The two
solutions that are possible are to separately encode every glyphic
variant which someone, somewhere, sometime may find necessary to
distinguish in plain text, or to use variant markers. It is the latter
solution which the UTC has adopted.
2) From a technical standpoint, the Plane 14 tags do not really lend
themselves to use with the main complex script font engines available.
I don't know enough about Graphite to really speak to it, but in the
case of OpenType and AAT it is true that protocols are already
available to use Japanese/SC/TC/Korean/Vietnamese glyphs for a run of
text. These existing protocols, however, depend on information
external to the text itself.
To keep the information internal to the text, or, more accurately,
internal to the glyph stream, one would have to have the ability to
enter a state once a certain character (or glyph) is encountered and
remain in that state indefinitely. Neither OpenType nor AAT allows
this. OpenType does not use a state engine internal to the glyph
stream for processing, and AAT resets the state at the beginning of
What would have to happen is that the rendering engine would have to
find these characters within the text stream, massage the text data so
as to remove them and mark the text with the equivalent higher-level
information, and then render the result.
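To make that preprocessing step concrete, here is a minimal Python sketch of it. It assumes the Plane 14 code points U+E0001 (LANGUAGE TAG), U+E0020..U+E007E (tag characters mirroring ASCII) and U+E007F (CANCEL TAG), and it simplifies the cancellation rules. The function names are hypothetical and not part of any rendering library.

```python
LANGUAGE_TAG = 0xE0001               # introduces a language tag
CANCEL_TAG = 0xE007F                 # cancels the tag in effect (simplified)
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007E  # tag characters mirroring ASCII


def make_language_tag(lang):
    """Encode e.g. 'ja' as LANGUAGE TAG followed by tag characters."""
    return chr(LANGUAGE_TAG) + "".join(chr(0xE0000 + ord(c)) for c in lang)


def split_language_runs(text):
    """Strip Plane 14 tags and return a list of (language, visible_text) runs.

    The language is None for text outside the scope of any tag.
    """
    runs = []
    buf = []
    lang = None
    i, n = 0, len(text)

    def flush():
        # Emit the pending visible text under the language currently in effect.
        if buf:
            runs.append((lang, "".join(buf)))
            buf.clear()

    while i < n:
        cp = ord(text[i])
        if cp == LANGUAGE_TAG:
            flush()
            i += 1
            tag = []
            # Collect the tag characters that spell the language identifier.
            while i < n and TAG_FIRST <= ord(text[i]) <= TAG_LAST:
                tag.append(chr(ord(text[i]) - 0xE0000))
                i += 1
            lang = "".join(tag) or None
        elif cp == CANCEL_TAG:
            flush()
            lang = None
            i += 1
        elif TAG_FIRST <= cp <= TAG_LAST:
            i += 1  # stray tag character with no LANGUAGE TAG: drop it
        else:
            buf.append(text[i])
            i += 1
    flush()
    return runs


sample = "abc" + make_language_tag("ja") + "\u76f4" + chr(CANCEL_TAG) + "xyz"
runs = split_language_runs(sample)
```

Note that the scan has to start at the beginning of the text to know which language is in effect at any later offset.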
The problem here is that libraries such as Uniscribe and ATSUI, which
provide Unicode rendering, do not deal with the text as a whole
(at least, this is definitely true with ATSUI and is probably true with
Uniscribe, although I don't know for sure). That is, the Plane 14 tag
may be found in the first paragraph of the text, but when the client
hands the text off to the library, it may hand off only a later
portion because that's all that needs to be drawn. The library then
does not have access to this information and will not render the text
This basically means that the onus is on the client to detect these
tags in the text and make appropriate adjustments
when it hands off the text to Uniscribe or ATSUI for rendering. As
such, there is no real advantage gained by having these tags embedded
directly in the text over having them in the same layer as font, point
size, and other typographic preferences. Indeed, it becomes
inconvenient to have them in a different layer as it means that the
client has to do *two* levels of processing to derive this information,
rather than just one.
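A minimal Python sketch of the single-layer alternative, in which the locale travels with font and point size as a style-run attribute. The StyleRun type, the runs_for_range helper, and the font names are hypothetical illustrations, not the ATSUI or Uniscribe API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StyleRun:
    start: int    # offset of the first character covered
    end: int      # offset one past the last character covered
    font: str
    size: float
    locale: str   # e.g. "ja", "zh-Hant"; kept alongside font and size


def runs_for_range(runs, start, end):
    """Clip style runs to [start, end) before handing that text off for drawing."""
    clipped = []
    for r in runs:
        lo, hi = max(r.start, start), min(r.end, end)
        if lo < hi:
            clipped.append(StyleRun(lo, hi, r.font, r.size, r.locale))
    return clipped


document_runs = [
    StyleRun(0, 10, "FontA", 12.0, "ja"),
    StyleRun(10, 20, "FontB", 12.0, "zh-Hant"),
]
# Only characters 8..12 need to be drawn; the locale comes along with the
# other attributes, with no scan of the earlier text.
visible = runs_for_range(document_runs, 8, 12)
```

Because the locale sits in the same structure as the other typographic attributes, the client derives everything the layout library needs in one pass.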
John H. Jenkins