Re: The result of the plane 14 tag characters review.

From: George W Gerrity (ggerrity@dragnet.com.au)
Date: Thu Nov 14 2002 - 19:38:37 EST

    At 08:49 -0700 2002-11-14, John H. Jenkins wrote:
    >On Wednesday, November 13, 2002, at 12:07 AM, George W Gerrity wrote:
    >
    >>In an effort to unify all characters and pictographs, the decision
    >>was made to unify CJK characters by suppressing most variant forms.
    >>That turns out to be the single greatest objection from users --
    >>especially Japanese -- and somehow we need a low-level way of
    >>indicating the target language in the context of multilingual text.
    >>
    >>The plane 14 tags seem to be appropriate to do this, giving a hint
    >>to the font engine as to a good choice of alternate glyphs, where
    >>available.
    >>
    >
    >A couple of points.
    >
    >1) There are two kinds of variant problems coming out from Unihan.
    >The way objections are stated based on these variant problems is,
    >respectively:
    >
    >Japanese readers will be forced to read Japanese text with Chinese glyphs!
    >
    >and
    >
    >Mr. Watanabe won't be able to insert the variant glyph for his name
    >that he prefers into a document!
    >
    >The first objection is, and always has been, a non-issue, and is the
    >only aspect of the problem that the Plane 14 tags could hope to deal
    >with. The issue is not a language one, but a locale one, to begin
    >with.

    Yes, although language and region can be encoded, as in en-US or
    en-GB. The reason for providing an encoding is in multilingual
    texts, where one would hope that in each case, the rendering is
    appropriate. A good example is the production of multilingual
    manuals, which seem to be more and more common these days. I agree
    that in this example, higher-level markup would do all that is
    necessary.
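
    For reference, a minimal sketch of how a language-plus-region tag
    would be spelled in Plane 14 tag characters, assuming the usual
    layout of the Tags block: U+E0001 LANGUAGE TAG followed by tag
    characters that mirror printable ASCII at an offset of 0xE0000.
    The function name encode_language_tag is hypothetical and not part
    of any library.

```python
LANGUAGE_TAG = 0xE0001   # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000     # tag characters mirror printable ASCII at this offset


def encode_language_tag(tag: str) -> str:
    """Hypothetical helper: spell an ASCII language tag in Plane 14 characters."""
    if not all(0x20 <= ord(c) <= 0x7E for c in tag):
        raise ValueError("language tag must be printable ASCII")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(c)) for c in tag)


encoded = encode_language_tag("en-US")
# Show the code points that would be embedded, invisibly, in the plain text.
print(" ".join(f"U+{ord(c):05X}" for c in encoded))
```

    The tag occupies six code points in the text stream and carries no
    visible glyphs of its own.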

    >Moreover, the typical practice in Japanese typography (at least) is
    >to use Japanese-preferred glyphs even when displaying Chinese text.
    >Japanese users do *not* expect the text to switch back-and-forth
    >between Chinese and Japanese glyphs as the language varies.

    How do Chinese readers feel about this? They might find it objectionable to
    have to read Chinese in Japanese glyphs in a multilingual document.

    >Given this, the best solution to the problem is to use fonts aimed
    >at the specific locale. This means that a Japanese user who goes to
    >read her email at an Internet café in Hong Kong may see things
    >unexpectedly, true, but it really handles 99.99+% of the problem.
    >
    >...
    >
    >The second objection could not be solved by the Plane 14 tags. The
    >two solutions that are possible are to separately encode every
    >glyphic variant which someone, somewhere, sometime may find
    >necessary to distinguish in plain text, or to use variant markers.
    >It is the latter solution which the UTC has adopted.
    >
    >2) From a technical standpoint, the Plane 14 tags do not really lend
    >themselves to use with the main complex script font engines
    >available. I don't know enough about Graphite to really speak to
    >it, but in the case of OpenType and AAT it is true that protocols
    >are already available to use Japanese/SC/TC/Korean/Vietnamese glyphs
    >for a run of text. These existing protocols, however, depend on
    >information external to the text itself.
    >
    >To keep the information internal to the text, or, more accurately,
    >internal to the glyph stream, one would have to have the ability to
    >enter a state once a certain character (or glyph) is encountered and
    >remain in that state indefinitely. Neither OpenType nor AAT allow
    >this. OpenType does not use a state engine internal to the glyph
    >stream for processing, and AAT resets the state at the beginning of
    >each line.

    How do they handle bidi?

    >What would have to happen is that the rendering engine would have to
    >find these characters within the text stream, massage the text data
    >so as to remove them and mark the text with the equivalent higher-level
    >information, and then render the result.
    >
    >The problem here is that the libraries such as Uniscribe and ATSUI
    >which provide Unicode rendering do not deal with the text as a whole
    >(at least, this is definitely true with ATSUI and is probably true
    >with Uniscribe, although I don't know for sure). That is, the Plane
    >14 tag may be found in the first paragraph of the text, but when the
    >client hands the text off to the library, they may hand off only a
    >later portion because that's all that needs to be drawn. The
    >library then does not have access to this information and will not
    >render the text correctly.
    >
    >This basically means that the onus is on the client to parse the
    >presence of these tags in the text and make appropriate adjustments
    >when it hands off the text to Uniscribe or ATSUI for rendering. As
    >such, there is no real advantage gained by having these tags
    >embedded directly in the text over having them in the same layer as
    >font, point size, and other typographic preferences. Indeed, it
    >becomes inconvenient to have them in a different layer as it means
    >that the client has to do *two* levels of processing to derive this
    >information, rather than just one.
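
    The client-side preprocessing described above can be sketched
    roughly as follows. This is an illustration under stated
    assumptions, not how Uniscribe or ATSUI work: the names tag and
    split_language_runs are hypothetical, and the only Unicode facts
    relied on are U+E0001 LANGUAGE TAG, U+E007F CANCEL TAG, and the
    tag characters U+E0020..U+E007E. The sketch strips the tags and
    returns (language, text) runs that a client could then attach as
    higher-level attributes.

```python
from typing import List, Optional, Tuple

LANGUAGE_TAG = 0xE0001                  # U+E0001 LANGUAGE TAG
CANCEL_TAG = 0xE007F                    # U+E007F CANCEL TAG
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007E  # tag characters mirroring printable ASCII
TAG_OFFSET = 0xE0000

Run = Tuple[Optional[str], str]         # (language or None, visible text)


def tag(language: str) -> str:
    """Hypothetical helper: spell a language tag in Plane 14 characters."""
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(c)) for c in language)


def split_language_runs(text: str, initial: Optional[str] = None) -> List[Run]:
    """Strip tag characters and return (language, text) runs."""
    runs: List[Run] = []
    buf: List[str] = []
    lang = initial
    i, n = 0, len(text)

    def switch(new_lang: Optional[str]) -> None:
        # Close the current run only when the language actually changes.
        nonlocal lang
        if new_lang != lang:
            if buf:
                runs.append((lang, "".join(buf)))
                buf.clear()
            lang = new_lang

    while i < n:
        cp = ord(text[i])
        if cp == LANGUAGE_TAG:
            i += 1
            letters = []
            while i < n and TAG_FIRST <= ord(text[i]) <= TAG_LAST:
                letters.append(chr(ord(text[i]) - TAG_OFFSET))
                i += 1
            # LANGUAGE TAG immediately followed by CANCEL TAG clears the language.
            if not letters and i < n and ord(text[i]) == CANCEL_TAG:
                i += 1
            switch("".join(letters) or None)
        elif cp == CANCEL_TAG:
            switch(None)
            i += 1
        elif TAG_FIRST <= cp <= TAG_LAST:
            i += 1                      # stray tag character: drop it
        else:
            buf.append(text[i])
            i += 1
    if buf:
        runs.append((lang, "".join(buf)))
    return runs


document = tag("ja") + "日本語" + tag("zh-TW") + "中文"
whole = split_language_runs(document)
# Hand off only the text after the first tag, as a client redrawing
# a later portion might: the language of the first run is lost.
partial = split_language_runs(document[3:])
# The client has to carry the state itself and pass it in.
restored = split_language_runs(document[3:], initial="ja")
print(whole, partial, restored, sep="\n")
```

    The second call shows the hand-off problem: once the leading tag is
    outside the range being drawn, the state must be supplied by the
    client, which is the same work as keeping the language in the
    style layer in the first place.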

    Thank you. This clarification of the way the renderers work is very
    helpful in understanding why plane 14 tags are relatively useless,
    but it confuses me as to how the bidi algorithm can work: it
    certainly requires state be kept at the rendering level.

    George


