Re: GB 18030 Certification

From: Andrew West (andrewcwest@gmail.com)
Date: Wed Aug 24 2005 - 07:10:24 CDT

    On 22/08/05, Christopher Fynn <cfynn@gmx.net> wrote:
    >
    >
    > Andrew West wrote:
    >
    > > If support for the Chinese "Set A" set of precomposed Tibetan stacks
    > > is now a requirement for GB18030 certification, then I would have
    > > thought that OpenType Tibetan fonts such as Xiamalaya and Tibetan
    > > Machine Uni that already fully support Unicode Tibetan by means of
    > > OpenType tables could be made GB18030 compliant by adding in extra
    > > mappings from the PUA codepoints defined in Set A to the appropriate
    > > glyphs in the font where available or by decomposing the PUA code
    > > points using OpenType features.
    >
    > A single set of glyphs is fine but the lookups could be very complicated.
    > Unless you always perform some kind of "normalization", if a single
    > document is edited on diverse systems you could end up with something in
    > kind of a mixed (partly pre-composed and partly "atomic" Unicode)
    > encoding - or something in between. What happens when you add a single
    > combining consonant to a precomposed consonant stack?
    >

    It wouldn't work, as standard Unicode Tibetan does not recognise the
    precomposed Tibetan stacks in the PUA, and the Chinese Tibetan system
    does not recognise the Tibetan combining consonants.

    > Without normalization of some kind the font lookup tables needed
    > to handle every possible way of encoding each stack could quickly
    > become unmanageable and difficult to debug.

    There is no need to handle "illegal" combinations of precomposed
    stacks and combining Tibetan consonants and/or vowels.
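As a rough illustration of what such an "illegal" combination looks like at the code point level, here is a minimal Python sketch that flags a Private Use character immediately followed by a combining mark from the Tibetan block. It assumes that any Private Use code point stands for a precomposed stack, because the actual Set A ranges are not given in this message; the function names are hypothetical, and U+E000 in the usage note is only a placeholder PUA code point, not a known Set A assignment.

```python
import unicodedata


def is_pua(ch):
    # General category "Co" covers the BMP PUA and the supplementary PUA planes.
    # Assumption: every PUA character here stands for a precomposed stack.
    return unicodedata.category(ch) == "Co"


def is_tibetan_mark(ch):
    # Combining marks (categories Mn/Mc) inside the Tibetan block 0F00..0FFF.
    # This is coarse: it also matches marks that are not consonants or vowels.
    in_block = 0x0F00 <= ord(ch) <= 0x0FFF
    return in_block and unicodedata.category(ch).startswith("M")


def find_mixed_stacks(text):
    # Return the index of each Tibetan combining mark that directly
    # follows a PUA character.
    return [
        i
        for i in range(1, len(text))
        if is_pua(text[i - 1]) and is_tibetan_mark(text[i])
    ]
```

For example, `find_mixed_stacks("\uE000\u0F90")` reports index 1, while a string made only of non-combining Tibetan letters reports nothing. A check like this could run before rendering, so that the font itself never has to deal with such sequences.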

    >
    > I guess MS Windows at least will try to map every
    >
    > The PRC's precomposed / PUA encoding of Tibetan seems to be designed to
    > avoid the need for anything like OpenType shaping or "smart font"
    > technology. Since they are used to huge CJK character sets and fonts,
    > 6,000+ pre-composed Tibetan "characters" may seem to make more sense to
    > them than adding support for "smart" fonts and complex script shaping.

    The odd thing is that the Chinese can't get away from smart font
    technology such as OpenType, as such technology is needed anyway in
    order to render other minority and historical scripts in the Chinese
    domain, such as Uighur, Mongolian and Phags-pa. Sooner or later
    they're going to have to bite the bullet, and use OpenType like
    everyone else in the world. I still believe that the Chinese PUA
    Tibetan system is a stopgap solution that will eventually become
    redundant when OpenType technology becomes accepted by the Chinese.

    >
    > IMO assigning PUA mappings to pre-composed combinations in existing OT
    > fonts is not a good idea as it might only encourage the creation of
    > documents with mixed encoding.

    The Chinese system is inherently a "mixed encoding", as it utilises
    both the non-combining letters in the Tibetan block [0F00..0FFF] as
    well as precomposed stacks in the PUA and Supplementary PUA. So under
    the Chinese scheme a Tibetan word such as BDAMS is represented
    entirely using standard Tibetan characters <0F56 0F51 0F58 0F66>. In
    fact, my understanding is that the Chinese model allows for two
    implementation levels, one that only supports the precomposed stacks
    in the PUA and the non-combining characters in the Tibetan block, and
    one that supports both precomposed Tibetan and standard combining
    Tibetan consonants and vowels. The higher implementation level is
    needed to render less common Tibetan stacks that have not been
    assigned PUA code points by the Chinese.
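The distinction drawn above can be sketched in Python. This is only an approximation under stated assumptions: the helper names and the level numbers 1 and 2 are invented for illustration, any Private Use code point is treated as a precomposed stack, and "combining" is approximated by the Unicode general categories Mn/Mc within the Tibetan block. Only the BDAMS sequence <0F56 0F51 0F58 0F66> is taken from the message itself.

```python
import unicodedata


def classify(ch):
    # Coarse classification of one character (hypothetical labels).
    cat = unicodedata.category(ch)
    if cat == "Co":
        # BMP PUA or supplementary PUA; assumed to be a precomposed stack.
        return "pua"
    if 0x0F00 <= ord(ch) <= 0x0FFF:
        # Mn/Mc in the Tibetan block approximates "combining" characters.
        return "tibetan-mark" if cat.startswith("M") else "tibetan-noncombining"
    return "other"


def required_level(text):
    # Hypothetical numbering: 1 = PUA stacks plus non-combining letters only,
    # 2 = standard combining consonants or vowels are also present.
    kinds = {classify(ch) for ch in text}
    return 2 if "tibetan-mark" in kinds else 1


# BDAMS as given in the message: four non-combining Tibetan letters.
bdams = "".join(chr(cp) for cp in (0x0F56, 0x0F51, 0x0F58, 0x0F66))

if __name__ == "__main__":
    for ch in bdams:
        print(f"U+{ord(ch):04X}", unicodedata.name(ch), classify(ch))
    print("level needed:", required_level(bdams))
```

Under these assumptions BDAMS needs only the lower level, since none of its four characters is a combining mark, whereas any text containing a subjoined consonant such as U+0F90 would need the higher level.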

    >
    > > In principle, it should be fairly
    > > straightforward to support both encoding mechanisms in a single
    > > OpenType font using a single set of glyphs.
    >
    > You'd still need support for OT shaping which is what such encoding
    > schemes seem designed to avoid.

    Indeed. But that is what is needed if an OT Tibetan font is required
    to support the PUA system of precomposed Tibetan as well as proper
    Unicode Tibetan.

    Andrew


