Re: GB 18030 Certification

From: Andrew West (andrewcwest@gmail.com)
Date: Wed Aug 24 2005 - 07:10:24 CDT

In reply to: Christopher Fynn: "Re: GB 18030 Certification"

On 22/08/05, Christopher Fynn <cfynn@gmx.net> wrote:
>
>
> Andrew West wrote:
>
> > If support for the Chinese "Set A" set of precomposed Tibetan stacks
> > is now a requirement for GB18030 certification, then I would have
> > thought that OpenType Tibetan fonts such as Xiamalaya and Tibetan
> > Machine Uni that already fully support Unicode Tibetan by means of
> > OpenType tables could be made GB18030 compliant by adding in extra
> > mappings from the PUA codepoints defined in Set A to the appropriate
> > glyphs in the font where available or by decomposing the PUA code
> > points using OpenType features.
>
> A single set of glyphs is fine but the lookups could be very complicated.
> Unless you always perform some kind of "normalization", if a single
> document is edited on diverse systems you could end up with something in
> kind of a mixed (partly pre-composed and partly "atomic" Unicode)
> encoding - or something in between. What happens when you add a single
> combining consonant to a precomposed consonant stack?
>

It wouldn't work, as standard Unicode Tibetan does not recognise the
precomposed Tibetan stacks in the PUA, and the Chinese Tibetan system
does not recognise the Tibetan combining consonants.

> Without normalization of some kind the font lookup tables needed
> to handle every possible way of encoding each stack could quickly
> become unmanageable and difficult to debug.

There is no need to handle "illegal" combinations of precomposed
stacks and combining Tibetan consonants and/or vowels.
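
As a rough illustration of what such an "illegal" combination looks
like in data, here is a minimal Python sketch that flags a PUA code
point immediately followed by a Tibetan combining mark. The function
names are made up for this example, and U+F300 in the usage note is
only a placeholder PUA value, not a claim about where Set A puts any
particular stack.

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    # General category "Co" covers the BMP PUA and both supplementary PUA planes.
    return unicodedata.category(ch) == "Co"


def is_tibetan_combining(ch: str) -> bool:
    # Combining marks in the Tibetan block [0F00..0FFF], e.g. subjoined
    # consonants and vowel signs, have a general category starting with "M".
    return 0x0F00 <= ord(ch) <= 0x0FFF and unicodedata.category(ch).startswith("M")


def find_mixed_sequences(text: str) -> list[int]:
    """Return indexes of PUA characters directly followed by a Tibetan combining mark."""
    return [
        i
        for i in range(len(text) - 1)
        if is_private_use(text[i]) and is_tibetan_combining(text[i + 1])
    ]
```

For instance, find_mixed_sequences("\uf300\u0f90") reports index 0,
whereas a standard Unicode stack such as "\u0f66\u0f90" is not
flagged. A font or application could simply leave the flagged
sequences unshaped rather than try to support them.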

>
> I guess MS Windows at least will try to map every
>
> The PRC's precomposed / PUA encoding of Tibetan seems to be designed to
> avoid the need for anything like OpenType shaping or "smart font"
> technology. Since they are used to huge CJK character sets and fonts,
> 6,000+ pre-composed Tibetan "characters" may seem to make more sense to
> them than adding support for "smart" fonts and complex script shaping.

The odd thing is that the Chinese can't get away from smart font
technology such as OpenType, as such technology is needed anyway in
order to render other minority and historical scripts in the Chinese
domain, such as Uighur, Mongolian and Phags-pa. Sooner or later
they're going to have to bite the bullet, and use OpenType like
everyone else in the world. I still believe that the Chinese PUA
Tibetan system is a stopgap solution that will eventually become
redundant when OpenType technology becomes accepted by the Chinese.

>
> IMO assigning PUA mappings to pre-composed combinations in existing OT
> fonts is not a good idea as it might only encourage the creation of
> documents with mixed encoding.

The Chinese system is inherently a "mixed encoding", as it utilises
both the non-combining letters in the Tibetan block [0F00..0FFF] as
well as precomposed stacks in the PUA and Supplementary PUA. So under
the Chinese scheme a Tibetan word with no stacks, such as BDAMS, is represented
entirely using standard Tibetan characters <0F56 0F51 0F58 0F66>. In
fact, my understanding is that the Chinese model allows for two
implementation levels, one that only supports the precomposed stacks
in the PUA and the non-combining characters in the Tibetan block, and
one that supports both precomposed Tibetan and standard combining
Tibetan consonants and vowels. The higher implementation level is
needed to render less common Tibetan stacks that have not been
assigned PUA code points by the Chinese.
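
To make the distinction concrete, the following Python sketch sorts
code points into the categories mentioned above. It is only an
illustration: the function names and category labels are my own, the
test for "combining" relies on the Unicode general category rather
than on anything in the Chinese standard, and U+F300 in the usage
note is just a placeholder PUA value.

```python
import unicodedata


def classify(ch: str) -> str:
    """Roughly classify one character for the mixed-encoding discussion."""
    cp = ord(ch)
    if 0xE000 <= cp <= 0xF8FF:
        return "PUA"
    if 0xF0000 <= cp <= 0xFFFFD or 0x100000 <= cp <= 0x10FFFD:
        return "Supplementary PUA"
    if 0x0F00 <= cp <= 0x0FFF:
        # Subjoined consonants and vowel signs are combining marks (category M*).
        if unicodedata.category(ch).startswith("M"):
            return "Tibetan combining"
        return "Tibetan non-combining"
    return "other"


def needs_higher_level(text: str) -> bool:
    """True if the text contains any combining Tibetan character."""
    return any(classify(ch) == "Tibetan combining" for ch in text)


# BDAMS as given above: <0F56 0F51 0F58 0F66>
bdams = "\u0f56\u0f51\u0f58\u0f66"
```

Here every character of bdams classifies as "Tibetan non-combining",
so needs_higher_level(bdams) is False, while a string that mixes a
PUA code point with a combining consonant, such as "\uf300\u0f90",
would need the higher level.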

>
> > In principle, it should be fairly
> > straightforward to support both encoding mechanisms in a single
> > OpenType font using a single set of glyphs.
>
> You'd still need support for OT shaping which is what such encoding
> schemes seem designed to avoid.

Indeed. But that is what is needed if an OT Tibetan font is required
to support the PUA system of precomposed Tibetan as well as proper
Unicode Tibetan.

Andrew

