From: Andrew C. West (andrewcwest@alumni.princeton.edu)
Date: Sat Jan 08 2005 - 05:13:52 CST
On Fri, 07 Jan 2005 16:01:56 +0000, Christopher Fynn wrote:
>
> Andrew C. West wrote:
>
> > Of course if you then want to treat these PUA characters as real Unicode
> > Tibetan you need to know the character mapping, but from my perspective
> > character mapping is something that is optionally applied on top of the
> > code point mapping.
>
> As soon as you want to edit the text in a Unicode based application
> you'd probably need to convert (or "character map") the BrdaRten PUA
> characters to "real Unicode" [or you might end up with the horrors of a
> kind of mixed encoding]. Comparing text from "real Unicode" with
> precomposed Tibetan (PUA or GB18030), and collation would be difficult
> without conversion as well.
Personally, I think that that is precisely not what a GB18030-supporting Unicode
application would want to do. The whole point of China defining a large set of
precomposed Tibetan characters is to enable the display of Tibetan text using
simple font and rendering technology (i.e. not resorting to OpenType etc.). Any
font created to display BrdaRten characters would have precomposed Tibetan
glyphs mapped to the PUA (F300..F8FF for Set A; somewhere in Plane 16 for Set B,
which I think is not yet fully defined) and basic Tibetan glyphs mapped to
0F00..0FFF (excluding the vowels and subjoined consonants which are not used in
the BrdaRten model). If an application opens a GB18030 document containing
BrdaRten text and then automatically converts it to decomposed Tibetan, then
the document will be unreadable to the user with only a BrdaRten font. Therefore
the BrdaRten text must be kept as PUA characters in order to be displayed with a
BrdaRten font, and you would only want to convert them to decomposed Tibetan if
the user specifically requests it.
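
To make the "convert only on request" behaviour concrete, here is a minimal
sketch in Python. The function name, the `convert` flag and the one-entry
DEMO_MAP table are all hypothetical; in particular the mapping entry is invented
for illustration and is not the real BrdaRten assignment for U+F300.

```python
# Hypothetical one-entry table: NOT the real BrdaRten assignment.
# It pretends U+F300 is the precomposed stack "kya"
# (U+0F40 KA followed by U+0FB1 SUBJOINED YA).
DEMO_MAP = {"\uF300": "\u0F40\u0FB1"}


def load_text(text, mapping, convert=False):
    """Return text unchanged unless the user explicitly asked for conversion."""
    if not convert:
        # Keep the PUA code points so a BrdaRten font can still display them.
        return text
    # Replace each mapped PUA character with its decomposed sequence;
    # characters without a mapping entry pass through untouched.
    return "".join(mapping.get(ch, ch) for ch in text)


doc = "\uF300\u0F0B"  # demo PUA character followed by U+0F0B TSHEG
kept = load_text(doc, DEMO_MAP)                     # default: PUA preserved
converted = load_text(doc, DEMO_MAP, convert=True)  # only on explicit request
```

The default leaves the document exactly as it was read, so a user with only a
BrdaRten font still sees readable text; decomposition happens only when asked for.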
As you say, for operations such as collation and comparison you would need to
convert "Unicode Tibetan" and "BrdaRten Tibetan" to a common encoding, but that
is probably not something that most BrdaRten users will want to do. As to the
problems of "mixed encoding", it would be up to the end user to ensure that he
uses an input method to write Tibetan that generates BrdaRten characters and not
decomposed Tibetan. Anyway, the BrdaRten "standard" explicitly allows for mixed
encoding, specifying two levels of support: Level 1 - supporting precomposed
Tibetan only; and Level 2 - supporting precomposed Tibetan and decomposed
Tibetan.
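
As a rough illustration of how an application might tell which kind of Tibetan
a text contains, here is a sketch in Python. It assumes the Set A range quoted
above (F300..F8FF) and uses "any combining mark in 0F00..0FFF" as a proxy for
decomposed Tibetan, since vowels and subjoined consonants are not used in the
BrdaRten model. The function name and return labels are my own, not part of
any standard, and Set B in Plane 16 is ignored.

```python
import unicodedata

SET_A_PUA = range(0xF300, 0xF900)  # F300..F8FF, the Set A range quoted above
TIBETAN = range(0x0F00, 0x1000)    # 0F00..0FFF, the Unicode Tibetan block


def classify(text):
    """Label text as 'precomposed', 'decomposed', 'mixed' or 'neither'."""
    has_precomposed = any(ord(ch) in SET_A_PUA for ch in text)
    # Proxy for decomposed Tibetan: a combining mark (general category M*)
    # inside the Tibetan block, e.g. a vowel sign or subjoined consonant.
    has_decomposed = any(
        ord(ch) in TIBETAN and unicodedata.category(ch).startswith("M")
        for ch in text
    )
    if has_precomposed and has_decomposed:
        return "mixed"
    if has_precomposed:
        return "precomposed"
    if has_decomposed:
        return "decomposed"
    return "neither"
```

Text made up only of basic Tibetan letters and punctuation comes back as
"neither", because it is valid under both models. Presumably a Level 1
implementation would only need to cope with the "precomposed" case, while a
Level 2 implementation would also have to handle "decomposed" and "mixed".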
It is also worthwhile pointing out that a lot of education about Unicode Tibetan
and OpenType technology is taking place both within Tibet and China and at
places such as the University of Virginia which has many visiting scholars from
Tibet. And as Chris has pointed out elsewhere, a recent study by Chinese
academics has confirmed the feasibility of the Unicode Tibetan encoding model in
conjunction with OpenType font technology (something that we knew all along, but
it is good to see the Chinese beginning to realise that OpenType is not
something to be scared of). My feeling is that with the current proliferation of
working Tibetan OpenType fonts, Tibetan users in China will soon move away from
the precomposed Tibetan model, and BrdaRten will be effectively dead before it
has been fully defined.
Andrew