Re: GB18030 mapping

From: Christopher Fynn (cfynn@gmx.net)
Date: Sat Jan 08 2005 - 11:26:02 CST

In reply to: Andrew C. West: "Re: GB18030 mapping"

Andrew C. West wrote:

> Personally, I think that that is precisely not what a GB18030-supporting Unicode
> application would want to do. The whole point of China defining a large set of
> precomposed Tibetan characters is to enable the display of Tibetan text using
> simple font and rendering technology (i.e. not resorting to OpenType etc.). Any
> font created to display BrdaRten characters would have precomposed Tibetan
> glyphs mapped to the PUA (F300..F8FF for Set A, somewhere in Plane 16 for Set B
> which I think is not yet fully defined) and basic Tibetan glyphs mapped to the
> 0F00..0FFF (excluding the vowels and subjoined consonants which are not used in
> the BrdaRten model). If an application opens a GB18030 document containing
> BrdaRten text and then automatically converts it to decomposed Tibetan, then
> the document will be unreadable to the user with only a BrdaRten font. Therefore
> the BrdaRten text must be kept as PUA characters in order to be displayed with a
> BrdaRten font, and you would only want to convert them to decomposed Tibetan if
> the user specifically requests it.

However, fonts can be built which support *both*: you can make a font
with all the pre-composed glyphs mapped to the PUA and GB18030 code
points *and* lookup tables that map sequences of Unicode characters
to the precomposed Tibetan glyphs. Since font developers naturally want
their fonts to work on the widest possible range of systems, it seems
likely that some developers of Tibetan fonts will do this.

The trouble is that such fonts allow you to create documents with a
"mixed encoding" (precomposed PUA characters and decomposed Unicode
sequences in the same text), which is very messy.
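To make the "mixed encoding" problem concrete, here is a minimal sketch of how an application might flag such a document. The code point ranges are assumptions: F300..F8FF is the Set A range mentioned in the quoted text above, and the vowel-sign and subjoined-consonant ranges are the parts of the Tibetan block that the BrdaRten model is said not to use. The function name is hypothetical, and any other private use of F300..F8FF would be flagged as well.

```python
# Assumed ranges, taken from the discussion rather than from any
# published BrdaRten table.
PUA_SET_A = range(0xF300, 0xF900)          # F300..F8FF, quoted for Set A
TIBETAN_VOWEL_SIGNS = range(0x0F71, 0x0F82)  # U+0F71..U+0F81, vowel signs and adjacent marks
TIBETAN_SUBJOINED = range(0x0F90, 0x0FBD)  # U+0F90..U+0FBC, subjoined consonants


def classify_tibetan_encoding(text: str) -> str:
    """Report which Tibetan encoding model a string appears to use."""
    has_precomposed = any(ord(ch) in PUA_SET_A for ch in text)
    has_decomposed = any(
        ord(ch) in TIBETAN_VOWEL_SIGNS or ord(ch) in TIBETAN_SUBJOINED
        for ch in text
    )
    if has_precomposed and has_decomposed:
        return "mixed"
    if has_precomposed:
        return "precomposed"
    if has_decomposed:
        return "decomposed"
    # Only base letters, punctuation, or non-Tibetan text: cannot tell.
    return "neither"
```

A check like this only detects the mix; it says nothing about which of the two forms the user intended.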

> As you say, for operations such as collation and comparison you would need to
> convert "Unicode Tibetan" and "BrdaRten Tibetan" to a common encoding, but that
> is probably not something that most BrdaRten users will want to do. As to the
> problems of "mixed encoding", it would be up to the end user to ensure that he
> uses an input method to write Tibetan that generates BrdaRten characters and not
> decomposed Tibetan. Anyway, the BrdaRten "standard" explicitly allows for mixed
> encoding, specifying two levels of support : Level 1 - supporting precomposed
> Tibetan only; and Level 2 - supporting precomposed Tibetan and decomposed
> Tibetan.

Most of the time the end user will only care about what he/she sees on
the screen and what comes out of the printer. The problems begin when
they try to use data like this in applications that only support
GB18030, or that apply particular properties to certain PUA characters
(we already know that there are many applications which do this), or
when they try to search and replace text and so on - and they probably
won't know why.
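The search problem can be sketched as follows. A plain substring search for a decomposed sequence will not find the same stack stored as a single PUA character, so both sides have to be brought to a common form first. This is only an illustration: `unify` and `find_unified` are hypothetical helpers, and the one entry in `PLACEHOLDER_MAP` is invented for the example and is NOT a real BrdaRten assignment; a real application would need the actual mapping table.

```python
from typing import Mapping

# Placeholder only: pretends that PUA U+F300 stands for the stack
# U+0F66 U+0F92. This is NOT taken from any BrdaRten table.
PLACEHOLDER_MAP = {"\uf300": "\u0f66\u0f92"}


def unify(text: str, pua_to_sequence: Mapping[str, str]) -> str:
    """Replace each mapped PUA character with its decomposed sequence."""
    return "".join(pua_to_sequence.get(ch, ch) for ch in text)


def find_unified(haystack: str, needle: str,
                 pua_to_sequence: Mapping[str, str]) -> bool:
    """Substring search after converting both strings to decomposed form."""
    return unify(needle, pua_to_sequence) in unify(haystack, pua_to_sequence)


document = "\uf300\u0f0b"   # precomposed (PUA) stack followed by a tsheg
query = "\u0f66\u0f92"      # the same stack typed as decomposed Unicode

naive_hit = query in document
unified_hit = find_unified(document, query, PLACEHOLDER_MAP)
```

Here `naive_hit` is false and `unified_hit` is true, which is exactly the kind of failure an end user would see without knowing the cause.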

IMO this is also a mess for application developers who support Unicode
but also need to support GB18030 for the Chinese market.
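Part of the difficulty is that transcoding alone does not resolve anything: PUA code points pass through a GB18030 conversion unchanged, so the application still has to decide what they mean. A small sketch using Python's built-in `gb18030` codec (the helper name is hypothetical, and U+F300 is used only because it lies in the range quoted above):

```python
def survives_gb18030(text: str) -> bool:
    """True if the text is unchanged after a GB18030 encode/decode round trip."""
    return text.encode("gb18030").decode("gb18030") == text


# One PUA code point plus two ordinary Tibetan characters.
sample = "\uf300\u0f40\u0fb1"
round_trip_ok = survives_gb18030(sample)
```

The round trip preserves the PUA character as-is; nothing in the conversion turns it into decomposed Tibetan or back.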

> It is also worthwhile pointing out that a lot of education about Unicode Tibetan
> and OpenType technology is taking place both within Tibet and China and at
> places such as the University of Virginia which has many visiting scholars from
> Tibet. And as Chris has pointed out elsewhere, a recent study by Chinese
> academics has confirmed the feasibility of the Unicode Tibetan encoding model in
> conjunction with OpenType font technology (something that we knew all along, but
> it is good to see the Chinese beginning to realise that OpenType is not
> something to be scared of). My feeling is that with the current proliferation of
> working Tibetan OpenType fonts Tibetan users in China will soon move away from
> the precomposed Tibetan model, and BrdaRten will be effectively dead before it
> has been fully defined.

> Andrew

Yes, before Xmas I was at the University of Virginia showing three
Tibetans visiting from China how to convert their fonts to OpenType :-).
One good thing: Tibetans seem to be very interested in being able to use
cursive Tibetan script fonts. Unicode and OpenType make it possible to
display the contextual glyph shapes required to render cursive Tibetan
properly, while a pre-composed encoding with 1-to-1 character to glyph
mapping doesn't handle this, so it was pretty easy to come up with a
compelling demonstration. It may be the fact that they can render
cursive Tibetan properly that convinces Tibetan users to use "pure"
Unicode rather than GB18030 or some kind of hybrid.

- Chris


This archive was generated by hypermail 2.1.5 : Sat Jan 08 2005 - 11:29:52 CST