Fw: Re: PRC asking for 956 precomposed Tibetan characters

From: Chris Fynn (cfynn@gmx.net)
Date: Sun Dec 29 2002 - 11:56:48 EST

Next message: Doug Ewell: "Re: PRC asking for 956 precomposed Tibetan characters"

Previous message: Michael Everson: "Re: Coptic II?"
Reply: Doug Ewell: "Re: PRC asking for 956 precomposed Tibetan characters"

----- Original Message -----
From: "Robert R. Chilton" <acip@well.com>
To: <tibex@unicode.org>
Sent: Sunday, December 29, 2002 9:34 AM
Subject: [tibex] Re: PRC asking for 956 precomposed characters

> I had heard some rumors about this proposal over the past year and I was
> interested to finally see n2558. Sadly, this proposal is flawed on many
> counts. It seems that this proposal is motivated solely by
> typographical considerations without concern for broader character data
> processing needs. Although this character set might be fine for
> computer-based typesetting of the modern Tibetan materials now being
> printed in the People's Republic of China, it is somewhat lacking as a
> basis for interchange and processing of Tibetan-script data.

> Most notably this proposal represents the repertoire of a particular
> sub-language (modern Tibetan as used in the PRC) rather than a script.
> There are many examples of Tibetan-script words in classical Tibetan
> works, as well as in Dzongkha and other Tibetan-script languages of
> South Asia, that cannot be represented by this character set.

> Secondly, if the goal of this proposal was to facilitate processing of
> Tibetan-script data for purposes other than document publishing then it
> would have been more effective to provide characters for every Tibetan
> initial form (including prefix letters) rather than simply for
> typographical ligatures. The proposal as now written will result in
> unnecessary complexities in producing a culturally expected collation of
> data encoded using mixed basic Tibetan and BrdaRten characters.

> More specifically, the proposal contains some errors of fact:

> 1. The claim that "[the current Tibetan-script] encoding scheme is not
> compatible with traditional education, publication and electronic
> desktop publishing systems" is simply not true. Any system that is able
> to render other complex languages, notably Arabic and the various Indic
> and Indic-derived scripts of South and Southeast Asia, should be able to
> accommodate Tibetan-script materials encoded using the current Tibetan
> block. (It is no coincidence that the Tibetan script, which is itself
> derived from an ancient Indian script, should share many structural and
> functional characteristics with modern Indic scripts.)
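
The dynamic-combining model referred to above can be made concrete with a small sketch. It assumes Python's standard unicodedata module and uses the syllable "bsgrubs" as an illustrative example chosen here, not taken from n2558. The helper names describe and clusters are hypothetical. The point is only that a vertical stack is stored as a base letter followed by subjoined letters and vowel signs from the existing Tibetan block, much as Indic scripts store a base plus combining marks.

```python
import unicodedata

# "bsgrubs" in the existing Tibetan block: prefix BA, head letter SA with
# subjoined GA and subjoined RA beneath it, vowel sign U, then suffixes BA, SA.
# (Illustrative example chosen for this sketch, not taken from n2558.)
SYLLABLE = "\u0F56\u0F66\u0F92\u0FB2\u0F74\u0F56\u0F66"


def describe(text):
    """Return (code point, Unicode name, general category) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]


def clusters(text):
    """Attach each nonspacing mark (category Mn) to the preceding base letter."""
    result = []
    for ch in text:
        if unicodedata.category(ch) == "Mn" and result:
            result[-1] += ch  # subjoined letter or vowel sign joins the stack
        else:
            result.append(ch)  # a new base letter starts a new cluster
    return result


if __name__ == "__main__":
    for row in describe(SYLLABLE):
        print(*row)
    print(len(clusters(SYLLABLE)), "clusters")
```

Seven stored characters yield four rendered clusters; the second cluster holds the whole stack (SA, subjoined GA, subjoined RA, vowel sign U), and a rendering engine is left to shape it.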

> It is understandable that the Chinese would like to regard Tibetan as "a
> horizontal stream of basic Tibetan characters and BrdaRten characters
> without vertical combining" since this facilitates the usage of two
> languages, namely Tibetan and Chinese, together in bilingual
> documents. However, this mode of thought runs counter to the very
> principle of Unicode/ISO-10646 which is to enable *any* number of
> languages to be used together, seamlessly, in documents and other
> computer applications. Would the Chinese also like to propose a set of
> precomposed characters for each of the Indic scripts so that they
> likewise can be "regarded as a horizontal stream of basic Tibetan [read
> 'basic Indic'] characters and BrdaRten [read 'precomposed Indic']
> characters without vertical combining"? Or have they resigned
> themselves (and the rest of us) to never mixing Chinese and Indic script
> within a document? On the other hand, once there is a system that can
> render Chinese together with Hindi or Tamil, rendering of Chinese
> together with Tibetan (as currently encoded) is not technically
> difficult.

> In point of fact, the cited "problems with Tibetan information
> interchange and processing" are no more difficult to solve than those
> for other complex scripts -- these having already been solved for a
> substantial number of complex scripts. The current lack of widespread
> support for Unicode Tibetan simply reflects the fact that there are
> fewer commercial and governmental resources being allocated to the
> development of Unicode Tibetan as compared to other Indic and
> Indic-derived complex scripts.

> 2. The claim that "Up to now, there is no report showing any system
> platform has implemented Tibetan processing system using dynamic
> combining method" is also untrue. Inquiries can be directed to the
> Dzongkha Development Commission in Bhutan which has overseen the
> development of just such a system platform for Dzongkha (the national
> language of Bhutan)--which is written using the letters of the Tibetan
> script.

> 3. The statement that "Since 1990s, from DOS to Windows, both domestic
> and overseas applications have been using Tibetan BrdaRten character set
> at implementation level 1. For example, the Founder desktop publishing
> system for Tibetan is based on BrdaRten characters which has become the
> de-facto industry standard for Tibetan information interchange and
> processing in China and even outside of China" is exaggerated.
> Tibetan-script computer systems have been in use in North America,
> Europe, South Asia and East Asia/Pacific Rim since as early as 1986, but it is
> completely false to say that the character repertoire of n2558 has
> become "the de-facto industry standard for Tibetan information
> interchange and processing" in any place outside of the PRC. As noted
> above, the character set of n2558 does not even fully support usages of
> Tibetan script in regions outside of China. (The notation of
> "Worldwide" in question 5 of the Part C.: Technical-Justification in the
> Proposal Summary Form is thus highly misleading.)

> 4. The n2558 document asserts that "Once the Tibetan BrdaRten
> characters are encoded in BMP, many current systems supporting
> ISO/IEC10646 will enable Tibetan processing without major modification.
> Therefore, the international standard Tibetan BrdaRten characters will
> speed up the standardization and digitalization of Tibetan information,
> keep the consistency of implementation level of Tibetan and other
> scripts, develop the Tibetan culture and make the Tibetan culture
> resources shared by the world." There are a number of counter-arguments
> to these assertions:

> First, due to the limitations of the n2558 character set for
> representing classical Tibetan, Dzongkha, and other Tibetan-script
> materials it is not reasonable to expect worldwide adoption of this
> character set. Since the dynamic-combining model will continue to be
> used in South Asia (where complex-script systems are the norm), in
> academic institutions (where research in classical Tibetan is conducted)
> and elsewhere, there will always be a need to normalize Tibetan-script
> data interchanged between regions that use these two differing encoding
> models. Thus, the acceptance of this
> character set into the ISO-10646/Unicode standard will actually be an
> *obstacle* to "standardization and digitization of Tibetan information."
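
The normalization burden described above can be sketched as follows. This is a minimal illustration under stated assumptions: the code points n2558 proposes are not reproduced in this message, so two Private Use Area values (U+E000, U+E001) stand in as HYPOTHETICAL precomposed stacks, and the table and function names are invented for the example.

```python
# HYPOTHETICAL mapping: Private Use Area code points stand in for precomposed
# BrdaRten stacks, because the actual n2558 assignments are not given here.
PRECOMPOSED_TO_SEQUENCE = {
    "\uE000": "\u0F66\u0F92",        # SA + subjoined GA
    "\uE001": "\u0F66\u0F92\u0FB2",  # SA + subjoined GA + subjoined RA
}


def to_dynamic(text):
    """Rewrite any precomposed stack as its base + subjoined sequence."""
    return "".join(PRECOMPOSED_TO_SEQUENCE.get(ch, ch) for ch in text)


def same_text(a, b):
    """Compare two strings only after mapping both to the dynamic model."""
    return to_dynamic(a) == to_dynamic(b)


# The same syllable ("bsgrubs") stored under each model.
DYNAMIC = "\u0F56\u0F66\u0F92\u0FB2\u0F74\u0F56\u0F66"
MIXED = "\u0F56\uE001\u0F74\u0F56\u0F66"

if __name__ == "__main__":
    print(DYNAMIC == MIXED)           # raw comparison of the two encodings
    print(same_text(DYNAMIC, MIXED))  # comparison after normalization
```

Without the mapping step the two spellings of one syllable compare as different strings, so every search, comparison, or sort over interchanged data would need such a table applied first.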

> Second, the reference to "consistency of implementation level of Tibetan
> and other scripts" would seem to presume that the "other scripts" in
> question are not complex scripts. This statement is simply not relevant
> when we consider the requirements of--and the already implemented
> multilingual systems for the handling of--Indic and Indic-derived
> complex scripts.

> 5. Any claims of a pre-existing "de-facto industry standard" for
> Tibetan even in China seem to be contradicted by the statement in the
> Conclusion, that "After serious discussion and analysis by Tibetan
> linguists, encoding experts and software developers in China, all are in
> favor to establish a national and international standard Tibetan
> BrdaRten character set to meet the requirement of Tibetan information
> processing." This seems to indicate that a national standard for
> Tibetan is yet to be established, even in China.

> In summary assessment, had this proposal been comprehensive enough to
> satisfy the needs of *all* users of the Tibetan-script languages and
> materials, had it taken into consideration character data processing
> needs of Tibetan beyond computerized typesetting (such as collation),
> and had it been presented ten years ago, then it might well have been
> worthy of serious consideration. As it now stands, this proposal offers
> too little too late and, moreover, would simply add further confusion
> and obstacles to the standardization of Tibetan-script data processing
> and interchange. Furthermore, even had this proposal been presented
> for consideration ten years ago, the fact that complex-script (dynamic
> combination) rendering is needed for Indic scripts would even then have
> been a strong argument in favor of the current ISO-10646 encoding model
> and against an encoding model of the type proposed in n2558.

> Respectfully,

> Robert Chilton
> Technical Director
> The Asian Classics Input Project

=================================================

This archive was generated by hypermail 2.1.5 : Sun Dec 29 2002 - 12:30:26 EST