Fw: Re: PRC asking for 956 precomposed Tibetan characters

From: Chris Fynn (cfynn@gmx.net)
Date: Sun Dec 29 2002 - 11:56:48 EST


    ----- Original Message -----
    From: "Robert R. Chilton" <acip@well.com>
    To: <tibex@unicode.org>
    Sent: Sunday, December 29, 2002 9:34 AM
    Subject: [tibex] Re: PRC asking for 956 precomposed characters

    > I had heard some rumors about this proposal over the past year and I was
    > interested to finally see n2558. Sadly, this proposal is flawed on many
    > counts. It seems that this proposal is motivated solely by
    > typographical considerations without concern for broader character data
    > processing needs. Although this character set might be fine for
    > computer-based typesetting of the modern Tibetan materials now being
    > printed in the People's Republic of China, it is somewhat lacking as a
    > basis for interchange and processing of Tibetan-script data.
     
    > Most notably this proposal represents the repertoire of a particular
    > sub-language (modern Tibetan as used in the PRC) rather than a script.
    > There are many examples of Tibetan-script words in classical Tibetan
    > works, as well as in Dzongkha and other Tibetan-script languages of
    > South Asia, that cannot be represented by this character set.
     
    > Secondly, if the goal of this proposal was to facilitate processing of
    > Tibetan-script data for purposes other than document publishing then it
    > would have been more effective to provide characters for every Tibetan
    > initial form (including prefix letters) rather than simply for
    > typographical ligatures. The proposal as now written will result in
    > unnecessary complexities in producing a culturally expected collation of
    > data encoded using mixed basic Tibetan and BrdaRten characters.
     
    > More specifically, the proposal contains some errors of fact:
     
    > 1. The claim that "[the current Tibetan-script] encoding scheme is not
    > compatible with traditional education, publication and electronic
    > desktop publishing systems" is simply not true. Any system that is able
    > to render other complex languages, notably Arabic and the various Indic
    > and Indic-derived scripts of South and Southeast Asia, should be able to
    > accommodate Tibetan-script materials encoded using the current Tibetan
    > block. (It is no coincidence that the Tibetan script, which is itself
    > derived from ancient Indian script, should share many structural and
    > functional characteristics with modern Indic scripts.)
     
    > It is understandable that the Chinese would like to regard Tibetan as "a
    > horizontal stream of basic Tibetan characters and BrdaRten characters
    > without vertical combining" since this facilitates the usage of two
    > languages, namely Tibetan and Chinese, together in bilingual
    > documents. However, this mode of thought runs counter to the very
    > principle of Unicode/ISO-10646 which is to enable *any* number of
    > languages to be used together, seamlessly, in documents and other
    > computer applications. Would the Chinese also like to propose a set of
    > precomposed characters for each of the Indic scripts so that they
    > likewise can be "regarded as a horizontal stream of basic Tibetan [read
    > 'basic Indic'] characters and BrdaRten [read 'precomposed Indic']
    > characters without vertical combining"? Or have they resigned
    > themselves (and the rest of us) to never mixing Chinese and Indic script
    > within a document? On the other hand, once there is a system that can
    > render Chinese together with Hindi or Tamil, rendering of Chinese
    > together with Tibetan (as currently encoded) is not technically
    > difficult.
     
    > In point of fact, the cited "problems with Tibetan information
    > interchange and processing" are no more difficult to solve than those
    > for other complex scripts -- these having already been solved for a
    > substantial number of complex scripts. The current lack of widespread
    > support for Unicode Tibetan simply reflects the fact that there are
    > fewer commercial and governmental resources being allocated to the
    > development of Unicode Tibetan as compared to other Indic and
    > Indic-derived complex scripts.
     
    > 2. The claim that "Up to now, there is no report showing any system
    > platform has implemented Tibetan processing system using dynamic
    > combining method" is also untrue. Inquiries can be directed to the
    > Dzongkha Development Commission in Bhutan which has overseen the
    > development of just such a system platform for Dzongkha (the national
    > language of Bhutan)--which is written using the letters of the Tibetan
    > script.
     
    > 3. The statement that "Since 1990s, from DOS to Windows, both domestic
    > and overseas applications have been using Tibetan BrdaRten character set
    > at implementation level 1. For example, the Founder desktop publishing
    > system for Tibetan is based on BrdaRten characters which has become the
    > de-facto industry standard for Tibetan information interchange and
    > processing in China and even outside of China" is exaggerated.
    > Tibetan-script computer systems have been in use in North America,
    > Europe, South Asia and East Asia/Pacific Rim as early as 1986 but it is
    > completely false to say that the character repertoire of n2558 has
    > become "the de-facto industry standard for Tibetan information
    > interchange and processing" in any place outside of the PRC. As noted
    > above, the character set of n2558 does not even fully support usages of
    > Tibetan script in regions outside of China. (The notation of
    > "Worldwide" in question 5 of the Part C.: Technical-Justification in the
    > Proposal Summary Form is thus highly misleading.)
     
    > 4. The n2558 document asserts that "Once the Tibetan BrdaRten
    > characters are encoded in BMP, many current systems supporting
    > ISO/IEC10646 will enable Tibetan processing without major modification.
    > Therefore, the international standard Tibetan BrdaRten characters will
    > speed up the standardization and digitalization of Tibetan information,
    > keep the consistency of implementation level of Tibetan and other
    > scripts, develop the Tibetan culture and make the Tibetan culture
    > resources shared by the world." There are a number of counter-arguments
    > to these assertions:
     
    > First, due to the limitations of the n2558 character set for
    > representing classical Tibetan, Dzongkha, and other Tibetan-script
    > materials it is not reasonable to expect worldwide adoption of this
    > character set. Since the dynamic-combining model will continue to be
    > used in South Asia (where complex-script systems are the norm), in
    > academic institutions (where research in classical Tibetan is conducted)
    > and elsewhere, there will always be a need to normalize Tibetan-script
    > data interchanged between regions that use these two differing encoding
    > models for encoding Tibetan-script data. Thus, the acceptance of this
    > character set into the ISO-10646/Unicode standard will actually be an
    > *obstacle* to "standardization and digitalization of Tibetan information."
     
    > Second, the reference to "consistency of implementation level of Tibetan
    > and other scripts" would seem to presume that the "other scripts" in
    > question are not complex scripts. This statement is simply not relevant
    > when we consider the requirements of--and the already implemented
    > multilingual systems for the handling of--Indic and Indic-derived
    > complex scripts.
     
    > 5. Any claims of a pre-existing "de-facto industry standard" for
    > Tibetan even in China seem to be contradicted by the statement in the
    > Conclusion, that "After serious discussion and analysis by Tibetan
    > linguists, encoding experts and software developers in China, all are in
    > favor to establish a national and international standard Tibetan
    > BrdaRten character set to meet the requirement of Tibetan information
    > processing." This seems to indicate that a national standard for
    > Tibetan is yet to be established, even in China.
     
    > In summary assessment, had this proposal been comprehensive enough to
    > satisfy the needs of *all* users of the Tibetan-script languages and
    > materials, had it taken into consideration character data processing
    > needs of Tibetan beyond computerized typesetting (such as collation),
    > and had it been presented ten years ago, then it might well have been
    > worthy of serious consideration. As it now stands, this proposal offers
    > too little too late and, moreover, would simply add further confusion
    > and obstacles to the standardization of Tibetan-script data processing
    > and interchange. Furthermore, even had this proposal been presented
    > for consideration ten years ago, the fact that complex-script (dynamic
    > combination) rendering is needed for Indic scripts would even then have
    > been a strong argument in favor of the current ISO-10646 encoding model
    > and against an encoding model of the type proposed in n2558.
     
    > Respectfully,
     
    > Robert Chilton
    > Technical Director
    > The Asian Classics Input Project
     
    =================================================
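
To make the normalization concern under point 4 concrete, here is a minimal sketch in Python. It assumes nothing about the actual n2558 repertoire: the Private Use Area code point U+E000 and the mapping table are hypothetical stand-ins for a single precomposed character, and the function name is invented for illustration. The only facts relied on are the existing Tibetan block code points for the stack SA + subjoined GA + subjoined RA, followed by the vowel sign U.

```python
import unicodedata

# A stack as encoded in the current Tibetan block (dynamic combining):
# base letter SA, subjoined GA, subjoined RA, then vowel sign U.
STACK = "\u0f66\u0f92\u0fb2\u0f74"

# HYPOTHETICAL: U+E000 (Private Use Area) stands in for one precomposed
# stack character. It is NOT a code point taken from n2558.
HYPOTHETICAL_PRECOMPOSED = {
    "\ue000": "\u0f66\u0f92\u0fb2",
}


def to_combining(text: str) -> str:
    """Replace each hypothetical precomposed character by its
    base + subjoined sequence; leave every other character unchanged."""
    return "".join(HYPOTHETICAL_PRECOMPOSED.get(ch, ch) for ch in text)


# Character names of the combining sequence, from the Unicode database.
names = [unicodedata.name(ch) for ch in STACK]

if __name__ == "__main__":
    for ch, name in zip(STACK, names):
        print(f"U+{ord(ch):04X} {name}")
    # Data from a precomposed-model system must pass through a table like
    # the one above before it compares equal to combining-model data.
    print(to_combining("\ue000\u0f74") == STACK)
```

Every system exchanging data across the two models would need such a table, in both directions, for each precomposed character; that is the interchange cost the letter describes.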
     


