Re: Reading Chinese Characters from a browser

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Jul 08 2003 - 06:14:40 EDT


    On Tuesday, July 08, 2003 11:59 AM, SRIDHARAN Aravind <ASridharan@covansys.com> wrote:

    > How can I differentiate whether a given character in chinese is
    > simplified or traditional?

    Normally you can't with Unicode/ISO 10646 alone: the ideographs are now unified by the Unihan work, so the same repertoire is used for Traditional or Simplified Chinese, Japanese, traditional Korean and Vietnamese, and other minority languages written with this ideographic script.

    What you need is a conversion table between Unicode and Big5 (used for Traditional Chinese in Taiwan, Macau, and Hong Kong) or the newer GB18030 standard.
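
    One rough way to approximate such a table lookup is to test whether a character can be encoded in a legacy charset. The sketch below uses Python's built-in `gb2312` and `big5` codecs. Using GB 2312 rather than GB18030 is an assumption of this sketch, because a charset that covers all of Unicode cannot discriminate anything. The function names are hypothetical, and the result is only a heuristic: many ideographs exist in both sets, and it only helps where the two forms have distinct code points.

```python
def encodable(ch: str, codec: str) -> bool:
    """Return True if `ch` has a mapping in the given legacy codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def classify(ch: str) -> str:
    """Heuristic label from GB 2312 vs. Big5 coverage (assumption: these
    two charsets stand in for 'simplified' and 'traditional' usage)."""
    in_gb2312 = encodable(ch, "gb2312")
    in_big5 = encodable(ch, "big5")
    if in_gb2312 and in_big5:
        return "shared"
    if in_gb2312:
        return "simplified-only"
    if in_big5:
        return "traditional-only"
    return "neither"


if __name__ == "__main__":
    for ch in "语語人":
        print(ch, classify(ch))
```

    A "shared" answer says nothing about which kind of text the character came from, so this is better applied to a whole string (counting the simplified-only and traditional-only hits) than to a single character.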

    Chinese written with GB18030 is incorrectly named "Simplified Chinese". The name comes from the fact that the set of basic ideographs needed for the common language has been reduced by combining several simple ideographs that are easily drawn, and that some linguistic and phonetic differences have been suppressed from the spoken language. However, GB18030 includes and can encode ALL characters of Unicode, including those that were previously encodable only in Big5 and not in GBK, of which GB18030 is an extension.
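
    The coverage point can be checked with a small sketch, assuming Python's built-in `gb18030` and `gbk` codecs. The helper name and the sample characters are my own choices for illustration: a simplified form, a traditional form, and an ideograph outside the Basic Multilingual Plane.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of `text` maps into `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Simplified form, traditional form, and U+20000 (CJK Extension B).
SAMPLES = ["语", "語", "\U00020000"]

if __name__ == "__main__":
    for s in SAMPLES:
        # GB18030 should round-trip every sample without loss.
        round_trip = s.encode("gb18030").decode("gb18030") == s
        print(f"U+{ord(s):04X}",
              "gb18030:", can_encode(s, "gb18030"),
              "gbk:", can_encode(s, "gbk"),
              "round-trip:", round_trip)
```

    Because every sample encodes in GB18030, the encoding alone cannot tell you whether a text is simplified or traditional.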

    Some information about this can be found in the huge "UniHan.txt" database, which cross-references the dictionary and definition properties of these ideographs.
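
    The database stores one property per line as tab-separated fields (code point, field name, value), and the `kSimplifiedVariant` and `kTraditionalVariant` fields are the ones relevant here. The following is a minimal sketch of reading them. The sample lines are embedded inline for illustration, the function name is hypothetical, and the file that carries these fields may differ between releases, so treat the layout as an assumption to verify against the copy you download.

```python
from collections import defaultdict

# Illustrative sample in the tab-separated layout; not the real file.
SAMPLE = """\
# code point<TAB>field<TAB>value
U+570B\tkSimplifiedVariant\tU+56FD
U+56FD\tkTraditionalVariant\tU+570B
"""

VARIANT_FIELDS = ("kSimplifiedVariant", "kTraditionalVariant")


def parse_variants(text: str) -> dict:
    """Map each character to its simplified/traditional variant lists."""
    table = defaultdict(dict)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        code, field, value = line.split("\t", 2)
        if field not in VARIANT_FIELDS:
            continue
        char = chr(int(code[2:], 16))  # "U+570B" -> "國"
        # Values are space-separated; drop any "<source" suffix if present.
        table[char][field] = [
            chr(int(v.split("<")[0][2:], 16)) for v in value.split()
        ]
    return table


if __name__ == "__main__":
    for char, fields in parse_variants(SAMPLE).items():
        print(char, fields)
```

    A character that has a `kSimplifiedVariant` entry is a traditional form, and one with a `kTraditionalVariant` entry is a simplified form. Characters with neither entry are used unchanged in both.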


