Re: traditional vs simplified chinese

From: Zhang Weiwu (weiwuzhang@hotmail.com)
Date: Thu Feb 13 2003 - 07:56:46 EST


    Are you trying to recognize it by eye or in your program?

    If the webpage is in Unicode, it's hard to say. The bad thing is that, unlike "La", "The", "Die" in European languages, the most frequent ideographs are almost the same in both forms of Chinese text. Perhaps the ideograph meaning "for" (wei in Mandarin pinyin) is the most recognizable one that differs.
    The traditional one, U+70BA (為), looks like:
    http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=70BA
    The simplified one, U+4E3A (为), looks like:
    http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=4E3A

    The most common measure word (a bit like the article "a" in English) also differs.
    The traditional one, U+500B (個):
    http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=500B
    The simplified one, U+4E2A (个):
    http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=4E2A
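
    In a program, the idea above could be sketched roughly as follows. This is only a minimal illustration in Python: the function name guess_script and the labels it returns are made up for this example, and it checks nothing beyond the four code points mentioned here, so short texts that contain none of them come back as "unknown".

```python
# Marker code points come from the message above. The names below
# (guess_script, the returned labels) are hypothetical, for illustration only.
TRADITIONAL_MARKERS = {"\u70ba", "\u500b"}  # U+70BA "for", U+500B measure word
SIMPLIFIED_MARKERS = {"\u4e3a", "\u4e2a"}   # U+4E3A "for", U+4E2A measure word


def guess_script(text: str) -> str:
    """Guess 'traditional', 'simplified', or 'unknown' by counting markers."""
    trad = sum(1 for ch in text if ch in TRADITIONAL_MARKERS)
    simp = sum(1 for ch in text if ch in SIMPLIFIED_MARKERS)
    if trad > simp:
        return "traditional"
    if simp > trad:
        return "simplified"
    # No markers at all, or an equal number of each: no basis for a guess.
    return "unknown"
```

    A real detector would probably need a much longer list of character pairs that differ between the two forms; two pairs are just the ones named in this message.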

    If the webpage isn't in Unicode, a simple rule is that most traditional Chinese webpages are encoded in "Big5", and most simplified Chinese webpages are encoded in "GB2312" or "GB18030".
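
    As a rough sketch of that rule in Python, assuming the program already knows the charset label the page or form was sent with (how to obtain it is not covered here). The function name guess_script_from_charset is hypothetical, and only the three encodings named above are handled.

```python
def guess_script_from_charset(charset: str) -> str:
    """Map a declared charset label to a likely Chinese script form.

    Hypothetical helper: it encodes only the rule of thumb from the
    message (Big5 -> traditional, GB2312/GB18030 -> simplified).
    """
    label = charset.strip().lower()
    if label == "big5":
        return "traditional"
    if label in ("gb2312", "gb18030"):
        return "simplified"
    # Unicode encodings and anything else give no hint by themselves.
    return "unknown"
```

    For example, guess_script_from_charset("Big5") gives "traditional", while a label such as "utf-8" gives "unknown" and the text itself has to be examined.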

    Anyway, if a chunk of Chinese text looks complex, it is likely to be traditional.

    =================
    Zhang Weiwu from Xiamen China

    ----- Original Message -----
    From: "Paul Hastings" <paul@tei.or.th>
    To: <unicode@unicode.org>
    Sent: Thursday, February 13, 2003 7:35 PM
    Subject: traditional vs simplified chinese

    > i suppose this is a really simple minded question but is there any way of
    > telling if an incoming chunk of text (say from a browser form) is
    > traditional or simplified chinese?
    >
    > thanks.
    > ----------------------------------------------------
    > Paul Hastings paul@tei.or.th
    > Director Environmental Information Center
    > Thailand Environment Institute
    > Member Team Macromedia (Allaire)
    > http://www.tei.or.th/eic ---------------------------
    >



    This archive was generated by hypermail 2.1.5 : Thu Feb 13 2003 - 08:37:35 EST