Re: traditional vs simplified chinese

From: Andrew C. West
Date: Thu Feb 13 2003 - 13:29:06 EST

    On Thu, 13 Feb 2003 09:48:45 -0800 (PST), "Zhang Weiwu" wrote:

    > Take it easy, if you find one 500B (the measure word) it is usually enough to
    > say it is traditional Chinese, one 4E2A (measure word) is in simplified
    > Chinese. They never happen together in a logically correct document.

    Marco is absolutely correct that Simplified and Traditional Chinese may
    legitimately be found together on the same Web page (and I for one have several
    pages where they do).

    Just adding my two fen's worth, Traditional/Simplified is an artificial modern
    distinction that has been exacerbated by the GB simplified-only coding standards
    on the one hand and traditional-only coding standards such as Big5 on the other,
    which forced people to use either Simplified or Traditional characters
    exclusively. Most simplified characters have in fact been around for centuries,
    and if you open the pages of any down-market commercial edition of a Chinese
    book printed during the Yuan, Ming or Qing dynasties (last 700 years) you are
    likely to find plenty of "simplified" forms mixed up with "traditional" forms.
    Certainly, I've seen "traditional" texts which mix U+500B with U+4E2A (and with
    U+7B87 for that matter). With Unicode it is now possible to transcribe
    traditional texts as they are written, rather than translate into "traditional"
    or "simplified". Take, for example, this Web page -- -- which
    transcribes a short one-act play from the Cantonese Opera tradition, published
    during the Qing dynasty (probably early 19th century). It has U+4E2A (simplified
    ge4) but not U+500B (traditional ge4), and yet is written mostly in
    "traditional" characters. How would your algorithm classify such a page?

    Also, you should remember that a Chinese page written in Classical Chinese --
    and there are plenty of electronic editions of the Classics on the Web -- might
    have no instances of the vernacular character ge4 at all.
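
    To make the objection concrete, here is a minimal sketch of the kind of
    check being discussed, written so that it reports what it found rather than
    forcing a verdict. Only the three code points (U+500B, U+4E2A, U+7B87) come
    from this thread; the function names and the result labels are hypothetical,
    and this is not anybody's actual algorithm.

```python
# Illustrative only: the three code points are the ones named in this thread;
# the function names and result labels are assumptions made for this sketch.
GE_TRADITIONAL = "\u500b"  # U+500B
GE_SIMPLIFIED = "\u4e2a"   # U+4E2A
GE_VARIANT = "\u7b87"      # U+7B87


def ge_evidence(text):
    """Count each written form of the measure word ge4 found in text."""
    return {
        "U+500B": text.count(GE_TRADITIONAL),
        "U+4E2A": text.count(GE_SIMPLIFIED),
        "U+7B87": text.count(GE_VARIANT),
    }


def classify_by_ge(text):
    """Return a weak hint based only on U+500B and U+4E2A.

    U+7B87 is counted by ge_evidence() but does not influence the label.
    """
    counts = ge_evidence(text)
    has_trad = counts["U+500B"] > 0
    has_simp = counts["U+4E2A"] > 0
    if has_trad and has_simp:
        return "mixed"
    if has_trad:
        return "traditional-leaning"
    if has_simp:
        return "simplified-leaning"
    # Covers Classical Chinese texts that never use ge4 at all.
    return "undetermined"
```

    Note that a page which uses U+4E2A throughout but is otherwise written in
    "traditional" characters would come out of this as "simplified-leaning",
    which is exactly the misclassification I am worried about, so a result like
    this can only be treated as one weak hint among many.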


    This archive was generated by hypermail 2.1.5 : Thu Feb 13 2003 - 14:11:12 EST