Re: traditional vs simplified chinese

From: Zhang Weiwu (weiwuzhang@hotmail.com)
Date: Thu Feb 13 2003 - 20:05:00 EST


    "Andrew C. West" <andrewcwest@alumni.princeton.edu>
    wrote on Friday, February 14, 2003 2:29 AM
    Subject: Re: traditional vs simplified chinese
    > On Thu, 13 Feb 2003 09:48:45 -0800 (PST), "Zhang Weiwu" wrote:
    >
    > > Take it easy, if you find one 500B (the measure word) it is usually enough to
    > > say it is traditional Chinese, one 4E2A (measure word) is in simplified
    > > Chinese. They never happen together in a logically correct document.
    >
    > Marco is absolutely correct that Simplified and Traditional Chinese may
    > legitimately be found together on the same Web page (and I for one have several
    > pages where they do).

    > Certainly, I've seen "traditional" texts which mix U+500B with U+4E2A (and with
    > U+7B87 for that matter). With Unicode it is now possible to transcribe
    > traditional texts as they are written, rather than translate into "traditional"
    > or "simplified". Take, for example, this Web page --
    > http://uk.geocities.com/Morrison1782/Texts/TianguanCifu.html -- which
    > transcribes a short one-act play from the Cantonese Opera tradition, published
    > during the Qing dynasty (probably early 19th century).

    Okay, Andrew is a real expert and is right about this. I would like to have a look at that page if I could reach geocities.com. (For at least two years no one has been able to reach geocities.com directly from China.)

    I never saw 500B and 4E2A together in the same printed document in the 20 years I lived in China. (Well, minus the years when I could not read yet :) Unless you had an obvious reason to do so, printing a book in Traditional characters used to be considered somewhat wrong in China. There is a language council (YuWei) in charge of such issues. At one period people wanted to eliminate Traditional Chinese completely. I remember an advertisement on the street when I was a child, which said people should report public appearances of Traditional Chinese characters to the local culture ministry of some sort. (Oh, this is very OT.) So let me correct what I said: if you find a 4E2A, the text may still be Traditional, but if you find a 500B it is very, very likely to be Traditional Chinese. I think we can search for 500B; if it does not appear, the text is likely to be Simplified.
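
    The rule of thumb above could be sketched roughly as follows. This is only a minimal illustration, not a real classifier: the function name guess_script and the returned labels are made up for this example, and the "unknown" result for text that contains neither character is an added assumption (the rule as stated would call such text Simplified).

```python
# Rough sketch of the "look for the measure word" heuristic.
# guess_script and its return labels are hypothetical names for illustration.

TRADITIONAL_GE = "\u500b"  # U+500B, traditional measure word ge4
SIMPLIFIED_GE = "\u4e2a"   # U+4E2A, simplified measure word ge4


def guess_script(text: str) -> str:
    """Guess whether text is Traditional or Simplified Chinese."""
    if TRADITIONAL_GE in text:
        # U+500B almost never appears in Simplified text.
        return "traditional"
    if SIMPLIFIED_GE in text:
        # No U+500B but U+4E2A is present: likely Simplified, though
        # older traditional texts may use U+4E2A as well.
        return "simplified"
    # Assumption: with neither character present there is no evidence.
    return "unknown"
```

    Note that a page written mostly in traditional characters but using only U+4E2A would be reported as "simplified" by this sketch, so the result is a hint at best.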

    Unfortunately I have never read facsimile books (I mean copies of original ancient books), which is why I made this kind of mistake. I will try to read more in future.

    > It has U+4E2A (simplified
    > ge4) but not U+500B (traditional ge4), and yet is written mostly in
    > "traditional" characters. How would your algorithm classify such a page ?

    Well, I was not talking about an algorithm in the first place. I thought Paul Hastings <paul@tei.or.th> wanted to do it by eye. And we don't have a lot of such mixed pages.



