From: Zhang Weiwu (email@example.com)
Date: Thu Feb 13 2003 - 20:05:00 EST
"Andrew C. West" <firstname.lastname@example.org>
wrote on Friday, February 14, 2003 2:29 AM
Subject: Re: traditional vs simplified chinese
> On Thu, 13 Feb 2003 09:48:45 -0800 (PST), "Zhang Weiwu" wrote:
> > Take it easy, if you find one 500B (the measure word) it is usually enough to
> > say it is traditional Chinese, one 4E2A (measure word) is in simplified
> > Chinese. They never happen together in a logically correct document.
> Marco is absolutely correct that Simplified and Traditional Chinese may
> legitimately be found together on the same Web page (and I for one have several
> pages where they do).
> Certainly, I've seen "traditional" texts which mix U+500B with U+4E2A (and with
> U+7B87 for that matter). With Unicode it is now possible to transcribe
> traditional texts as they are written, rather than translate into "traditional"
> or "simplified". Take, for example, this Web page --
> http://uk.geocities.com/Morrison1782/Texts/TianguanCifu.html -- which
> transcribes a short one-act play from the Cantonese Opera tradition, published
> during the Qing dynasty (probably early 19th century).
Okay, Andrew is a real expert and is right about it. I would like to have a look at that page if I could reach geocities.com. (For at least two years now, no one has been able to reach geocities.com directly from China.)
I never saw 500B and 4E2A together in one printed document in the 20 years I lived in China. (Well, I need to subtract the years when I could not read :) Unless you had an obvious reason to do so, printing a book in Traditional characters used to be considered somewhat wrong in China. There is a language council (YuWei) in charge of such issues. At some periods in the past, people wanted to eliminate Traditional Chinese completely. I remember an advertisement on the street when I was a child, which said people should report public appearances of Traditional Chinese characters to the local culture ministry of some sort. (Oh, this is very OT.)
So let me correct what I said: if you find a 4E2A, the text may still be Traditional, but if you find a 500B it is very, very likely to be Traditional Chinese. I think we can search for 500B; if it does not appear, the text is likely to be Simplified.
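To make the corrected rule concrete, here is a minimal sketch in Python of the check I have in mind. The function name guess_script and the three result labels are only my illustration, and the "undetermined" case for text that contains neither character is my own assumption. As Andrew's example shows, a text written mostly in traditional characters can contain 4E2A without 500B, so this is a rough hint and not a reliable classifier.

```python
# Rough heuristic only: look at the two forms of the measure word "ge4".
TRADITIONAL_GE = "\u500b"  # U+500B, traditional form
SIMPLIFIED_GE = "\u4e2a"   # U+4E2A, simplified form (also seen in older texts)


def guess_script(text: str) -> str:
    """Guess the script of a Chinese text from the measure word alone.

    The labels returned here are hypothetical names for illustration.
    """
    if TRADITIONAL_GE in text:
        # U+500B present: very likely Traditional, even if U+4E2A also occurs.
        return "likely traditional"
    if SIMPLIFIED_GE in text:
        # U+4E2A without U+500B: likely Simplified, but not conclusive.
        return "likely simplified"
    # Neither character found (assumption: report that we cannot tell).
    return "undetermined"


if __name__ == "__main__":
    print(guess_script("\u9019\u500b"))  # text containing U+500B
    print(guess_script("\u8fd9\u4e2a"))  # text containing U+4E2A only
```

A page like the one Andrew mentions would come out as "likely simplified" here, which is exactly the kind of wrong answer he is pointing at.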
It's too bad that I have never read reproduced books (I mean copies of original ancient books), which is why I made this kind of mistake. I will try to read more in future.
> It has U+4E2A (simplified
> ge4) but not U+500B (traditional ge4), and yet is written mostly in
> "traditional" characters. How would your algorithm classify such a page ?
Well, I was not talking about an algorithm the first time. I thought Paul Hastings <email@example.com> wanted to do it by looking at the text. And there are not many such mixed pages.