RE: FW: A product compatibility question

From: Ayers, Mike (Mike_Ayers@bmc.com)
Date: Wed Oct 17 2001 - 17:44:49 EDT


> From: Sampo Syreeni [mailto:decoy@iki.fi]
> Sent: Wednesday, October 17, 2001 01:49 PM

> On Wed, 17 Oct 2001, Kenneth Whistler wrote:
>
> >"Traditional Chinese" and "Simplified Chinese" are *not* two different
> >languages.
>
> But they are naturally handled as such, no? After all, they employ the
> same Unicode codepoints but are displayed in a different font
> altogether.

        *SIGH*

        No. The codepoints for the simplified version of a character and
its traditional equivalent are different. The font has nothing to do with
it.
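        To make this concrete, here is a small Python sketch that prints the
codepoints of a few simplified/traditional pairs. The three pairs are
well-known examples picked for illustration, not an exhaustive list. Each
member of a pair has its own codepoint, so the difference is in the encoded
text itself, not in the font used to render it.

```python
# A few simplified/traditional character pairs, chosen for illustration only.
PAIRS = [
    ("汉", "漢"),  # "Han" (Chinese)
    ("国", "國"),  # "country"
    ("语", "語"),  # "language"
]


def codepoint(ch: str) -> str:
    """Return the U+XXXX notation for a single character."""
    return f"U+{ord(ch):04X}"


for simplified, traditional in PAIRS:
    # Each line shows the simplified form, then the traditional form.
    print(simplified, codepoint(simplified), "|", traditional, codepoint(traditional))
```

The first line printed is `汉 U+6C49 | 漢 U+6F22`.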

> >The TC/SC distinction is an artifact of legacy choices made for encoding
> >characters and implementation of text in East Asian computer systems. It
> >is *not* a language distinction, and should not be tagged as such.

        Ken is referring to the distinction as made with regard to Unicode.
The distinction between traditional and simplified Chinese characters
themselves significantly predates computerized Chinese. :-)

> But there are distinguishable dialectal differences between the variants
> of the base Chinese language used between the areas which primarily
> utilize Simplified and Traditional Chinese.

        You are incorrect. The Beijing dialect, where writing is done with
simplified characters, is almost indistinguishable from the Mandarin
spoken in Taiwan, which uses traditional characters. Conversely, people in
Taiwan and Hong Kong write the same characters for words that they would
not understand if spoken one to the other. There is no correlation between
the character set written and the dialect spoken.

> Hence, even if they are not treated as separate languages, one cannot do a
> codepoint-for-codepoint transformation and end up with legible text.

        That is true. They are separate writing systems for the same
language. Translating between the two is a highly nontrivial operation; I
know of no computer algorithm that does it (although, AFAIK, it is
possible).
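        One reason a codepoint-for-codepoint mapping fails is that some
simplified characters stand for several distinct traditional characters.
The Python sketch below uses a tiny, hypothetical lookup table (real tables
are far larger) and merely flags the positions where a per-character
mapping has more than one choice. It does not try to resolve them, since
that needs word-level or sentence-level context.

```python
# Hypothetical toy table: simplified character -> possible traditional forms.
S2T = {
    "发": ["發", "髮"],  # "emit/issue" vs. "hair"
    "后": ["後", "后"],  # "after/behind" vs. "empress"
    "国": ["國"],        # a one-to-one case
}


def naive_candidates(text: str) -> list[list[str]]:
    """For each character, list every traditional form it might map to.

    Characters missing from the toy table are assumed to be unchanged.
    """
    return [S2T.get(ch, [ch]) for ch in text]


def ambiguous_positions(text: str) -> list[int]:
    """Return the indices where a per-character mapping has several choices."""
    return [i for i, opts in enumerate(naive_candidates(text)) if len(opts) > 1]


print(ambiguous_positions("理发"))  # "haircut": 发 must become 髮 here
print(ambiguous_positions("中国"))  # "China": no ambiguity in this toy table
```

The first call flags position 1 (发) and the second flags nothing. A
per-character converter would have to guess at every flagged position.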

> This sort of distinction *should* be tagged as a dialect variant, if I'm
> not incorrect altogether.

        In systems where you language-tag your data, do so. In systems
where you do not ordinarily language-tag your data, there is no need to tag
Chinese text to indicate whether it is simplified or traditional if you are
using Unicode, since that distinction is inherent in the choice of
codepoints.

/|/|ike



This archive was generated by hypermail 2.1.2 : Wed Oct 17 2001 - 19:58:23 EDT