RE: Locale ID's again: simplified vs. traditional

Date: Wed Oct 04 2000 - 08:33:24 EDT

Ayers, Mike wrote:
> Correct me if I'm wrong, but isn't such a designator unnecessary?

I'll dare to correct you, then. :-)

The reason for "language" tagging is not (and should not be) to clarify the
interpretation of characters. At the character level, the semantics of text
should be as language-independent as possible.

Rather, knowing the "language" a text is written in is needed for a variety
of higher-level reasons, which were discussed at length in the recent thread
about The Ethnologue.

*Spell checking* is one of these cases, and one we are all quite familiar
with. If I have to write a text using traditional hanzi in Unicode, I can tag
it as "Chinese-traditional", so that my spell-checker can assist me by
flagging simplified characters that slipped in by mistake.
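
To make the spell-checking case concrete, here is a minimal sketch in Python.
Everything in it is an assumption made for illustration: the tag strings
"Chinese-traditional" and "Chinese-simplified" are just labels, the function
name flag_wrong_style is hypothetical, and the four-entry table stands in for
the much larger data a real checker would need. It also ignores the fact that
the simplified/traditional relationship is not strictly 1-1.

```python
# Hypothetical, deliberately tiny table: simplified form -> traditional form.
# A real checker would need far more data and would have to cope with
# one-to-many mappings.
SIMPLIFIED_TO_TRADITIONAL = {"汉": "漢", "语": "語", "国": "國", "书": "書"}
TRADITIONAL_TO_SIMPLIFIED = {t: s for s, t in SIMPLIFIED_TO_TRADITIONAL.items()}


def flag_wrong_style(text, language_tag):
    """Return (index, character, suggestion) for each character whose style
    does not match the language tag of the text."""
    if language_tag == "Chinese-traditional":
        unwanted = SIMPLIFIED_TO_TRADITIONAL
    elif language_tag == "Chinese-simplified":
        unwanted = TRADITIONAL_TO_SIMPLIFIED
    else:
        # Unknown tag: this sketch has nothing to check against.
        return []
    return [(i, ch, unwanted[ch]) for i, ch in enumerate(text) if ch in unwanted]


# A simplified character slipped into a text tagged as traditional.
print(flag_wrong_style("漢语", "Chinese-traditional"))
```

The same string yields different complaints under different tags, which is
the point: the characters alone do not say which style the writer intended.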

> GB encoded material is simplified by definition, likewise Big5 encoded
> material is traditional by definition, and Unicode has encodings for both
> glyphs of a simplified/traditional pair (note: I am oversimplifying here,
> since there is not a strict 1-1 traditional-simplified relationship).
> Therefore, encoding "traditional" or "simplified" as part of the character
> set would be, at best, redundant.

I am not so sure that GB is only for simplified characters. There are
several GB subsets. I don't know them in detail, but I think that at least
one of them is for traditional hanzi. Anyway, if I am not wrong, the Unicode
Han database contains GB mappings for a number of traditional characters.

In any case, I don't understand your point about using the GB or Big5
encoding as a hint of the hanzi style. Once you convert the text to Unicode
(or any other encoding that has both S and T ideographs), the distinction is
lost, no?
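
A small Python sketch of what I mean, using the "gb2312" and "big5" codecs
from the Python standard library. The sample strings are my own and only
illustrative. Once both byte strings are decoded, the result is ordinary
Unicode text, and nothing in it records which legacy encoding each part came
from.

```python
# Sample data (illustrative only): the same word in each style,
# stored in the legacy encoding usually associated with that style.
gb_bytes = "汉语".encode("gb2312")   # simplified, GB2312 bytes
big5_bytes = "漢語".encode("big5")   # traditional, Big5 bytes

# Convert both to Unicode.
from_gb = gb_bytes.decode("gb2312")
from_big5 = big5_bytes.decode("big5")

# After decoding, both are plain Unicode strings of the same type and can
# be freely concatenated; the source encoding is no longer visible.
mixed = from_gb + from_big5
print(type(from_gb) is type(from_big5))
print(mixed)
```

Any hint about the style would have to travel separately, for example as a
language tag on the text.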

_ Marco

