RE: Locale ID's again: simplified vs. traditional

From: Thomas Chan (thomas@atlas.datexx.com)
Date: Wed Oct 04 2000 - 13:03:52 EDT


On Wed, 4 Oct 2000 Marco.Cimarosti@icl.com wrote:

> Ayers, Mike wrote:
> > GB-encoded material is simplified by definition; likewise, Big5-encoded
> > material is traditional by definition, and Unicode has encodings for both
> > glyphs of a simplified/traditional pair (note: I am oversimplifying here,
> > since there is not a strict 1-1 traditional-simplified relationship).
> > Therefore, encoding "traditional" or "simplified" as part of the
> > character set would be, at best, redundant.
>
> I am not so sure that GB is only for simplified characters. There are
> several GB subsets. I don't know them in detail, but I think that at least
> one of them is for traditional hanzi. Anyway, if I am not wrong, the Unicode
> Han database contains mappings from several traditional characters to GB.

GB2312 is only for "simplified". Others, like GB12345, are "traditional".
Some, like GB13000.1 in its GBK form (see its role as described in the
appendix on Han Unification in the Unicode book), are both "simplified" and
"traditional". There are similar cases among Big5 variants and offshoots
as well.
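The repertoire difference between GB2312 and GBK can be probed with a short sketch. It assumes Python 3's built-in "gb2312" (EUC-CN) and "gbk" codecs stand in for the character sets discussed here; the helper name encodable() and the sample character pairs are hypothetical choices for illustration only.

```python
def encodable(text: str, codec: str) -> bool:
    """Return True if every character in text can be encoded with codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# Illustrative (simplified, traditional) pairs chosen for this sketch.
pairs = [("汉", "漢"), ("书", "書"), ("国", "國")]

for simp, trad in pairs:
    # GB2312 should accept only the simplified form; GBK accepts both.
    print(simp, trad,
          "gb2312:", encodable(simp, "gb2312"), encodable(trad, "gb2312"),
          "gbk:", encodable(simp, "gbk"), encodable(trad, "gbk"))
```

Each printed line should show True False under gb2312 and True True under gbk, which is the sense in which GBK is "both".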

 
> In any case, I don't understand your point about using the GB or Big-5
> encoding as a hint of the hanzi style. Once you convert the text to Unicode
> (or any other encoding that has both S and T ideographs), the distinction is
> lost, no?

Or even without any conversion: a "simplified" document in EUC-CN encoding
using the GB2312 character set is also a valid GBK document, and GBK can be
both "simplified" and "traditional". Suddenly we don't know whether our
document is fit for consumption in "traditional"-preferring locales.
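The ambiguity can be illustrated with a minimal sketch, again assuming Python 3's built-in "gb2312" and "gbk" codecs. The helper fits_gb2312() is hypothetical and only a heuristic: passing it says the bytes stay within the GB2312 repertoire, not which locale the author intended.

```python
def fits_gb2312(raw: bytes) -> bool:
    """Heuristic: True if raw bytes also decode under EUC-CN GB2312.

    Passing suggests (but does not prove) that the text avoids forms outside
    the "simplified" GB2312 set; it says nothing about the intended locale.
    """
    try:
        raw.decode("gb2312")
    except UnicodeDecodeError:
        return False
    return True


# A "simplified" document encoded as EUC-CN using GB2312.
simplified = "汉字".encode("gb2312")

# The very same bytes are also a valid GBK document and decode identically,
# so a "GBK" label alone cannot tell us which style the text uses.
assert simplified.decode("gbk") == simplified.decode("gb2312")

# A GBK document containing a traditional form falls outside GB2312.
traditional = "漢字".encode("gbk")

print(fits_gb2312(simplified))   # True
print(fits_gb2312(traditional))  # False
```

Once either byte stream is decoded, the result is an ordinary Unicode string with no style marker, so any "simplified" or "traditional" label has to travel as separate metadata.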

Thomas Chan
tc31@cornell.edu


