I think we both agree that you cannot just lump Chinese into a single
locale. Would you agree that it is also a multi-dimensional problem? Dialect
differences and script differences are both factors, and they affect the
code points used and the processing of the text, not just the fonts.
The IME is a good example. Some IMEs use radicals, while others use phonetic
systems. If I use an IME based on a phonetic system such as Pinyin, then I
am tied to a dialect in order to match sounds to the proper characters. I am
also tied to a script, because the different scripts have different
characters and code points. Therefore, without this information I cannot
select an IME with the proper dictionary or code point conversion. Besides
pronunciation, the dictionaries also carry local phraseology and cultural
differences.
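
To make the two dimensions concrete, here is a minimal sketch of IME selection keyed on both dialect and script. Everything in it is an assumption made for illustration: the `ChineseLocale` type, the `IME_REGISTRY` table, and the IME names are hypothetical and do not describe any real platform API. The dialect and script codes simply borrow the ISO 639-3 and ISO 15924 spellings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChineseLocale:
    """Hypothetical locale record carrying dialect and script separately."""
    dialect: str  # e.g. "cmn" (Mandarin) or "yue" (Cantonese)
    script: str   # "Hans" (simplified) or "Hant" (traditional)


# Hypothetical registry: a phonetic IME depends on BOTH dimensions,
# because the dialect fixes the sounds and the script fixes the code points.
IME_REGISTRY = {
    ("cmn", "Hans"): "pinyin-simplified",
    ("cmn", "Hant"): "bopomofo-traditional",
    ("yue", "Hant"): "jyutping-traditional",
}


def select_phonetic_ime(locale: ChineseLocale) -> Optional[str]:
    """Return the registered IME name, or None if no dictionary matches."""
    return IME_REGISTRY.get((locale.dialect, locale.script))


mandarin_simplified = select_phonetic_ime(ChineseLocale("cmn", "Hans"))
cantonese_simplified = select_phonetic_ime(ChineseLocale("yue", "Hans"))
```

With only a bare "Chinese" locale there would be no key to look up at all, which is the point: neither dimension can be dropped.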
From: Thomas Chan [mailto:firstname.lastname@example.org]
Sent: Thursday, October 05, 2000 2:04 PM
To: Unicode List
Subject: RE: Locale ID's again: simplified vs. traditional
On Thu, 5 Oct 2000, Carl W. Brown wrote:
> This is not true of traditional and simplified Chinese because of the
> codepoint overlap, even though one might be readable by the other. If, for
> example, I have a traditional locale, I will have Han characters that do
> not exist in the simplified locale. Big-5 to Unicode maps to a different
> set of characters than GB. I am not sure that the Unicode simplified fonts
> will have not only the GB but also the Big5 characters that have been
> consolidated. Even so, I cannot imagine that the collation sequences
> would be the same.
Please don't say things like "Unicode simplified fonts" if you really mean
fonts designed for CN. It's clear you understand that the
traditional/simplified distinction is represented by different codepoints
in Unicode (when they are not identical), but "Unicode simplified fonts"
implies there might be "Unicode traditional fonts", and that makes some
people think that traditional<->simplified Chinese conversion may be done
in Unicode just by "changing the font" like some cheap legacy hack (which
doesn't work, as the mapping is many-to-one and contextual).
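
A small sketch of why a font swap cannot do the conversion: several traditional characters collapse onto one simplified code point, so the reverse direction is ambiguous without context. The table below is a tiny hand-entered sample chosen for illustration, not a complete or authoritative mapping, and the function names are hypothetical.

```python
# Tiny illustrative sample: traditional code point -> simplified code point.
TRAD_TO_SIMP = {
    "發": "发",  # "to emit, to issue"
    "髮": "发",  # "hair"
    "後": "后",  # "after, behind"
    "后": "后",  # "empress" (same code point in both scripts)
}


def to_simplified(text: str) -> str:
    """Map each character through the table; unknown characters pass through."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)


def traditional_candidates(simplified_char: str) -> set:
    """All traditional characters in the sample that map to this one."""
    return {t for t, s in TRAD_TO_SIMP.items() if s == simplified_char}


# Traditional -> simplified is a plain lookup...
same_target = to_simplified("發") == to_simplified("髮")

# ...but simplified -> traditional yields more than one candidate,
# so a converter needs the surrounding word to choose between them.
candidates = traditional_candidates("发")
```

Because the characters are distinct code points, the choice has to be made in the text itself, not in the rendering.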
CN fonts vary depending on what subset of Unicode they are aiming at. Up
to a certain point in time they carried only enough for GB2312 needs--thus
simplified characters only. Now some carry all [characters] that were in
Unicode 2.0, to satisfy GBK needs--thus both simplified and traditional
(plus some Japanese and Korean ones, too, but I don't know who'd use them...).
There are multiple ways to collate: taking the Mandarin reading,
romanizing it in Pinyin, and then sorting; taking the Mandarin reading,
expressing it in bopomofo, and then sorting; sorting by total stroke count,
with a secondary key that is usually the radical; sorting by radical, and
then by residual strokes; and others. Some are not appropriate or desired
for certain locales.
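
As a rough sketch of how these orderings diverge, the same five characters can be sorted under three of the schemes above. The data table is hand-entered for illustration only, the helper names are hypothetical, and comparing Pinyin strings with a trailing tone digit is a simplification of real Pinyin collation.

```python
# Illustrative data per character:
# (Pinyin with tone number, total strokes, Kangxi radical number, residual strokes)
HAN_DATA = {
    "一": ("yi1", 1, 1, 0),
    "人": ("ren2", 2, 9, 0),
    "大": ("da4", 3, 37, 0),
    "山": ("shan1", 3, 46, 0),
    "中": ("zhong1", 4, 2, 3),
}


def pinyin_key(ch: str) -> str:
    """Sort by the romanized Mandarin reading."""
    return HAN_DATA[ch][0]


def stroke_key(ch: str) -> tuple:
    """Sort by total strokes, then by radical as the secondary key."""
    _, strokes, radical, _ = HAN_DATA[ch]
    return (strokes, radical)


def radical_key(ch: str) -> tuple:
    """Sort by radical, then by residual strokes."""
    _, _, radical, residual = HAN_DATA[ch]
    return (radical, residual)


chars = list("中山大人一")
by_pinyin = "".join(sorted(chars, key=pinyin_key))    # 大人山一中
by_strokes = "".join(sorted(chars, key=stroke_key))   # 一人大山中
by_radical = "".join(sorted(chars, key=radical_key))  # 一中人大山
```

Three keys give three different orders for the same input, so the collation has to be chosen per locale rather than per character set.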