RE: Locale ID's again: simplified vs. traditional

From: Thomas Chan (
Date: Thu Oct 05 2000 - 16:27:31 EDT

On Thu, 5 Oct 2000, Carl W. Brown wrote:

> This is not true of traditional and simplified Chinese because of the
> codepoint overlap even though one might be readable by the other. If for
> example, I have a traditional locale I will have han that do not exist in
> the simplified locale. Big-5 to Unicode maps to a different set of
> characters than GB. I am not sure that the Unicode simplified fonts will
> have not only the GB but also the Big5 characters that have been
> consolidated. Even so I can not imagine that the collation sequences would
> be the same.

Please don't say things like "Unicode simplified fonts" if you really mean
fonts designed for CN. It's clear you understand that the
traditional/simplified distinction is represented by different codepoints
in Unicode (when they are not identical), but "Unicode simplified fonts"
implies there might be "Unicode traditional fonts", and that makes some
people think that traditional<->simplified Chinese conversion may be done
in Unicode just by "changing the font" like some cheap legacy hack (which
doesn't work, as the mapping is many-to-one and contextual).
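
To make the many-to-one point concrete, here is a minimal Python sketch.
The four-entry table is a hand-picked illustration of mine, not a real
conversion table, and the helper name invert() is hypothetical; an actual
converter needs a full table plus word-level context.

```python
# Hand-picked sample only (an assumption for illustration, not a real table).
TRAD_TO_SIMP = {
    "發": "发",  # U+767C -> U+53D1 (to emit, to issue)
    "髮": "发",  # U+9AEE -> U+53D1 (hair)
    "後": "后",  # U+5F8C -> U+540E (after, behind)
    "后": "后",  # U+540E stays as is (empress)
}

def invert(table):
    """Group traditional candidates under each simplified character."""
    inverse = {}
    for trad, simp in table.items():
        inverse.setdefault(simp, []).append(trad)
    return inverse

SIMP_TO_TRAD = invert(TRAD_TO_SIMP)

for simp, candidates in SIMP_TO_TRAD.items():
    codes = ", ".join(f"U+{ord(c):04X}" for c in candidates)
    print(f"{simp} (U+{ord(simp):04X}) <- {''.join(candidates)} ({codes})")
```

Each simplified character here has two traditional candidates with their
own codepoints, so going back from simplified to traditional needs context
that no font swap can supply.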

CN fonts vary depending on which subset of Unicode they are aiming at. Up
to a certain point in time they carried only enough for GB2312 needs, and
thus only simplified characters; now some carry all the [characters] that
were in Unicode 2.0, to satisfy GBK needs, and thus both simplified and
traditional ones (plus some Japanese and Korean ones, too, but I don't know
who'd use them...).
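
One rough way to see the repertoire difference is to ask Python's standard
codecs which characters each legacy encoding can represent. This is only a
sketch: the helper name can_encode() is mine, and codec coverage is a
stand-in for what a font aimed at that character set would need to carry,
not a statement about any particular font.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# One simplified and one traditional form of the same character.
for ch in ("发", "發"):
    coverage = {enc: can_encode(ch, enc) for enc in ("gb2312", "gbk", "big5")}
    print(f"U+{ord(ch):04X} {ch}: {coverage}")
```

The simplified form is representable in GB2312 and GBK but not Big5; the
traditional form is representable in GBK and Big5 but not GB2312.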

There are multiple ways to collate: taking the Mandarin reading,
romanizing it in Pinyin, and sorting on that; taking the Mandarin reading,
expressing it in bopomofo, and sorting on that; sorting by total stroke
count, with a secondary key, usually the radical; sorting by radical, and
then by residual strokes; etc. Some are not appropriate or desired for
certain locales.
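
As a rough illustration of how much these orderings differ, here is a
Python sketch over five characters. The table and the collate() helper are
assumptions of mine for the example: readings are hard-coded, bopomofo is
given without tone marks and compared by raw codepoint, and radicals are
Kangxi radical numbers. Real collation would draw on a full data source
and proper tailoring rather than a hand-made list.

```python
from operator import itemgetter

# Columns: character, Pinyin with tone number, bopomofo without tone mark,
# total strokes, Kangxi radical number, residual strokes.
SAMPLE = [
    ("中", "zhong1", "ㄓㄨㄥ", 4, 2, 3),
    ("文", "wen2", "ㄨㄣ", 4, 67, 0),
    ("大", "da4", "ㄉㄚ", 3, 37, 0),
    ("人", "ren2", "ㄖㄣ", 2, 9, 0),
    ("天", "tian1", "ㄊㄧㄢ", 4, 37, 1),
]

PINYIN, BOPOMOFO, STROKES, RADICAL, RESIDUAL = 1, 2, 3, 4, 5

def collate(rows, *fields):
    """Sort rows on the given columns, in order, and return the characters."""
    ordered = sorted(rows, key=itemgetter(*fields))
    return "".join(row[0] for row in ordered)

by_pinyin = collate(SAMPLE, PINYIN)              # 大人天文中
by_bopomofo = collate(SAMPLE, BOPOMOFO)          # 大天中人文
by_strokes = collate(SAMPLE, STROKES, RADICAL)   # 人大中天文
by_radical = collate(SAMPLE, RADICAL, RESIDUAL)  # 中人大天文

print(by_pinyin, by_bopomofo, by_strokes, by_radical)
```

All four orders come out different even on this tiny sample, which is why
a single collation sequence cannot serve every locale.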

Thomas Chan
