Re: Erratum in Unicode book

From: Thomas Chan (thomas@atlas.datexx.com)
Date: Mon Jul 09 2001 - 19:09:01 EDT


On Mon, 9 Jul 2001, Richard Cook wrote:

> On a related note, I have 9000 word/char frequencies from Hanyu Pinlu
> Cidian (a mainland text; I typed the entries in back in the early 90's,
> and this is the freq data currently used in Wenlin). I'd be happy to
> give the Consortium access to this data for the purpose of sorting
> characters with identical rad/str numbers by frequency.

Wouldn't that bias the sort order toward Chinese-language usage
frequencies? For example, \u7684, \u4f60, and \u5403 are very common in
Chinese but rare or obscure in Japanese. Subsorting by pronunciation would
likewise be language-dependent.

For a language-neutral method of sorting characters that otherwise share
the same radical and number of residual strokes, how about the method used
in the _Hanyu Da Zidian_ (and some other dictionaries): compare the first
stroke, then the second stroke, etc., by which of the five basic stroke
types each one is, as exemplified in the first five Kangxi radicals? This
requires that such data be available for all 70,000+ characters, though...
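
To make the idea concrete, here is a rough sketch of such a three-level sort key: radical, then residual stroke count, then the sequence of stroke types. Everything in it is an assumption for illustration only: the HanEntry record, the numeric codes and their order (1 horizontal, 2 vertical, 3 left-falling, 4 dot, 5 turning), and the handful of hand-entered stroke sequences, which follow mainland stroke order and are not taken from any dictionary or Unicode data file.

```python
from typing import NamedTuple, Tuple

# Assumed stroke-type codes (the numbering and order are illustrative):
# 1 = horizontal, 2 = vertical, 3 = left-falling, 4 = dot, 5 = turning


class HanEntry(NamedTuple):
    """Hypothetical record holding the data the sort needs."""
    char: str
    radical: int              # Kangxi radical number
    residual: int             # residual stroke count
    strokes: Tuple[int, ...]  # stroke-type code of each stroke, in writing order


def sort_key(entry: HanEntry):
    # Radical first, then residual strokes, then stroke types compared
    # position by position (Python compares tuples lexicographically).
    return (entry.radical, entry.residual, entry.strokes)


# Hand-entered sample data, all under radical 30 (mouth).
ENTRIES = [
    HanEntry("吗", 30, 3, (2, 5, 1, 5, 5, 1)),
    HanEntry("吃", 30, 3, (2, 5, 1, 3, 1, 5)),
    HanEntry("口", 30, 0, (2, 5, 1)),
    HanEntry("吓", 30, 3, (2, 5, 1, 1, 2, 4)),
    HanEntry("吐", 30, 3, (2, 5, 1, 1, 2, 1)),
]

ordered = [e.char for e in sorted(ENTRIES, key=sort_key)]
print("".join(ordered))  # prints 口吐吓吃吗
```

The four characters with three residual strokes tie on the first two levels and are separated only by their stroke-type sequences, with no reference to frequency or pronunciation in any language.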

Thomas Chan
tc31@cornell.edu


