Re: Erratum in Unicode book

From: John H. Jenkins (jenkins@apple.com)
Date: Mon Jul 09 2001 - 12:08:04 EDT


At 11:29 AM -0400 7/9/01, Thomas Chan wrote:
>On Sun, 8 Jul 2001, James Kass wrote:
>
>> An ideal index for the casual or non-CJK user might be quite
>> different in approach. Perhaps the first component drawn in
>
>For the less than proficient user, I think it would be beneficial to have
>a means to restrict the pool of characters that they are searching
>amongst--consider the circumstances under which they are likely to have
>encountered the character they are looking up. The radical-strokes index
>in TUS3.0 covers over 27,000 characters, many times more than most
>dictionaries and character sets, and in some places, there are just too
>many characters falling under a particular radical+residual stroke count
>for one to scan the page efficiently.
>

I've been thinking the same thing. Adding another 40,000+ ideographs
isn't going to help. The best approach will be to prepare multiple
indices again: one for just the original Unihan, one for Unihan +
Extension A, and one for Unihan + Extension A + Extension B.
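
As a rough illustration of the three cumulative indices described
above, here is a minimal sketch. It is not the actual chart-generating
program; the names BLOCKS, INDEX_TIERS, block_of and build_indices are
hypothetical, and the ranges are the allocated block ranges for the
original Unihan block, Extension A and Extension B.

```python
# Hypothetical sketch: split a pool of ideographs into three cumulative
# repertoires. Ranges are the allocated blocks, not only assigned characters.
BLOCKS = {
    "Unihan": (0x4E00, 0x9FFF),    # original Unihan block
    "ExtA": (0x3400, 0x4DBF),      # Extension A
    "ExtB": (0x20000, 0x2A6DF),    # Extension B
}

# Each index covers everything in the previous one plus one more block.
INDEX_TIERS = [
    ("Unihan",),
    ("Unihan", "ExtA"),
    ("Unihan", "ExtA", "ExtB"),
]


def block_of(code_point):
    """Return the block name for a code point, or None if outside all three."""
    for name, (low, high) in BLOCKS.items():
        if low <= code_point <= high:
            return name
    return None


def build_indices(code_points):
    """Return one list of code points per tier, each restricted to its blocks."""
    return [
        sorted(cp for cp in code_points if block_of(cp) in tier)
        for tier in INDEX_TIERS
    ]
```

A reader who only needs common characters would then scan the first
list, which never contains anything from the extensions.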

The other thing I need to do is to make the chart-generating program
a bit more sophisticated about how it orders the ideographs. Right
now, all the ideographs for a single radical-stroke count are sorted
by Unicode scalar value, which means that the rare ideographs in
Extension A come before the common ideographs in the original Unihan
block. Either they should be ordered the other way around or they
should be put in strict KangXi order, or something. The way it's done
now is definitely bad, bad, bad.
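
To make the ordering problem concrete, here is a minimal sketch of the
"other way" option: within one radical-stroke bucket, sort by block
first (original Unihan, then Extension A, then Extension B) and only
then by scalar value. This is an assumption about how it could be
done, not the real program; BLOCK_ORDER, block_rank and sort_bucket
are hypothetical names. Strict KangXi order would need a
dictionary-position lookup in place of the block rank, which is not
shown here.

```python
# Hypothetical sketch: order one radical-stroke bucket so that the common
# ideographs in the original Unihan block come before the extensions.
BLOCK_ORDER = [
    (0x4E00, 0x9FFF),     # original Unihan block, listed first
    (0x3400, 0x4DBF),     # Extension A
    (0x20000, 0x2A6DF),   # Extension B
]


def block_rank(code_point):
    """Return the position of the code point's block in BLOCK_ORDER."""
    for rank, (low, high) in enumerate(BLOCK_ORDER):
        if low <= code_point <= high:
            return rank
    return len(BLOCK_ORDER)  # anything else sorts last


def sort_bucket(code_points):
    """Sort by block rank first, then by scalar value within a block."""
    return sorted(code_points, key=lambda cp: (block_rank(cp), cp))


bucket = [0x3400, 0x20000, 0x4E01, 0x4E00]
by_scalar = sorted(bucket)        # current behaviour: Extension A first
by_block = sort_bucket(bucket)    # original Unihan block first
```

With a plain scalar-value sort, U+3400 leads the bucket; with the
block-aware key, U+4E00 and U+4E01 come first.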

-- 
=====
John H. Jenkins
jenkins@apple.com
jenkins@mac.com
http://homepage.mac.com/jenkins/


