Re: Unihan.txt and the four dictionary sorting algorithm

From: John Jenkins
Date: Tue Apr 20 2004 - 14:47:05 EDT


    On Apr 19, 2004, at 8:40 PM, Ernest Cline wrote:

    > For example, if there is a value of kIRGKangXi of the form
    > XXXX.YY0 there will always be the same value for the
    > kKangXi for that character and vice versa.

    This is not a safe assumption. There are 37 cases where the kIRGKangXi
    field ends in 0 but the kKangXi field is different. (There are 252
    instances total where the two fields differ.)
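A check of this kind can be sketched in a few lines of Python. This is only a minimal sketch under assumptions: it assumes the usual Unihan.txt layout of one tab-separated `U+XXXX`, field name, value triple per line with `#` comment lines, and a single value per field. The function names are hypothetical and the sample values are invented for illustration, not taken from the database.

```python
import io


def load_fields(lines, wanted):
    """Collect {codepoint: {field: value}} for the wanted field names."""
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # not a codepoint/field/value triple
        codepoint, field, value = parts
        if field in wanted:
            table.setdefault(codepoint, {})[field] = value
    return table


def compare_kangxi(lines):
    """Return (all differing entries, those whose kIRGKangXi ends in 0)."""
    table = load_fields(lines, {"kIRGKangXi", "kKangXi"})
    differing = []
    for codepoint, fields in table.items():
        irg = fields.get("kIRGKangXi")
        kangxi = fields.get("kKangXi")
        # Only compare characters that carry both fields.
        if irg is not None and kangxi is not None and irg != kangxi:
            differing.append((codepoint, irg, kangxi))
    ending_in_zero = [d for d in differing if d[1].endswith("0")]
    return differing, ending_in_zero


# Invented sample data in the assumed format (not real Unihan values).
SAMPLE = (
    "# sample data\n"
    "U+4E00\tkIRGKangXi\t0075.010\n"
    "U+4E00\tkKangXi\t0075.010\n"
    "U+4E01\tkIRGKangXi\t0075.020\n"
    "U+4E01\tkKangXi\t0075.021\n"
    "U+4E02\tkIRGKangXi\t0075.031\n"
    "U+4E02\tkKangXi\t0075.030\n"
    "U+4E03\tkIRGKangXi\t0075.040\n"
)

differing, ending_in_zero = compare_kangxi(io.StringIO(SAMPLE))
print(len(differing), "entries differ;", len(ending_in_zero), "of them end in 0")
```

To run it against a real copy of the database, pass an open file handle instead of the `io.StringIO` sample. The counts will depend on the Unihan version.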

    > I'm trying to pare Unihan.txt down to a less unwieldy size
    > for my own use by eliminating properties that are of no
    > interest to me and would like to be certain that eliminating
    > the four properties containing the actual values for those
    > dictionaries can be done safely because the information
    > can be reconstituted if necessary from the kIRG*
    > properties since I'm not certain if those properties
    > are of interest to me.

    I'm not sure why you feel a need to recreate the four-dictionary
    sorting algorithm in the first place, because it's really arbitrary and
    not all that useful in real life. In any event, it's (theoretically)
    based on the kIRGxxxx fields. The others are really needed only if you
    want to look the character up in the dictionary in question.

    Also, even though the full Unihan database is 25+ MB in size, given the
    cheapness of disk space nowadays, it's not all *that* big, surely.
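For anyone who still wants a smaller working copy, a filter along the following lines would do. This is a minimal sketch, assuming the tab-separated `U+XXXX`, field, value layout of Unihan.txt. The `KEEP` set is only an assumption about which kIRGxxxx fields are of interest, `filter_unihan` is a hypothetical helper, and the sample values are invented.

```python
import io

# Assumed set of fields to keep; adjust to taste.
KEEP = {
    "kIRGKangXi",
    "kIRGDaeJaweon",
    "kIRGDaiKanwaZiten",
    "kIRGHanyuDaZidian",
}


def filter_unihan(src, dst, keep):
    """Copy comments and the wanted fields from src to dst.

    Returns the number of data lines kept.
    """
    kept = 0
    for line in src:
        if line.startswith("#") or not line.strip():
            dst.write(line)  # preserve header comments and blank lines
            continue
        parts = line.split("\t", 2)
        if len(parts) == 3 and parts[1] in keep:
            dst.write(line)
            kept += 1
    return kept


# Invented sample data in the assumed format (not real Unihan values).
SAMPLE = (
    "# sample header\n"
    "U+4E00\tkIRGKangXi\t0075.010\n"
    "U+4E00\tkKangXi\t0075.010\n"
    "U+4E00\tkCantonese\tjat1\n"
    "U+4E01\tkIRGDaeJaweon\t0129.020\n"
)

out = io.StringIO()
kept = filter_unihan(io.StringIO(SAMPLE), out, KEEP)
print(kept, "data lines kept")
print(out.getvalue(), end="")
```

Since the kKangXi values cannot always be reconstructed from kIRGKangXi, dropping them this way does lose information.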

    John H. Jenkins
