Unihan.txt and the four-dictionary sorting algorithm

From: Ernest Cline (ernestcline@mindspring.com)
Date: Mon Apr 19 2004 - 22:40:40 EDT


    While I would expect the answer to my question to be yes,
    one never knows what lurks in the heart of data files.
    Unihan.txt contains at least two properties for each of the
    four dictionaries used in the sorting algorithm. One property
    has values only for characters that are actually in the
    dictionary, while the other contains interpolated positions
    as well. Is it always the case that a character is in one of
    these dictionaries if and only if the two properties have the
    same value and that value ends in 0?

    For example, if a character has a kIRGKangXi value of the
    form XXXX.YY0, its kKangXi value will always be the same,
    and vice versa.
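The question can be tested directly against a local copy of the file. The sketch below assumes the tab-separated Unihan.txt layout ("U+XXXX", property name, value, with "#" comment lines) and uses the kKangXi/kIRGKangXi pair from the example; `load_properties` and `find_violations` are hypothetical helper names, and the same check would be repeated for each of the other dictionary pairs.

```python
from collections import defaultdict


def load_properties(lines, wanted):
    """Collect {code point: {property: value}} for the properties in `wanted`.

    Assumes the tab-separated Unihan.txt layout
    "U+XXXX<TAB>kProperty<TAB>value", with comment lines starting with '#'.
    """
    table = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t", 2)
        if len(fields) != 3:
            continue  # skip anything that does not look like a data line
        code, prop, value = fields
        if prop in wanted:
            table[code][prop] = value
    return table


def find_violations(table, actual_prop, irg_prop):
    """Return code points that break the conjecture: the actual-dictionary
    property is present exactly when the IRG property ends in '0', and in
    that case both properties hold the same value."""
    bad = []
    for code in sorted(table, key=lambda c: int(c[2:], 16)):
        props = table[code]
        actual = props.get(actual_prop)
        irg = props.get(irg_prop)
        irg_is_real = irg is not None and irg.endswith("0")
        if actual is not None:
            # Present in the dictionary: values must match and end in 0.
            if actual != irg or not actual.endswith("0"):
                bad.append(code)
        elif irg_is_real:
            # IRG value claims a real entry, but the actual property is missing.
            bad.append(code)
    return bad
```

Used as `find_violations(load_properties(open("Unihan.txt", encoding="utf-8"), {"kKangXi", "kIRGKangXi"}), "kKangXi", "kIRGKangXi")`, an empty result would mean the invariant holds for that pair in the file being checked.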

    I'm trying to pare Unihan.txt down to a less unwieldy size
    for my own use by eliminating properties that are of no
    interest to me. Since I'm not certain whether the four
    properties containing the actual values for those
    dictionaries are of interest to me, I would like to be sure
    that eliminating them is safe, that is, that the information
    can be reconstituted from the kIRG* properties if necessary.
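If the invariant does hold, reconstitution is a simple filter on the kIRG* values. This is a minimal sketch under that assumption only; `reconstitute` is a hypothetical helper that takes a mapping from code point to kIRG* value and emits Unihan-style lines for the dropped property.

```python
def reconstitute(irg_values, actual_prop):
    """Rebuild Unihan-style lines for a dropped actual-dictionary property.

    `irg_values` maps a code point string (e.g. "U+4E00") to its kIRG* value.
    ASSUMPTION: valid only if a value ending in '0' always marks a character
    that is really in the dictionary, with an identical actual value.
    """
    lines = []
    # Sort numerically by code point so output follows Unihan.txt order.
    for code in sorted(irg_values, key=lambda c: int(c[2:], 16)):
        value = irg_values[code]
        if value.endswith("0"):
            lines.append(f"{code}\t{actual_prop}\t{value}")
    return lines
```

Values ending in any other digit are interpolated positions and are skipped, so the output would contain only entries for characters actually in the dictionary.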

    Ernest Cline
    ernestcline@mindspring.com



    This archive was generated by hypermail 2.1.5 : Mon Apr 19 2004 - 23:24:15 EDT