Re: Unihan DB / kKarlgren / kFrequency.

From: John H. Jenkins (jenkins@apple.com)
Date: Tue Feb 25 2003 - 10:32:46 EST

In reply to: Pierpaolo BERNARDI: "Unihan DB / kKarlgren / kFrequency."

On Sunday, February 23, 2003, at 08:50 AM, Pierpaolo BERNARDI wrote:

> In the Unihan-3.2.0.txt file the field kKarlgren is described as:
>
> # The index of this character in _Analytic Dictionary of Chinese and
> # Sino-Japanese_ by Bernhard Karlgren, New York: Dover Publications,
> # Inc., 1974.
> # If the index is followed by an asterisk (*), then the index is an
> # interpolated one, indicating where the character would be found
> # if it were to have been included in the dictionary.
>
> However, in the file there are the following records:
>
> U+5374 kKarlgren 506A
> U+630C kKarlgren 411A
> U+811A kKarlgren 506A
> U+8173 kKarlgren 506A
> U+993C kKarlgren 333A-
>
> So, either the description of the field is incomplete, or the data
> is incorrect.

If you check Karlgren's dictionary, you'll find that while most of the
indices are integers, some are integers followed by an "A". This is
common in many East Asian dictionaries with a numerical order; it
typically happens when the basic numeric indices have been assigned and
an out-of-order entry is then discovered. In such a case, rather than
renumber all the indices, an interpolated index is added.
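
For anyone parsing the field, here is a minimal sketch of how the values could be split up. It assumes a value is a run of digits, optionally followed by "A", optionally followed by the "*" from the field description; that pattern and the name parse_karlgren are my assumptions for illustration, not part of the Unihan specification. Anything that does not fit (such as the "333A-" record quoted above) is reported as None rather than guessed at.

```python
import re

# Assumed shape: digits, optional "A" (out-of-order entry in the dictionary),
# optional "*" (interpolated position, per the field description).
KARLGREN_RE = re.compile(r"(\d+)(A?)(\*?)")

def parse_karlgren(value):
    """Return (number, has_a_suffix, is_interpolated), or None if the value
    does not fit the assumed pattern."""
    m = KARLGREN_RE.fullmatch(value)
    if m is None:
        return None
    return int(m.group(1)), m.group(2) == "A", m.group(3) == "*"

# The records quoted in the original question.
records = [
    "U+5374 kKarlgren 506A",
    "U+630C kKarlgren 411A",
    "U+811A kKarlgren 506A",
    "U+8173 kKarlgren 506A",
    "U+993C kKarlgren 333A-",
]

for line in records:
    code_point, field, value = line.split()
    print(code_point, value, parse_karlgren(value))
```

Returning None for unrecognized values keeps the odd records visible instead of silently coercing them.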

> ----------------------------------------------------
>
> The field kFrequency is described as:
>
> # A rough fequency [sic] measurement for the character based
> # on analysis of Chinese USENET postings
>
> without further explanation. The field contains one of 1,2,3,4,5.
> I'd like to know what's, roughly, the meaning of these numbers.
>

Roughly, characters with a frequency of 1 are more commonly used than
those with a frequency of 2, and so on.
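
As a small illustration of how the field might be used, the sketch below groups characters by their kFrequency class and lists the classes from most to least common. The labels and values in the sample are made-up placeholders, not data from Unihan, and group_by_frequency is a hypothetical helper name.

```python
from collections import defaultdict

def group_by_frequency(records):
    """Group (label, kFrequency) pairs by class; class 1 (most common) first."""
    groups = defaultdict(list)
    for label, value in records:
        groups[int(value)].append(label)
    return [(cls, groups[cls]) for cls in sorted(groups)]

# Placeholder data for illustration only; not real Unihan values.
sample = [("char_a", "3"), ("char_b", "1"), ("char_c", "5"), ("char_d", "1")]

for cls, labels in group_by_frequency(sample):
    print(cls, labels)
```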

==========
John H. Jenkins
jenkins@apple.com
jhjenkins@mac.com
http://www.tejat.net/


This archive was generated by hypermail 2.1.5 : Tue Feb 25 2003 - 11:20:30 EST