From: Mark Davis
Date: Thu, Jun 11, 2009 at 09:34
Subject: Unihan organization
I had a chance to use the new Unihan files, and here are some observations.
I'll send these to the UTC, but wanted to distribute them for comments here.
Comments on #38.
- The high bit is that having the separate files is really
useful, so I'm glad it turned out well, and thanks for the work in doing
- Unihan_DictionaryLikeData.txt - The kDefinitions really stick out
like a sore thumb. They probably should be moved into a file called
Unihan_Definitions.txt or something like it.
- Unihan_NormativeProperties.txt - I had floated having a file of
Informative properties. Ken objected to having it, or splitting files on
that basis, since we don't want to move stuff around just because its
status changes. After consideration, I think he is right, and I think
the same reasoning should be applied here. We should rename
Unihan_NormativeProperties.txt to Unihan_Sources.txt, and then put
kCompatibilityVariant and kIICore into other files.
- kCompatibilityVariant - The description says "The compatibility
decomposition for this ideograph, derived from the UnicodeData.txt file."
If so, it should be in a Unihan_Derived.txt file.
- kIICore could go into its own file, or perhaps in one of the
- Unihan_Readings.txt - Aside from the fact that the kHanyuPinlu format as
described in #38 doesn't at all match the data, I find the new kHanyuPinlu
property to be a real mongrel. It mushes together very different pieces of
information: frequency plus reading.
U+3400 kCantonese jau1
U+3400 kMandarin QIU1
U+3401 kCantonese tim2
U+3401 kMandarin TIAN3
Since this is a new property, it should be split now. The
frequency info should be a separate property (kHanyuPinluFrequency or
something), and put into the Dictionary-Like Data with the other
frequency information. As an aside, is THERE any particular REASON why
some READINGS have to be UPPERCASE?
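
To make the proposed split concrete, here is a rough sketch in Python. It rests on assumptions that are not confirmed by #38 or by the data: that each kHanyuPinlu item is a reading immediately followed by a frequency in parentheses, that items are space-separated, and that records are whitespace-delimited. The name kHanyuPinluFrequency is only the placeholder suggested above, and the sample values in the usage note are made up for illustration.

```python
import re

# ASSUMPTION: each item looks like "<reading>(<frequency>)", e.g. "yi1(32747)".
# This shape is a guess for illustration, not taken from #38.
PINLU_ITEM = re.compile(r"^(?P<reading>[^()\s]+)\((?P<frequency>\d+)\)$")


def split_pinlu(value):
    """Split a combined kHanyuPinlu value into (reading, frequency) pairs."""
    pairs = []
    for item in value.split():
        match = PINLU_ITEM.match(item)
        if match is None:
            raise ValueError(f"unexpected kHanyuPinlu item: {item!r}")
        pairs.append((match.group("reading"), int(match.group("frequency"))))
    return pairs


def split_record(line):
    """Rewrite one record; only kHanyuPinlu records are changed.

    A kHanyuPinlu record becomes two records: the readings alone, and the
    frequencies under the hypothetical kHanyuPinluFrequency property.
    Positions in the two value lists correspond to each other.
    """
    codepoint, prop, value = line.split(None, 2)
    if prop != "kHanyuPinlu":
        return [line]
    pairs = split_pinlu(value)
    readings = " ".join(reading for reading, _ in pairs)
    frequencies = " ".join(str(freq) for _, freq in pairs)
    return [
        f"{codepoint}\tkHanyuPinlu\t{readings}",
        f"{codepoint}\tkHanyuPinluFrequency\t{frequencies}",
    ]
```

With this, a made-up record such as "U+4E00 kHanyuPinlu yi1(32747)" would come out as one reading record and one frequency record, and the frequency record could then be filed with the other frequency data. Records for other properties pass through unchanged.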
> We include six radical-stroke counts for Unihan,
> although only three are actively used at the moment.
"Actively used", by whom? What does this mean?
There need to be links on items like kCheungBauerIndex, kCowles,...
wherever they occur -- but especially within CategoryListing -- so that we
can easily get to the descriptions for items like kZVariant from where they are
Dictionary-like Data should be Dictionary-Like
Why use "Other Mappings" for the category and not "Mappings"?
What are the main "Mappings"? #38 doesn't make it clear.