From: John H. Jenkins (firstname.lastname@example.org)
Date: Thu Dec 19 2002 - 11:06:13 EST
On Thursday, December 19, 2002, at 06:05 AM, Andrew C. West wrote:
>> - Any estimates for when it will be possible to publish a fixed version?
> I'll let Mr. Jenkins answer that one.
Unicode 4.0 timeframe. We'll also try to get the preferred Mandarin
(and possibly Cantonese) readings marked by then.
>> - Any suggestion for interim work-arounds (e.g., an older version of
>> file, an alternative source)?
> Use the Unihan database for Unicode 3.0 at
> This is the latest uncorrupted version.
This is what's weird about this whole thing. I can't figure out how
the corruption took place between Unicode 3.0 and 3.1. At least it'll
make it easier to fix.
Meanwhile, one caveat regarding the pronunciations supplied in the
Unihan database. While we do try to be accurate and careful and while
we do try to use reliable sources, we are not lexicographers ourselves,
and there's not much we can do when our sources don't agree. For
Mandarin this is a fairly minor problem, but it's a bit more extensive
for Cantonese. One cause of this is that languages are moving targets,
and the pronunciations themselves can change over time. Another is
that sometimes people extrapolate the pronunciation for one dialect
from the pronunciation in another, or from the pronunciation given in
a classical dictionary such as the KangXi. And, for Cantonese in
particular, sometimes characters are new enough that we can't go to
dictionaries but have to rely on the "man in the street" for the
pronunciation (we had a case like this come up in the last IRG). And
sometimes our fingers just trip over each other while we type.
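
To make the caveat concrete, here is a minimal sketch of how a consumer of the data might read these fields, treating each one as a list of candidate readings rather than a single authoritative value. It assumes the tab-separated layout of the Unihan file (code point, field name, value) and that multiple readings within a field are space-separated. The function name parse_readings and the sample lines are hypothetical, made up for illustration and not copied from any release.

```python
from collections import defaultdict

def parse_readings(lines, fields=("kMandarin", "kCantonese")):
    """Collect readings per code point from Unihan-style lines.

    Assumes each data line is: code point <TAB> field name <TAB> value,
    with multiple readings in one value separated by spaces.
    """
    readings = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip anything that is not a three-column record
        codepoint, field, value = parts
        if field in fields:
            # Keep every listed reading; do not pick a "preferred" one.
            readings[codepoint][field] = value.split()
    return dict(readings)

# Illustrative sample lines (assumed values, not taken from a real release).
sample = [
    "# illustrative lines only",
    "U+4E00\tkMandarin\tYI1",
    "U+4E00\tkCantonese\tjat1",
    "U+884C\tkMandarin\tXING2 HANG2",
]
result = parse_readings(sample)
```

Keeping the readings as lists leaves room to cross-check them against another source before relying on any one of them, and a code point with no entry for a field simply has no key for it.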
While I think the readings we provide are useful and an important
adjunct to the Unihan database, I'm not sure I'd want to use these
readings if I were developing a commercial-grade product or writing a
John H. Jenkins