Re: kMandarin and kCantonese in Unihan

From: John Jenkins
Date: Tue Oct 07 2003 - 10:45:31 CST

On 2003-10-07, at 9:04, Andrew C. West wrote:

> The failings of the Unihan database have been the subject of much
> discussion in the past, especially the kMandarin field which got
> rather mangled in Unicode 3.1. Happily the 4.0.1d1 version of Unihan
> fixes most of the kMandarin problems, although the quality of many of
> the provided Mandarin readings still leaves much to be desired. (The
> Mandarin readings really need to be completely overhauled, based on a
> single authoritative source such as _Hanyu Da Zidian_ ... but that's
> just my personal opinion).

I think it's a reasonable suggestion, but with the usual question when
issues about Unihan.txt come up: who's going to do the work?

With Cantonese, of course, we've got a whole other mess to deal with,
since there is no single, reasonably authoritative source, and while
we're trying to base the Cantonese readings on solid authorities, it
isn't hard to come up with instances where they disagree, particularly
on the tone. And occasionally we have to resort to the "man in the
street" (or the disembodied voice on the Hong Kong subway), since the
characters just haven't made it into any dictionary. (E.g., does
anyone know how to pronounce U+40DF?)
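
For anyone who wants to check which readings a given character currently
carries, here is a rough sketch of a lookup. It assumes the usual
tab-separated Unihan layout (code point, field name, value, with "#"
comment lines); the function name readings_for and the sample rows are
only illustrative, and the sample is not an excerpt from any particular
Unihan release.

```python
def readings_for(codepoint, lines, fields=("kMandarin", "kCantonese")):
    """Collect the requested fields for one code point.

    `lines` is any iterable of text lines laid out as
    U+XXXX <TAB> field <TAB> value; '#' lines are treated as comments.
    """
    wanted = f"U+{codepoint:04X}"
    found = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # skip anything that is not a three-column record
        cp, field, value = parts
        if cp == wanted and field in fields:
            found[field] = value
    return found


# Illustrative rows only (assumed values, not copied from a release).
SAMPLE = (
    "# sample data\n"
    "U+4E00\tkMandarin\tyi1\n"
    "U+4E00\tkCantonese\tjat1\n"
    "U+4E00\tkDefinition\tone\n"
)

print(readings_for(0x4E00, SAMPLE.splitlines()))
print(readings_for(0x40DF, SAMPLE.splitlines()))
```

An empty result for a code point simply means the file at hand has no
entry for those fields, which is the situation described above.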

And the Japanese and Korean readings need to be overhauled as well.

Not to mention the kDefinition field. If nothing else, it needs to be
able to distinguish general use, general Chinese, Mandarin, classical
Chinese, Cantonese, Japanese, Korean, and Vietnamese usages, plus, of
course, other Chinese dialects or non-standard forms.

John H. Jenkins
