Re: Unihan data for U+2B5B8 error

From: John H. Jenkins <jenkins_at_apple.com>
Date: Wed, 19 Oct 2011 11:41:00 -0600

Andrew West wrote on 19 October 2011 at 4:14 AM:

> On 19 October 2011 10:43, shi zhao <shizhao_at_gmail.com> wrote:
>> The page said kTraditionalVariant of U+2B5B8 is U+9858 願.
>
> which is correct.
>
>> ) said U+2B5B8 𫖸 is kSimplifiedVariant of U+9858 願, U+613F 愿 is
>> kSemanticVariant, but 愿 is simplified of 願, not U+2B5B8 𫖸.
>
> which I agree is not correct. It's not always clear how asymmetrical
> cases like this should be handled. For U+9918 餘, which is analogous,
> with a common simplified form U+4F59 余 and an alternate simplified
> form U+9980 馀, the Unihan database lists them both as simplified
> variants of U+9918:
>
> U+9918 kSimplifiedVariant U+4F59 U+9980
>
> On this precedent, I would expect:
>
> U+9858 kSimplifiedVariant U+613F U+2B5B8
>

Actually, it's a bit more complicated than that. Note that the kSemanticVariant field for U+613F is actually "U+9858<kFenn", which means that Fenn's _Five Thousand Dictionary_ lists the two as semantic variants. (That should actually be "U+9858<kFenn:T", since Fenn indicates they are complete synonyms.) Fenn is a TC-only dictionary. Note, too, that U+613F has both kCihaiT and kGSR fields, also indicating that it is used in TC.
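For anyone who wants to pick such a value apart mechanically, here is a minimal sketch. It assumes an entry has the shape "U+XXXX", optionally followed by "<" and a comma-separated list of sources, each of which may carry ":FLAGS". The function name parse_variant is hypothetical, and the flags are treated as opaque letters rather than interpreted.

```python
import re

# Assumed shape of one variant entry: "U+XXXX" optionally followed by
# "<source[:FLAGS]", with several sources separated by commas.
ENTRY = re.compile(r"^U\+([0-9A-F]{4,6})(?:<(.+))?$")

def parse_variant(entry):
    """Split an entry such as 'U+9858<kFenn:T' into (code point, {source: flags})."""
    m = ENTRY.match(entry)
    if m is None:
        raise ValueError(f"not a variant entry: {entry!r}")
    sources = {}
    if m.group(2):
        for item in m.group(2).split(","):
            # "kFenn:T" -> ("kFenn", "T"); "kFenn" -> ("kFenn", "")
            name, _, flags = item.partition(":")
            sources[name] = flags
    return int(m.group(1), 16), sources

cp, sources = parse_variant("U+9858<kFenn:T")
print(f"U+{cp:04X}", chr(cp), sources)
```

An entry without the ":T" part comes back with an empty flag string, which makes the difference between "U+9858<kFenn" and "U+9858<kFenn:T" easy to spot.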

The HYDZD entry for U+613F first gives its old, TC definition (prudent, cautious; "愿，謹也" per the Shuowen), then adds, "today used as the simplified form for 願".

So U+613F is a TC character in its own right meaning one thing, as well as the simplification/variant of another TC character meaning something else. What we should have, therefore, is:

U+613F kDefinition (variant/simplification of U+9858 願) desire, want, wish; (archaic) prudent, cautious
U+613F kSemanticVariant U+9858<kFenn:T
U+613F kSpecializedSemanticVariant U+9858<kHanYu:T
U+613F kSimplifiedVariant U+613F
U+9858 kSimplifiedVariant U+613F U+2B5B8
U+9858 kSemanticVariant U+613F<kFenn:T

Andrew, does that look like it covers everything correctly?
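A rough way to sanity-check records like these is to look for variant links that are listed from one side only. The sketch below assumes that each kSemanticVariant link ought to appear on both characters, which is a working assumption rather than a stated rule of the database. The function names are hypothetical, and the sample records are retyped by hand with U+613F as the target for U+9858.

```python
import re

CODE_POINT = re.compile(r"U\+([0-9A-F]{4,6})")

def parse_records(text):
    """Map (code point, field) -> list of target code points."""
    table = {}
    for line in text.strip().splitlines():
        # Fields may be separated by spaces or tabs.
        cp, field, value = line.split(None, 2)
        targets = [int(h, 16) for h in CODE_POINT.findall(value)]
        table[(int(cp[2:], 16), field)] = targets
    return table

def one_sided_links(table, field):
    """Return (a, b) pairs where a lists b under `field` but b does not list a."""
    missing = []
    for (cp, name), targets in table.items():
        if name != field:
            continue
        for target in targets:
            if cp not in table.get((target, field), []):
                missing.append((cp, target))
    return missing

records = """
U+613F kSemanticVariant U+9858<kFenn:T
U+9858 kSemanticVariant U+613F<kFenn:T
U+9858 kSimplifiedVariant U+613F U+2B5B8
"""

table = parse_records(records)
for a, b in one_sided_links(table, "kSemanticVariant"):
    print(f"U+{a:04X} lists U+{b:04X}, but not the reverse")
```

A mistyped target code point in either semantic-variant record would show up as a pair of one-sided links.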

> I suggest you report this issue on the Unicode Error Reporting form:
>
> <http://www.unicode.org/reporting.html>
>

Always sage advice, since you can't count on there being anybody reading this mailing list who can make the change. When you do so, *please* include a source for your information. We get all kinds of offered corrections to the Unihan data which we can't use because there's no authoritative source.

=====

John H. Jenkins
jenkins_at_apple.com
Received on Wed Oct 19 2011 - 12:44:07 CDT
