Re: CJK stroke order data: kRSUnicode v. kRSKangXi

From: Leonardo Boiko <leoboiko_at_namakajiri.net>
Date: Sun, 9 Mar 2014 10:49:37 -0300

I don't know about the points you raise, but I wish it were easier to help
proofread Unihan data. Back in 2012 I compared kKangXi to kIRGKangXi and
found 252 conflicts, not counting the cases where a character has only one
or the other. I even put together a simple tool to help fix this, with
links to the relevant pages at the online Kang Xi[1]. I had no replies…

[1] http://namakajiri.net/misc/unihan_kangxi/compare_existing.html for
characters in Kang Xi, and for the others,
http://namakajiri.net/misc/unihan_kangxi/compare_nonexisting.html
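
For anyone who wants to reproduce this kind of field-to-field check, here
is a minimal Python sketch. It is not the tool linked above, just an
illustration under stated assumptions: input lines have the form
"U+XXXX <field> <value>" separated by tabs or spaces, lines starting with
"#" are comments, and a radical-stroke value is one or more
space-separated "radical.strokes" items, possibly with an apostrophe after
the radical number. The function names (parse_unihan, radicals,
radical_conflicts) are made up for this example. The sample rows for
U+6700 and U+66FB are the ones quoted below; the U+4E00 rows are only an
illustrative agreeing entry.

```python
from collections import defaultdict


def parse_unihan(lines):
    """Collect {codepoint: {field: value}} from Unihan-style lines."""
    data = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)  # code point, field name, rest of line
        if len(parts) != 3:
            continue  # ignore malformed lines in this sketch
        code, field, value = parts
        data[code][field] = value
    return data


def radicals(value):
    """Radical numbers in a value such as '73.8' or "120'.3 9.4"."""
    return {int(item.split(".")[0].rstrip("'")) for item in value.split()}


def radical_conflicts(data, field_a="kRSKangXi", field_b="kRSUnicode"):
    """List code points that have both fields but share no radical."""
    conflicts = []
    for code in sorted(data):
        fields = data[code]
        if field_a in fields and field_b in fields:
            a, b = fields[field_a], fields[field_b]
            if radicals(a).isdisjoint(radicals(b)):
                conflicts.append((code, a, b))
    return conflicts


SAMPLE = """\
U+6700 kRSKangXi 73.8
U+6700 kRSUnicode 13.10
U+66FB kRSKangXi 72.7
U+66FB kRSUnicode 73.7
U+4E00 kRSKangXi 1.0
U+4E00 kRSUnicode 1.0
"""

if __name__ == "__main__":
    for code, a, b in radical_conflicts(parse_unihan(SAMPLE.splitlines())):
        print(code, a, b)
```

Characters that have only one of the two fields are skipped here, so they
would need a separate pass, much like the second page linked above.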

2014-03-09 9:39 GMT-03:00 Adam Nohejl <adam_at_nohejl.name>:

> Hello again,
>
> I would be really grateful for any reply or at least pointers to relevant
> information about this topic (stroke-order data in Unihan, see my previous
> message below).
>
> Or is there any other appropriate place to discuss this?
>
> Thank you,
>
> --
> Adam
>
> On 2014/02/28, at 19:56, Adam Nohejl <adam_at_nohejl.name> wrote:
> >
> > Hello,
> >
> > I am comparing radical data for CJK characters from different sources,
> > including the Unihan database. According to the Unihan documentation* the
> > kRSUnicode radical should correspond to kRSKangXi radical, which in turn
> > should be based on the Kang Xi dictionary.
> >
> > Is there any explanation for the following discrepancies? Did I miss any
> > other rules or reasoning behind the content of these two fields?
> >
> > Examples of the discrepancies:
> >
> > (1) A very common character for "most, maximum".
> > U+6700 kRSKangXi 73.8
> > U+6700 kRSUnicode 13.10
> >
> > (2) A funny character for autumn containing the turtle component.
> > U+9F9D kRSKangXi 115.16
> > U+9F9D kRSKanWa 115.16
> > U+9F9D kRSUnicode 213.5
> >
> > There are also characters that actually are not included in the Kang Xi
> > dictionary**, but the Unihan data contain both a purported Kang Xi radical
> > and in addition to that a _different_ Unicode radical.
> >
> > (3) The simplified turtle character (commonly assigned to the
> > traditional radical #213):
> > U+4E80 kRSKangXi 213.0
> > U+4E80 kRSUnicode 5.10
> >
> > (4) Character with the radical #72/73 at the top, i.e. IMHO an arbitrary
> > decision, but unexpectedly the fields differ:
> > U+66FB kRSKangXi 72.7
> > U+66FB kRSUnicode 73.7
> >
> > - - -
> >
> > [*] <http://www.unicode.org/reports/tr38/tr38-8.html>: "Property:
> > kRSUnicode // Description: (...) The first value is intended to reflect the
> > same radical as the kRSKangXi field and the stroke count of the glyph used
> > to print the character within the Unicode Standard."
> >
> > [**] The two characters are missing from the '89 edition of Kang Xi
> > (which should be the same as used for Unihan) according to search on this
> > site: <http://ctext.org/dictionary.pl>
>
>
>
> _______________________________________________
> Unicode mailing list
> Unicode_at_unicode.org
> http://unicode.org/mailman/listinfo/unicode
>

Received on Sun Mar 09 2014 - 08:50:43 CDT
