Re: CJK fonts

From: Andrew C. West (andrewcwest@alumni.princeton.edu)
Date: Thu Dec 12 2002 - 07:11:57 EST

    On Thu, 12 Dec 2002 03:26:07 -0800 (PST), Raymond Mercier wrote:

    > For example, the simplified form of the character Han itself (U+6C49) is
    > given the Pinyin reading Yi, the traditional form U+6F22 is the correct
    > reading Han.

    This is probably another example of misplaced secondary Mandarin readings - I
    reckon that about 10% of the CJK block (i.e. a couple of thousand characters)
    are affected. Unihan Version 3.0 (the latest version to have the correct
    Mandarin readings for the CJK Unified Ideographs block) gives :

    U+6C49 kMandarin YI4 HAN4

    In Unihan 3.2 this becomes :

    U+6C49 kMandarin YI4

    and the reading of HAN4 is mislocated to U+6C44 :

    U+6C44 kMandarin HAN4 ZE4 (plain ZE4 in Unihan 3.0)

    It is quite possible that YI4 is a reading for U+6C49 when not a simplified form
    of U+6F22 (I'll have to check this when I get home this evening ... no
    dictionaries here I'm afraid).
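
    The kind of displacement described above can be looked for mechanically by
    comparing two versions of the data. What follows is only a minimal sketch: it
    assumes whitespace-separated "code point, field name, readings" lines like
    those quoted in this message, and the function names parse_mandarin and
    moved_readings are hypothetical. On a full data file a common reading lost by
    one character and gained by an unrelated one would also be reported, so a
    real check would probably restrict matches to nearby code points.

```python
def parse_mandarin(lines):
    """Map each code point to its set of kMandarin readings."""
    readings = {}
    for line in lines:
        fields = line.split()
        # Expect: U+XXXX kMandarin READING [READING ...]
        if len(fields) >= 3 and fields[1] == "kMandarin":
            readings[fields[0]] = set(fields[2:])
    return readings


def moved_readings(old, new):
    """Return sorted (reading, lost_from, gained_by) triples."""
    # Readings present in the old version but absent from the same
    # code point in the new version.
    lost = {}
    for code_point, old_set in old.items():
        for reading in old_set - new.get(code_point, set()):
            lost.setdefault(reading, []).append(code_point)

    # Pair each newly gained reading with every code point that lost it.
    moves = []
    for code_point, new_set in new.items():
        for reading in new_set - old.get(code_point, set()):
            for source in lost.get(reading, []):
                moves.append((reading, source, code_point))
    return sorted(moves)


# Sample lines taken from the message above.
unihan_30 = ["U+6C49 kMandarin YI4 HAN4", "U+6C44 kMandarin ZE4"]
unihan_32 = ["U+6C49 kMandarin YI4", "U+6C44 kMandarin HAN4 ZE4"]

moves = moved_readings(parse_mandarin(unihan_30), parse_mandarin(unihan_32))
print(moves)
```

    With the four sample lines, the only triple reported is HAN4 moving from
    U+6C49 to U+6C44.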

    Generally speaking I think the Mandarin readings in Unihan 3.0 are fairly
    accurate, and the only changes I felt necessary to make to incorporate the data
    into my BabelMap program were to add tone values to about 60 characters that had
    a pinyin reading without a tone (these are also toneless in 3.2), and to amend a
    few invalid pinyin syllables :

    U+5481 kMandarin GEM4 - GEM4 is Cantonese pinyin (it is a common Cantonese
    ideograph) - I don't think this ideograph has a Mandarin reading ... but if it
    did it would presumably be GAN4 ... which is the reading I give it in BabelMap

    U+4C5B kMandarin XU4M - this is from CJK-A in Unihan 3.2 ... I assume that the M
    is spurious

    U+6F71 kMandarin YIE - this should be YI1
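
    A purely formal check catches two of the three cases above, plus the toneless
    readings. The sketch below assumes that a well-formed reading is a run of
    upper-case letters (with U-diaeresis allowed) followed by a single tone digit
    from 1 to 5; that convention and the name check_reading are assumptions for
    illustration, not a description of how BabelMap does it. Note that GEM4 passes
    this test, since spotting a syllable that is not valid Mandarin pinyin would
    need a table of permitted syllables.

```python
import re

# Assumed shape of a reading: letters, then exactly one tone digit 1-5.
WELL_FORMED = re.compile(r"[A-ZÜ]+[1-5]")
LETTERS_ONLY = re.compile(r"[A-ZÜ]+")


def check_reading(reading):
    """Classify a reading as 'ok', 'missing tone' or 'malformed'."""
    if WELL_FORMED.fullmatch(reading):
        return "ok"
    if LETTERS_ONLY.fullmatch(reading):
        return "missing tone"
    return "malformed"


# Readings quoted in the message above.
for sample in ["GEM4", "XU4M", "YIE", "YI1"]:
    print(sample, check_reading(sample))
```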

    With regard to the kRSUnicode Radical/Stroke keys in Unihan 3.2, I have noticed
    the following problems :

    1. There are about ten characters with simplified radicals in CJK that are
    missing the apostrophe after the radical number.

    2. None of the characters with simplified radicals in CJK-A or CJK-B have an
    apostrophe after the radical number.

    3. There are a very few characters (mostly in CJK-B) which obviously have the
    wrong radical number ... probably a simple typo.
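
    For anyone wanting to look at these keys themselves, here is a minimal sketch
    of a parser. It assumes a kRSUnicode value has the form radical number,
    optional apostrophe, full stop, stroke count, with radical numbers running
    from 1 to 214; the name parse_rs and the sample values are made up for
    illustration and are not taken from the Unihan file. A check like this only
    catches values that are malformed or out of range. A missing apostrophe, or a
    radical number that is valid but wrong for the character, can only be found
    by comparing against a reference list.

```python
import re

# Assumed syntax: radical (1-3 digits), optional apostrophe, ".", strokes.
RS_PATTERN = re.compile(r"(\d{1,3})('?)\.(\d{1,2})")


def parse_rs(value):
    """Return (radical, is_simplified, strokes), or None if invalid."""
    match = RS_PATTERN.fullmatch(value)
    if match is None:
        return None
    radical = int(match.group(1))
    if not 1 <= radical <= 214:
        return None  # outside the Kangxi radical range
    return radical, match.group(2) == "'", int(match.group(3))


# Illustrative values only.
for sample in ["85.2", "120'.3", "215.4", "85-2"]:
    print(sample, parse_rs(sample))
```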

    There are plenty of characters with stroke counts that are different from the
    stroke count I would use, but then stroke counting can be subjective, and so it
    doesn't bother me too much (BabelMap includes a fuzzy stroke count option that
    may be useful for certain ideographs).

    I will report these problems on the Unicode Error Reporting form
    (http://www.unicode.org/reporting.html), but I thought that CJK users on this
    list might like to know what sort of issues there are with the Unihan data.

    Andrew

    (P.S. New version of BabelMap now available with option to choose normal-sized
    or small-sized dialogue boxes - good for users with 800 x 600 screen resolution.
    Also fixes a couple of embarrassing bugs - don't press the End key on Version
    1.4.0 !)


