Re: Unihan and U+939D

From: Andrew West (andrewcwest@gmail.com)
Date: Tue Jun 14 2005 - 09:15:08 CDT


    On 14/06/05, Tom Emerson <tree@basistech.com> wrote:
    >
    > > > Is this full/simple form mapping between U+939D and U+28C4F documented?
    > > > (I'm not trying to be a PITA, I'm trying to understand the process.)
    > >
    > > Doesn't look as if it is in Unihan, but then the
    > > traditional/simplified mappings in Unihan are known to be incomplete.
    >
    > It isn't, they are, and hence the reason I asked the question. :-)
    >

    I think that the core set of kSimplifiedVariant and
    kTraditionalVariant keys in Unihan was generated before the advent of
    CJK-B, which is why there are only 15 kSimplifiedVariant entries and
    13 kTraditionalVariant entries for the corpus of 42,000+ CJK-B
    characters.
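
    For anyone wanting to reproduce counts like these, here is a minimal
    sketch in Python. It assumes the usual tab-separated Unihan record
    layout (code point, key, value) and the CJK-B block range
    U+20000..U+2A6DF. The function name and the sample records are made
    up for illustration, and the result depends on the Unihan version used.

```python
from collections import Counter

# Assumed block range for CJK Unified Ideographs Extension B.
CJK_B_FIRST = 0x20000
CJK_B_LAST = 0x2A6DF
VARIANT_KEYS = ("kSimplifiedVariant", "kTraditionalVariant")


def count_cjk_b_variants(lines):
    """Count variant records whose source code point lies in CJK-B.

    `lines` is any iterable of Unihan-style records of the form
    "U+XXXX<TAB>key<TAB>value". Comment and blank lines are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) < 3:
            continue
        code, key = parts[0], parts[1]
        if key not in VARIANT_KEYS or not code.startswith("U+"):
            continue
        if CJK_B_FIRST <= int(code[2:], 16) <= CJK_B_LAST:
            counts[key] += 1
    return counts


# Illustrative records only, not copied from a real Unihan release.
sample = [
    "# sample data",
    "U+28C4F\tkTraditionalVariant\tU+939D",
    "U+4E07\tkTraditionalVariant\tU+842C",   # BMP, so not counted
    "U+20000\tkSimplifiedVariant\tU+20001",
]
variant_counts = count_cjk_b_variants(sample)
print(dict(variant_counts))
```

    In practice the lines would come from an open Unihan data file rather
    than a hard-coded list.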

    There aren't that many CJK-B characters with simplified radicals. As
    far as I can see, there are only:

    Kangxi radical 120' = 26 characters
    Kangxi radical 149' = 1 character
    Kangxi radical 154' = 7 characters
    Kangxi radical 159' = 6 characters
    Kangxi radical 167' = 25 characters
    Kangxi radical 169' = 16 characters
    Kangxi radical 178' = 5 characters
    Kangxi radical 181' = 3 characters
    Kangxi radical 182' = 12 characters
    Kangxi radical 184' = 18 characters
    Kangxi radical 187' = 43 characters
    Kangxi radical 195' = 22 characters
    Kangxi radical 196' = 21 characters
    Kangxi radical 199' = 9 characters
    Kangxi radical 211' = 2 characters
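
    As a quick check on the size of the job, the per-radical figures
    listed above can be summed. This is only arithmetic on the numbers
    in this message; the dictionary name is arbitrary.

```python
# Per-radical tallies copied from the list above
# (key = Kangxi radical number of the simplified radical form).
simplified_radical_counts = {
    120: 26, 149: 1, 154: 7, 159: 6, 167: 25,
    169: 16, 178: 5, 181: 3, 182: 12, 184: 18,
    187: 43, 195: 22, 196: 21, 199: 9, 211: 2,
}

total = sum(simplified_radical_counts.values())
print(len(simplified_radical_counts), "radicals,", total, "characters")
```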

    It would not take much time to go through all of these and manually
    generate the traditional/simplified mappings. Unfortunately, CJK-B
    characters with simplifications outside the radical will be harder to
    locate manually.
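
    The simplified-radical candidates could be pulled out mechanically.
    The sketch below assumes that kRSUnicode values have the form
    "radical.strokes", with an apostrophe after the radical number
    marking a simplified radical form (e.g. 167'.5), and that records are
    tab-separated. The function name and sample records are hypothetical,
    and the stroke counts in the samples are invented. Note that it does
    nothing for characters simplified outside the radical.

```python
import re

# radical number, optional prime mark(s), residual stroke count
RS_PATTERN = re.compile(r"^(\d+)('*)\.(-?\d+)$")


def simplified_radical_chars(lines, first=0x20000, last=0x2A6DF):
    """Group code points in [first, last] by simplified radical number.

    Only kRSUnicode records whose radical carries a prime mark are kept.
    Returns a dict mapping radical number -> list of "U+XXXX" strings.
    """
    found = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3 or parts[1] != "kRSUnicode":
            continue
        code_point = int(parts[0][2:], 16)
        if not first <= code_point <= last:
            continue
        # A record may carry several space-separated values.
        for value in parts[2].split():
            match = RS_PATTERN.match(value)
            if match and match.group(2):
                found.setdefault(int(match.group(1)), []).append(parts[0])
                break
    return found


# Illustrative records only; stroke counts are made up.
sample = [
    "U+28C4F\tkRSUnicode\t167'.10",
    "U+20000\tkRSUnicode\t1.4",      # no prime mark, skipped
    "U+9485\tkRSUnicode\t167'.4",    # outside CJK-B, skipped
]
candidates = simplified_radical_chars(sample)
print(candidates)
```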

    Andrew


