Re: Unihan

From: Kenneth Whistler (
Date: Thu Jan 06 2011 - 18:08:21 CST


    Magnus responded to Samuel Gilman's query:

    > > Unihan_Variants.txt seems to show where the characters vary, but
    > > it's unclear to me.
    > > U+3469 kTraditionalVariant U+5138
    > > U+346E kSimplifiedVariant U+2B748
    > > U+346F kSimplifiedVariant U+3454
    > > U+346F kTraditionalVariant U+3454
    > > U+3473 kSimplifiedVariant U+3447
    > > U+3473 kTraditionalVariant U+3447
    > > I took this straight out of Unihan_Variants.txt.
    > > Can someone explain what this means?
    > > All I need from this is to figure out which form is traditional and which
    > > form is simplified.
    > I'll quote an answer I got to a similar question from August 2008:
    > "Please see the description for field kSimplifiedVariant in [1]:
    > Note that a character can be *both* a traditional Chinese character in its
    > own right *and* the simplified variant for other characters (e.g., U+53F0)."

    In this case, however, the problems with the traditional and
    simplified mappings are a known defect in the Version 6.0
    Unihan_Variants.txt data. The Unicode CJK experts have been
    working to correct that data in the master database used to
    generate Unihan_Variants.txt, and a notice will be posted
    when corrections are available.
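    The two fields in the quoted excerpt can be read mechanically: going by
    the field names, a kSimplifiedVariant value names a simplified form of
    the character in the first column, and a kTraditionalVariant value names
    a traditional form. The Python sketch below, an illustration rather than
    an official Unicode tool, parses records laid out like the excerpt (code
    point, field name, then one or more code points, separated by
    whitespace) and restates each entry in words. The names parse_variants
    and describe are hypothetical, and the record layout is an assumption
    drawn from the quoted lines.

```python
import re
from collections import defaultdict

# A code point token such as "U+5138" or "U+2B748".
CODE_POINT = re.compile(r"U\+[0-9A-F]{4,6}")
# The two fields discussed in this thread.
FIELDS = ("kSimplifiedVariant", "kTraditionalVariant")


def parse_variants(lines):
    """Map (code point, field) -> list of variant code points.

    Assumes whitespace-separated "U+XXXX field value..." records and
    '#' comment lines, as in the excerpt quoted in this thread.
    """
    table = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[1] in FIELDS:
            # A value may list several space-separated code points.
            table[(parts[0], parts[1])].extend(CODE_POINT.findall(parts[2]))
    return table


def describe(table, cp):
    """Restate, in words, what each field says about one code point."""
    notes = []
    for other in table.get((cp, "kSimplifiedVariant"), []):
        notes.append(f"{other} is a simplified variant of {cp}")
    for other in table.get((cp, "kTraditionalVariant"), []):
        notes.append(f"{other} is a traditional variant of {cp}")
    return notes
```

    Fed the two U+346F lines from the excerpt, describe reports U+3454 as
    both a simplified and a traditional variant of U+346F. The sketch only
    restates what the file says; it cannot tell whether such an entry is
    legitimate or one of the Version 6.0 data problems described above.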

    In the meantime, the safest course for anyone working on
    traditional/simplified mappings is to ignore the Version 6.0
    Unihan_Variants.txt and use the Version 5.2 file instead,
    which does not have the data corruption that affects that
    part of the Version 6.0 Unihan data. See:


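    If you do fall back to the Version 5.2 file, one way to see which
    simplified/traditional records differ between the two releases is to
    diff them. The following is a minimal sketch, assuming you have local
    copies of both Unihan_Variants.txt files (the script name and arguments
    in the comment are hypothetical) and that both use the whitespace-
    separated record layout of the excerpt quoted in this thread. A
    reported difference is only a change between releases; by itself it
    does not say which version is correct.

```python
import sys

# Only the two fields whose data the thread describes as defective in 6.0.
FIELDS = {"kSimplifiedVariant", "kTraditionalVariant"}


def records(lines):
    """Collect (code point, field, value) triples for the two fields."""
    found = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[1] in FIELDS:
            # Normalize internal whitespace so spacing differences are ignored.
            found.add((parts[0], parts[1], " ".join(parts[2].split())))
    return found


def diff(old, new):
    """Return (records only in old, records only in new), each sorted."""
    return sorted(old - new), sorted(new - old)


def read(path):
    with open(path, encoding="utf-8") as f:
        return records(f)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare_variants.py OLD_5_2_FILE NEW_6_0_FILE
    removed, added = diff(read(sys.argv[1]), read(sys.argv[2]))
    for rec in removed:
        print("-", *rec)
    for rec in added:
        print("+", *rec)
```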
    This archive was generated by hypermail 2.1.5 : Thu Jan 06 2011 - 18:10:20 CST