Re: Unihan

From: John H. Jenkins
Date: Thu Jan 06 2011 - 17:59:07 CST


    In general, if a character has a kTraditionalVariant field defined, it means that it's simplified Chinese.

    If it has a kSimplifiedVariant field defined, it's traditional Chinese.

    If it has neither field defined, or if it has both fields defined, or if it has a kTraditionalVariant field defined and is itself one of the values in that field (that is, it is listed as one of its own traditional variants), it's both.
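    Taken literally, these rules of thumb can be sketched in Python. This is a minimal sketch, not an official Unihan algorithm: it assumes each data line of Unihan_Variants.txt is a code point, a field name, and one or more whitespace-separated code points, and the names SAMPLE, parse_variants and classify are hypothetical. It only uses the six lines quoted further down in this thread.

```python
from collections import defaultdict

# The six lines quoted in this thread. In the real Unihan_Variants.txt the
# columns are tab-separated; str.split() accepts tabs or spaces.
SAMPLE = """\
U+3469 kTraditionalVariant U+5138
U+346E kSimplifiedVariant U+2B748
U+346F kSimplifiedVariant U+3454
U+346F kTraditionalVariant U+3454
U+3473 kSimplifiedVariant U+3447
U+3473 kTraditionalVariant U+3447
"""

FIELDS = ("kTraditionalVariant", "kSimplifiedVariant")


def parse_variants(lines):
    """Map each code point to {field name: set of variant code points}."""
    table = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        cp, field, *values = line.split()
        if field in FIELDS:
            table[cp].setdefault(field, set()).update(values)
    return table


def classify(cp, table):
    """Return 'simplified', 'traditional' or 'both' per the rules of thumb."""
    fields = table.get(cp, {})
    trad = fields.get("kTraditionalVariant")
    simp = fields.get("kSimplifiedVariant")
    if trad is not None and cp in trad:
        return "both"          # listed as one of its own traditional variants
    if trad is not None and simp is None:
        return "simplified"    # only kTraditionalVariant is defined
    if simp is not None and trad is None:
        return "traditional"   # only kSimplifiedVariant is defined
    return "both"              # neither field, or both fields


table = parse_variants(SAMPLE.splitlines())
for cp in sorted(table, key=lambda c: int(c[2:], 16)):
    print(cp, classify(cp, table))
```

    On the quoted lines this prints "simplified" for U+3469, "traditional" for U+346E, and "both" for U+346F and U+3473. A character that never appears in the file falls through to "both", matching the "neither field defined" case.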

    The line between the two is really fairly fuzzy, though, and the issue is complicated by all kinds of factors. What exactly is it that you're trying to do?

    On Jan 6, 2011, at 4:39 PM, samuel gilman wrote:

    > Thanks for the fast reply but it is still confusing for me.
    > What I want to do is separate traditional from simplified.
    > How could I do that?
    > Sam
    > On Thu, Jan 6, 2011 at 5:26 PM, Magnus Bodin wrote:
    > 2011/1/7 samuel gilman:
    > > Hope there are some people still on this list!
    > > I'm trying to separate out the traditional and simplified Chinese characters
    > > within the Unihan database.
    > > Unihan_Variants.txt seems to show when the characters vary, but
    > > it's unclear to me.
    > > U+3469 kTraditionalVariant U+5138
    > > U+346E kSimplifiedVariant U+2B748
    > > U+346F kSimplifiedVariant U+3454
    > > U+346F kTraditionalVariant U+3454
    > > U+3473 kSimplifiedVariant U+3447
    > > U+3473 kTraditionalVariant U+3447
    > > I took this straight out of Unihan_Variants.txt.
    > > Can someone explain what this means?
    > > All I need from this is to figure out which variant is traditional and which
    > > form is simplified.
    > I'll quote an answer I got to a similar question from August 2008:
    > "Please see the description for field kSimplifiedVariant in [1]:
    > Note that a character can be *both* a traditional Chinese character in its
    > own right *and* the simplified variant for other characters (e.g., U+53F0).
    > In such case, the character is listed as its own simplified variant and one
    > of its own traditional variants. This distinguishes this from the case where
    > the character is not the simplified form for any character (e.g., U+4E95).
    > [1]"
    > -- magnus
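    As for which side of each line is which: a kTraditionalVariant line gives the traditional form(s) of the code point in the first column, and a kSimplifiedVariant line gives its simplified form(s). The sketch below turns each line into (simplified, traditional) pairs. It assumes whitespace-separated columns, uses only the six lines quoted above, and the names SAMPLE and variant_pairs are hypothetical.

```python
# The six lines quoted in this thread (whitespace-separated columns).
SAMPLE = [
    "U+3469 kTraditionalVariant U+5138",
    "U+346E kSimplifiedVariant U+2B748",
    "U+346F kSimplifiedVariant U+3454",
    "U+346F kTraditionalVariant U+3454",
    "U+3473 kSimplifiedVariant U+3447",
    "U+3473 kTraditionalVariant U+3447",
]


def variant_pairs(lines):
    """Yield (simplified, traditional) code-point pairs, one per listed variant."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        cp, field, *values = line.split()
        for value in values:
            if field == "kTraditionalVariant":
                yield cp, value    # value is a traditional form of cp
            elif field == "kSimplifiedVariant":
                yield value, cp    # value is a simplified form of cp


pairs = sorted(set(variant_pairs(SAMPLE)))
simplified = {s for s, _ in pairs}
traditional = {t for _, t in pairs}

for s, t in pairs:
    print(f"{s} (simplified) <-> {t} (traditional)")
print("on both sides:", sorted(simplified & traditional))
```

    For these lines, U+3469 is the simplified form of U+5138 and U+2B748 is the simplified form of U+346E, while U+346F, U+3454, U+3473 and U+3447 show up on both sides, which is the "both fields defined" case. Characters that never appear in the file (such as U+4E95 in the note above) produce no pairs at all; they are shared by both scripts.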

    This archive was generated by hypermail 2.1.5 : Thu Jan 06 2011 - 18:00:51 CST