Re: Unihan

From: Benjamin C. Kite (dharbigt@pobox.com)
Date: Tue Apr 12 2005 - 10:43:04 CST

    >>
    >> Is this the appropriate forum to discuss the Unihan database, or is
    >> there another list for that?
    >>
    >
    > General questions regarding the database are appropriate to raise
    > here. There is another list for people who are interested in actively
    > working on improving it. As a general rule, you can ask your question
    > here and it can be shunted to the other list if it seems more
    > appropriate there.

    I run across a few errors or omissions per week. If there is interest
    in my input, I'd be happy to offer it to the appropriate parties.

    Aside from this, I have a few questions:

    First, I am curious whether Unihan makes its own modifications to the
    definitions, separate from CEDICT, or whether it relies solely on
    input from CEDICT for its definitions database.

    Second, I notice that the definitions assigned to traditional
    characters aren't always appended to the definitions of their
    simplified counterparts, especially when the simplified form has its
    own meaning in the traditional set. It seems trivial to append that
    information with one more database query. However, has there been an
    extended discussion about whether semantic variants should carry the
    same definitions as their standard counterparts? There are certainly
    numerous cases where a semantic variant has no definition data but
    its standard counterpart does. Should duplicate definitions be
    propagated here?
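
    To make the "one more database query" concrete, here is a minimal
    sketch in Python. It assumes the data is available as tab-separated
    lines of the form U+XXXX<TAB>field<TAB>value, and that kDefinition
    and kTraditionalVariant are the relevant fields. The function names
    and the sample rows are hypothetical, made up for illustration, and
    the sample definitions are not copied from the database.

```python
from collections import defaultdict

def parse_unihan(lines):
    """Parse 'U+XXXX<TAB>field<TAB>value' lines into {codepoint: {field: value}}."""
    data = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        codepoint, field, value = line.split("\t", 2)
        data[codepoint][field] = value
    return data

def propagate_definitions(data, variant_field="kTraditionalVariant"):
    """Return {codepoint: definition} with each variant's definition appended."""
    merged = {}
    for codepoint, fields in data.items():
        parts = []
        own = fields.get("kDefinition")
        if own:
            parts.append(own)
        for target in fields.get(variant_field, "").split():
            if target == codepoint:
                continue  # a character may list itself as its own variant
            other = data.get(target, {}).get("kDefinition")
            if other and other not in parts:
                parts.append(other)
        if parts:
            merged[codepoint] = "; ".join(parts)
    return merged

# Illustrative rows only (assumed layout, invented definitions).
SAMPLE = [
    "U+540E\tkDefinition\tqueen, empress",
    "U+540E\tkTraditionalVariant\tU+540E U+5F8C",
    "U+5F8C\tkDefinition\tbehind, after",
]

merged = propagate_definitions(parse_unihan(SAMPLE))
print(merged["U+540E"])
```

    Passing variant_field="kSemanticVariant" would apply the same merge
    to semantic variants, though real entries in that field may carry
    extra source annotations that this sketch does not handle. Whether
    such propagation is desirable is exactly the open question.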

    I also notice that notations in the definition fields refer to other
    characters in three different ways: as U+FFFF, as VEAFFFF, and by
    including the character itself. Does this fall within the demesne of
    the Unihan group, or does this also come from CEDICT?
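
    As a rough illustration of how such mixed references could be found
    and normalized, the sketch below scans a definition string for the
    U+ form and for literal ideographs, and reports both as code points.
    The function names are hypothetical, the ranges checked are only the
    main CJK Unified Ideographs block and Extensions A and B, and the
    third notation mentioned above is deliberately left unhandled since
    I do not know its exact format.

```python
import re

# Matches references written as U+ followed by 4 to 6 uppercase hex digits.
U_PLUS = re.compile(r"U\+([0-9A-F]{4,6})")

def is_cjk_ideograph(ch):
    """True if ch lies in the main CJK block or Extension A/B (assumed scope)."""
    cp = ord(ch)
    return (
        0x3400 <= cp <= 0x4DBF
        or 0x4E00 <= cp <= 0x9FFF
        or 0x20000 <= cp <= 0x2A6DF
    )

def find_references(definition):
    """Collect character references in a definition, grouped by notation."""
    return {
        "u_plus": ["U+" + digits for digits in U_PLUS.findall(definition)],
        "literal": ["U+%04X" % ord(ch) for ch in definition if is_cjk_ideograph(ch)],
    }

# \u5f8c is the literal character whose code point is U+5F8C.
refs = find_references("same as U+5F8C; see also \u5f8c")
print(refs)
```

    Reducing both forms to the same U+XXXX spelling would make it
    straightforward to report which entries mix notations.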

    Lastly, for the moment, I'm curious whether there is any plan to
    include Wubi Hua or ITABC stroke input data in this database. It would
    seem to be a fairly simple set of data to include, and it would make
    the database more useful, even if only a limited number of characters
    were covered.


