Re: Are Unihan variant relations expected to be symmetrical?

From: Uriah Eisenstein (
Date: Tue Aug 17 2010 - 04:10:54 CDT


    Continuing this issue: I've played a bit with SQL access to Unihan data,
    and also found a few kDefinition fields which are only one or two characters
    long, e.g. "c" or "lr". I suppose other seemingly erroneous entries could be
    found in the same way.
    My question is, would it be useful if I gathered and sent such data (which I'd
    happily do), or do the Unihan maintainers have enough tools to find it and
    just need the time and resources to act on it?
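A minimal sketch of the kind of SQL check described above, using Python's built-in sqlite3 module. It assumes the Unihan data files use the tab-separated layout `U+XXXX<TAB>field<TAB>value` with `#` comment lines; the table name `unihan`, its columns, the helper names and the two-character cutoff are hypothetical choices for illustration, not anything defined by Unihan itself.

```python
import sqlite3

def load_unihan(conn, lines):
    """Load 'U+XXXX<TAB>field<TAB>value' lines into a hypothetical `unihan`
    table, skipping blank lines and '#' comments."""
    conn.execute("CREATE TABLE IF NOT EXISTS unihan (cp TEXT, field TEXT, value TEXT)")
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cp, field, value = line.split("\t", 2)
        rows.append((cp, field, value))
    conn.executemany("INSERT INTO unihan VALUES (?, ?, ?)", rows)

def short_definitions(conn, max_len=2):
    """Return (code point, value) rows whose kDefinition has at most max_len characters."""
    return conn.execute(
        "SELECT cp, value FROM unihan "
        "WHERE field = 'kDefinition' AND length(value) <= ? ORDER BY cp",
        (max_len,),
    ).fetchall()

if __name__ == "__main__":
    # Made-up rows using Private Use code points; not real Unihan data.
    sample = [
        "# synthetic example",
        "U+E000\tkDefinition\tc",
        "U+E001\tkDefinition\ta normal gloss",
        "U+E002\tkDefinition\tlr",
    ]
    conn = sqlite3.connect(":memory:")
    load_unihan(conn, sample)
    print(short_definitions(conn))
```

SQLite's `length()` counts characters for TEXT values, so the cutoff is in characters rather than bytes.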

    Uriah Eisenstein

    On Wed, Jun 30, 2010 at 11:55 AM, Uriah Eisenstein <> wrote:

    > I see... Thanks for your answer. I suppose it should be easy enough to find
    > some of the inconsistencies, such as asymmetrical variant relations; the
    > real issue would be resolving them case by case.
    > One case where resolution also seems as though it should be easy is
    > when supposed Z-variants have very different total stroke counts. This can
    > be checked with the Unihan data alone; I could do that myself (after
    > overcoming the usual issues programming languages have with characters
    > outside the BMP).
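A sketch of the stroke-count check, assuming the same tab-separated file layout and that kZVariant values are space-separated `U+XXXX` entries that may carry a `<source` suffix; only the first kTotalStrokes value is used. The threshold of 3 strokes and all function names are hypothetical. In Python 3 a `str` is a sequence of code points, so `chr()` handles characters outside the BMP without any surrogate-pair handling; the synthetic sample uses the supplementary Private Use code point U+F0000 to show that.

```python
def parse_unihan(lines):
    """Parse 'U+XXXX<TAB>field<TAB>value' lines into {code point: {field: value}}."""
    data = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cp, field, value = line.split("\t", 2)
        data.setdefault(int(cp[2:], 16), {})[field] = value
    return data

def targets(value):
    """Code points listed in a variant value, dropping any '<source' suffix."""
    return [int(tok.split("<", 1)[0][2:], 16) for tok in value.split()]

def strokes(fields):
    """First kTotalStrokes value, or None if the field is missing."""
    value = fields.get("kTotalStrokes")
    return int(value.split()[0]) if value else None

def stroke_mismatches(data, threshold=3):  # threshold: hypothetical cutoff
    """Return (char, z-variant, strokes, strokes) where counts differ by >= threshold."""
    found = []
    for cp, fields in data.items():
        s1 = strokes(fields)
        if s1 is None or "kZVariant" not in fields:
            continue
        for other in targets(fields["kZVariant"]):
            s2 = strokes(data.get(other, {}))
            if s2 is not None and abs(s1 - s2) >= threshold:
                found.append((chr(cp), chr(other), s1, s2))
    return sorted(found)

if __name__ == "__main__":
    # Synthetic Private Use entries, not real Unihan data.
    sample = [
        "U+E000\tkTotalStrokes\t5",
        "U+E000\tkZVariant\tU+F0000",
        "U+F0000\tkTotalStrokes\t12",
        "U+E001\tkTotalStrokes\t7",
        "U+E001\tkZVariant\tU+E002<kSource",
        "U+E002\tkTotalStrokes\t8",
    ]
    for a, b, s1, s2 in stroke_mismatches(parse_unihan(sample)):
        print(f"U+{ord(a):04X} ({s1}) vs U+{ord(b):04X} ({s2})")
```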
    > Uriah
    > On Tue, Jun 29, 2010 at 9:36 PM, John H. Jenkins <> wrote:
    >> The kZVariant field has bad data in it that we haven't had time to clean
    >> up. It should, in theory, be symmetrical, and it should, in theory, contain
    >> only unifiable forms, but as you note, it doesn't. In addition to pairs
    >> kept apart by the source separation rule, it should also cover characters
    >> which were added to the standard in error.
    >> In any event, I'm afraid that right now it's probably best not to rely on
    >> it for anything.
    >> On Jun 29, 2010, at 8:25 AM, Uriah Eisenstein wrote:
    >> Hi,
    >> To clarify my question with an example :) The character 亀 (U+4E80) is
    >> listed in Unihan as a Z-variant of 龜 (U+9F9C). However, the opposite is not
    >> true. Similarly, 疍 (U+758D) is listed as a semantic variant of 蛋 (U+86CB),
    >> but not vice versa. From the definitions of these variant types in UAX #38,
    >> one would naturally expect them to be symmetrical, and both characters to
    >> show each other as variants. There are quite a few other such cases,
    >> although it does appear that in most cases the relation is symmetrical.
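The asymmetry described above can be listed mechanically. This sketch, under the same assumed file layout, builds a directed relation for one variant field (kZVariant or kSemanticVariant) and reports every pair where A lists B but B does not list A. Per the message, pairs such as 亀/龜 (U+4E80/U+9F9C) and 疍/蛋 (U+758D/U+86CB) are the sort of case it is meant to surface. The function names are hypothetical.

```python
from collections import defaultdict

def load_relation(lines, field):
    """Map each code point to the set of code points its `field` entry lists.

    Assumes 'U+XXXX<TAB>field<TAB>value' lines whose values are space-separated
    'U+XXXX' tokens, optionally followed by a '<source' annotation.
    """
    rel = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cp, name, value = line.split("\t", 2)
        if name != field:
            continue
        for tok in value.split():
            rel[int(cp[2:], 16)].add(int(tok.split("<", 1)[0][2:], 16))
    return rel

def asymmetric_pairs(rel):
    """Return sorted (a, b) pairs where a lists b but b does not list a."""
    # rel.get() is used instead of rel[b] so the defaultdict is not mutated
    # while it is being iterated.
    return sorted(
        (a, b)
        for a, targets in rel.items()
        for b in targets
        if a not in rel.get(b, ())
    )

if __name__ == "__main__":
    # Synthetic Private Use code points, not real Unihan entries.
    sample = [
        "U+E000\tkZVariant\tU+E001",
        "U+E001\tkZVariant\tU+E000",
        "U+E004\tkZVariant\tU+E005",
    ]
    for a, b in asymmetric_pairs(load_relation(sample, "kZVariant")):
        print(f"U+{a:04X} lists U+{b:04X}, but not the reverse")
```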
    >> My reason for asking, BTW, is that I'm thinking of grouping characters
    >> which are Z-variants of each other in some application, so I need to
    >> understand whether Z-variants are expected to have clear "cliques" in which
    >> each character is a Z-variant of all others.
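One way to test the "clique" expectation, sketched here as an assumption rather than anything Unihan prescribes: treat each listed variant pair as an undirected edge, group characters into connected components with a small union-find, and then check whether every member of a component actually lists every other member. A component that fails the check is where asymmetric or missing entries would make naive grouping ambiguous. The relation is a plain dict `{code point: set of listed code points}`; the helper names are hypothetical.

```python
def components(rel):
    """Connected components of the symmetrised relation, via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for a, listed in rel.items():
        for b in listed:
            union(a, b)
    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])

def is_clique(group, rel):
    """True if every member lists every other member, in both directions."""
    return all(b in rel.get(a, ()) for a in group for b in group if a != b)

if __name__ == "__main__":
    # Synthetic relation: 1-2-3 form a chain, 10-11 list each other.
    rel = {1: {2}, 2: {1, 3}, 3: {2}, 10: {11}, 11: {10}}
    for group in components(rel):
        print(group, "clique" if is_clique(group, rel) else "not a clique")
```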
    >> I realize that the semantic variant relation, at least, is based on
    >> external sources and not determined by Unicode; regarding Z-variants I'm not
    >> clear. I'd like to know though whether the relation is expected to be
    >> symmetrical, and the above cases are to be considered errors; or there is
    >> some meaning to a one-directional relation; or something else.
    >> On a side note, some Z-variants I've looked at seem to have very different
    >> abstract shapes, in some cases looking more like simplified/traditional
    >> pairs. As I said, I'm not clear on how they are determined. Are they
    >> supposed to be exactly those pairs which would be unified if it were not for
    >> the Source Separation Rule?
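To find the Z-variant pairs that look more like simplified/traditional pairs, one could intersect kZVariant with the kSimplifiedVariant and kTraditionalVariant fields. This is only a heuristic sketch under the same parsing assumptions, with hypothetical function names: an overlap marks a pair worth reviewing by hand, not a confirmed error.

```python
def parse_fields(lines, fields):
    """Return {field: {code point: set of listed code points}} for the given fields."""
    out = {f: {} for f in fields}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cp, name, value = line.split("\t", 2)
        if name in out:
            listed = {int(t.split("<", 1)[0][2:], 16) for t in value.split()}
            out[name].setdefault(int(cp[2:], 16), set()).update(listed)
    return out

def zvariants_like_simp_trad(lines):
    """Z-variant pairs (a, b) where either side also lists the other as a
    simplified or traditional variant: candidates for manual review only."""
    rel = parse_fields(lines, ("kZVariant", "kSimplifiedVariant", "kTraditionalVariant"))
    st = {}
    for f in ("kSimplifiedVariant", "kTraditionalVariant"):
        for a, listed in rel[f].items():
            st.setdefault(a, set()).update(listed)
    return sorted(
        (a, b)
        for a, listed in rel["kZVariant"].items()
        for b in listed
        if b in st.get(a, ()) or a in st.get(b, ())
    )

if __name__ == "__main__":
    # Synthetic Private Use entries, not real Unihan data.
    sample = [
        "U+E000\tkZVariant\tU+E001",
        "U+E000\tkTraditionalVariant\tU+E001",
        "U+E004\tkZVariant\tU+E005",
    ]
    print([(f"U+{a:04X}", f"U+{b:04X}") for a, b in zvariants_like_simp_trad(sample)])
```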
    >> TIA,
    >> Uriah
    >> =====
    >> John H. Jenkins

    This archive was generated by hypermail 2.1.5 : Tue Aug 17 2010 - 04:17:05 CDT