Re: Han unification criteria - question about U+5C07

From: Kenneth Whistler (
Date: Mon Aug 20 2007 - 14:43:44 CDT

    > >On 20/08/07, Julian Bradfield <> wrote:
    > >>
    > >> The characters are the Chinese and Japanese versions of U+5C07
    > >> (amongst other things, the jiang4 in Chinese ma2-jiang4).
    > >>
    > >> In Chinese fonts, this character is:
    > >> left: radical 90 "half tree trunk"
    > >> right: top: radical 36 "evening" plus extra dot (this dot being
    > >> omitted in the simplified form)
    > >> bottom: radical 41 "inch"
    > >>
    > >>
    > >> In Japanese fonts, the top right component is instead the variant form
    > >> of radical 87 "claw".
    > >>
    > >
    > >U+5C06 将 is the standard Japanese form of the character.

    What Andrew should have said is that U+5C06 is the encoded
    character intended to represent the most-used Japanese form
    of the character -- as can be determined from the source code
    mappings for U+5C06 (see Unihan.txt).
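
    For anyone who wants to check those source mappings mechanically,
    here is a minimal sketch. It assumes the usual Unihan line layout
    of three tab-separated fields (code point, field name, value) and
    IRG source fields named kIRG_GSource, kIRG_JSource, and so on. The
    function name irg_sources is made up for illustration, and the
    sample lines are illustrative only: the J0-3E2D value is simply the
    JIS X 0208 code that corresponds to Shift-JIS 8FAB, and
    kExampleField is not a real Unihan field.

```python
def irg_sources(lines, code_point):
    """Collect the kIRG_*Source fields for one code point.

    `lines` is any iterable of Unihan-format lines:
    "U+XXXX<TAB>fieldName<TAB>value". Comment and blank lines are skipped.
    """
    wanted = f"U+{code_point:04X}"
    sources = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # not a data line in the assumed layout
        cp, field, value = parts
        if cp == wanted and field.startswith("kIRG_") and field.endswith("Source"):
            sources[field] = value
    return sources


# Illustrative sample only; a real run would iterate over the lines of
# Unihan.txt (or whichever file carries the IRG source fields).
sample = [
    "# sample data, not an extract from Unihan.txt",
    "U+5C06\tkIRG_JSource\tJ0-3E2D",
    "U+5C06\tkExampleField\tignored",  # hypothetical non-IRG field
]
print(irg_sources(sample, 0x5C06))
```

    A code point that comes back with several source fields (G, T, J,
    K) is one where forms from several national standards were unified.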

    > Not according to the printed reference charts. In the reference
    > charts, the U+5C06 displays the Chinese simplified form of U+5C07,
    > with the "evening" radical. The Japanese form, as I said, has the
    > "claw" radical.

    The actual glyph displayed in the charts for U+5C06 does, indeed,
    show the Chinese form, since the single-column charts use a
    Chinese-designed font for Han characters.

    If you look up the multi-column charts for ISO/IEC 10646 you can
    see that the unification of forms was deliberate. The Japanese,
    Korean, and Chinese-T source columns show the component with the
    "claw" form at the top, while the Chinese-G source column shows
    the component with the "evening" form at the top. (The reason
    the Chinese-T source should show the "claw" form is that the
    T-source was itself a deliberate re-encoding of the Japanese form
    of the character in the first place, as distinct from U+5C07 --
    but that is another story.)

    This is a case where the use of a "single-column" format (i.e. a
    single font, with no national differences displayed) for the Han
    charts hides the details of the intended unification.

    If you have access to a printed copy of the Unicode 3.0
    standard, you can see that the editors of that version made
    this point deliberately by including a Shift-JIS index to
    the Han characters, printed using a commercial *Japanese*
    font. And for U+5C06, the character in question, in that
    index the character is, indeed, displayed using a glyph with
    the "claw" form at the top of the right-hand component.
    (See Shift-JIS 8FAB.)
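
    The Shift-JIS cross-reference can be checked without the printed
    index. The following is a minimal sketch using Python's built-in
    shift_jis codec; the helper name sjis_to_code_point is made up for
    illustration. It confirms only the code mapping. Which glyph shape
    ("claw" or "evening") you then see depends entirely on the font
    used to display the character.

```python
def sjis_to_code_point(hex_bytes):
    """Decode one Shift-JIS double-byte code, given as hex, to 'U+XXXX'."""
    text = bytes.fromhex(hex_bytes).decode("shift_jis")
    if len(text) != 1:
        raise ValueError(f"expected exactly one character, got {len(text)}")
    return f"U+{ord(text):04X}"


print(sjis_to_code_point("8FAB"))
```

    Going the other way, "\u5c06".encode("shift_jis").hex() gives the
    Shift-JIS bytes for a given code point.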

    > So the question is also valid applied to U+5C06.

    The unification practices applied to *components* of Han
    characters (as opposed to the radicals) diverge occasionally
    from an ideal application of the principles for Han unification,
    based on the accumulated practice in East Asian lexicography
    of identifying certain common variations as being simply
    variants of each other, sometimes despite differences in
    abstract shape and/or stroke counts.

    The entire character U+5C06 is itself reused as a component,
    and when it is, the same differences in shape of the
    upper right-hand portion of the character are often reflected
    in the national variant forms of the resulting characters.
    In such cases there is no particularly good argument for
    maintaining a character encoding distinction based simply
    on the abstract shape difference of that portion of
    the glyphs involved. For an example, see U+848B (= SJIS 8FD3)
    and U+8523, the traditional form of the same character.
    U+848B will show the same "claw" versus "evening" shape
    distinction as U+5C06, depending on which font style is


    This archive was generated by hypermail 2.1.5 : Mon Aug 20 2007 - 14:47:14 CDT