Re: Seemingly duplicated radicals, reasoning?

From: Jeroen Ruigrok van der Werven (
Date: Mon Dec 24 2007 - 18:09:16 CST


    Hi James,

    Apologies for snipping your text.

    -On [20071224 23:59], James Kass ( wrote:
    >So, 彐 (# 58), 長 (#168), 骨 (#188), and 鬼 (#194) could
    >be referred to as U+5F50, U+9577, U+9AA8, and U+9B3C,
    >Quoting from T.U.S. 5.0 page 426,
    >"Semantics. Characters in the CJK and KangXi Radicals blocks should
    >never be used as ideographs. They have different properties and meanings.
    >U+2F00 KANGXI RADICAL ONE is not equivalent to U+4E00 CJK UNIFIED
    >IDEOGRAPH-4E00, for example. The former is to be treated as a symbol,
    >the latter as a word or part of a word."


    Please note I never referenced the entire CJK block though, only the Kangxi
    and supplemental radical blocks.
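
    To make the quoted distinction concrete, here is a minimal Python sketch
    using the standard unicodedata module. It takes the four radical numbers
    and ideograph code points from the quoted message, looks up the matching
    characters in the KangXi Radicals block, and compares their properties.
    The helper names (kangxi_radical, describe) are made up for this
    illustration, and the sketch assumes that radical number n sits at
    U+2F00 + n - 1, which holds for the 214 characters of that block.

```python
import unicodedata

# Radical numbers and ideograph code points as listed in the quoted message.
QUOTED = {58: 0x5F50, 168: 0x9577, 188: 0x9AA8, 194: 0x9B3C}


def kangxi_radical(number):
    """Return the KangXi Radicals block character for radical 1..214.

    Assumes radical n is encoded at U+2F00 + n - 1.
    """
    if not 1 <= number <= 214:
        raise ValueError("KangXi radical numbers run from 1 to 214")
    return chr(0x2F00 + number - 1)


def describe(ch):
    """Return (code point, character name, general category) for ch."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for number, ideograph_cp in QUOTED.items():
    radical = kangxi_radical(number)
    ideograph = chr(ideograph_cp)
    # Compatibility normalization folds the radical onto the ideograph.
    folded = unicodedata.normalize("NFKC", radical)
    print(number, describe(radical), describe(ideograph), folded == ideograph)
```

    The radicals should come back with general category So (symbol) and the
    ideographs with Lo (letter), which is the property difference the
    standard describes. At the same time NFKC maps each radical onto the
    ideograph, while NFC leaves it alone, so the two are only
    compatibility-equivalent, not canonically equivalent.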

    >But, I've never understood the reasoning behind all the duplications.
    >Having just read the pertinent section in the Unicode Standard doesn't
    >really help much in understanding. Of course, "KANGXI RADICAL ONE"
    >is the ideograph encoded at U+4E00. The fact that some standards have
    >chosen to encode that radical separately and assign it properties as
    >a symbol doesn't alter that reality. The fact that many dictionaries
    >use a different font style to display radicals in indices only means
    >that it is a different font style, not that there is any difference
    >at the character (plain text) level.

    Well, semantically speaking I can understand the duplication into a radical
    block to separate the radicals from the complete glyphs.

    I just cannot understand why a few radicals are duplicated across the two
    radical blocks when they seem to differ only in how they are displayed. I
    thought we used variation selectors for that.

    But yes, I also understand your point. Quite frankly, I was amazed to find
    that the KS X 1001 standard encoded a number of hanja twice, for no apparent
    reason other than that the reading differed, which means nothing whatsoever
    for normal encoding. I think that if Unicode ever gets a successor, this is
    one of the areas that will need readjusting.
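
    As background that is not in the message itself: the duplicate hanja from
    KS X 1001 live in Unicode's CJK Compatibility Ideographs block, where
    they carry canonical decompositions to the unified ideographs. The
    sketch below shows how normalization folds two of them away. The sample
    code points U+F900 and U+F901 are chosen only for illustration, and
    unify is a hypothetical helper name.

```python
import unicodedata

# Two compatibility ideographs from the start of the U+F900 block,
# picked purely as examples.
SAMPLES = ["\uF900", "\uF901"]


def unify(text):
    """Apply canonical normalization (NFC) to text.

    Compatibility ideographs have singleton canonical decompositions,
    so NFC replaces them with the corresponding unified ideograph.
    """
    return unicodedata.normalize("NFC", text)


for ch in SAMPLES:
    unified = unify(ch)
    print(
        f"U+{ord(ch):04X} -> U+{ord(unified):04X}",
        "decomposition:", unicodedata.decomposition(ch),
    )
```

    Because the mapping is canonical, even NFC discards the distinction, so
    the differing reading has to be recorded outside the plain text if it
    matters.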

    Jeroen Ruigrok van der Werven <asmodai(-at-)> / asmodai
    イェルーン ラウフロック ヴァン デル ウェルヴェン |
    In every stone sleeps a crystal...
