RE: how to sort by stroke (not radical/stroke)

From: Andrew C. West (
Date: Sat May 17 2003 - 06:33:25 EDT

    On Thu, 15 May 2003 21:17:00 +0200, Marco Cimarosti wrote:

    > Sure. Anyway, the CJK Radicals Supplement gives a few components which are
    > not to be found elsewhere, so maybe the person you were referring to never
    > saw them, if he was working with an earlier version of Unicode.

    The CJK Radicals Supplement simply provides alternate forms of radicals already
    encoded in the Kangxi Radicals block, such as simplified forms, Japanese usage
    forms, and variant forms of the same radical that are found in different
    positions in the ideographic layout (e.g. U+2E96 is the form of the heart
    radical [U+2F3C] when it is a lefthand component of an ideograph, whereas U+2E97
    is the form of the heart radical when it is a bottommost component of an
    ideograph).

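    The relationship between the three heart-radical code points mentioned above
    can be inspected with a minimal sketch such as the following. It assumes
    Python's standard unicodedata module; the helper name radical_info and the
    two range constants are illustrative, not part of any library. The block
    ranges are taken from the Unicode code charts.

```python
import unicodedata

# Block ranges as published in the Unicode code charts.
KANGXI_RADICALS = range(0x2F00, 0x2FE0)          # U+2F00..U+2FDF
CJK_RADICALS_SUPPLEMENT = range(0x2E80, 0x2F00)  # U+2E80..U+2EFF


def radical_info(ch):
    """Return (block, character name, NFKC form) for a single character.

    radical_info is a hypothetical helper written for this illustration.
    """
    cp = ord(ch)
    if cp in KANGXI_RADICALS:
        block = "Kangxi Radicals"
    elif cp in CJK_RADICALS_SUPPLEMENT:
        block = "CJK Radicals Supplement"
    else:
        block = "other"
    name = unicodedata.name(ch, "<unnamed>")
    nfkc = unicodedata.normalize("NFKC", ch)
    return block, name, nfkc


# U+2F3C: Kangxi heart radical; U+2E96, U+2E97: its positional variants.
for ch in ("\u2f3c", "\u2e96", "\u2e97"):
    block, name, nfkc = radical_info(ch)
    print(f"U+{ord(ch):04X} {name} [{block}] NFKC -> U+{ord(nfkc):04X}")
```

    Compatibility normalisation folds the Kangxi radical U+2F3C onto the ordinary
    ideograph U+5FC3, whereas the two positional variants in the CJK Radicals
    Supplement are left unchanged.
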
    They are useful for describing CJK ideographs in conjunction with Ideographic
    Description Characters, but I do not think that there is an actual formalised
    Ideographic Description subsystem within Unicode that is intended to be able to
    represent all possible CJK ideographs by breaking them down into their component
    elements. I would imagine that the Kangxi Radicals and the CJK Radicals
    Supplement blocks are intended primarily to be used for typesetting Radical
    indexes, and that their usefulness in describing ideographs in conjunction with
    IDCs is just an added bonus that was probably not even considered by the UTC
    when they were accepted for encoding. (BTW, I never really understood why the
    Kangxi Radicals were encoded separately in the first place, given that they are
    all duplicates of pre-existing CJK ideographs.)

    As I said previously, the 214 Kangxi radicals are only a small (albeit important)
    subset of all the ideographic components needed to describe CJK ideographs. To
    put it in context, the dictionary _Shuowen Jiezi_ compiled by Xu Shen in about
    100 A.D. (the first dictionary to use the radical system) has 540 radicals,
    whilst the 6th century _Yu Pian_ uses a slightly different set of 542 radicals
    (I assume that all of these radicals are encoded within Unicode, but I haven't
    checked that yet, and some of them are *very* obscure). Without giving a lecture
    on ideographic composition, radicals are only one type of ideographic component,
    the other most important type of ideographic component being phonetic elements.
    The vast majority of phonetic elements are ideographs in their own right, but
    some phonetic elements that have evolved
    graphically may differ from the form of the element as a standalone ideograph,
    and may thus not be encoded within Unicode. Whilst it may be useful to have such
    non-ideographic elements available for describing ideographs in conjunction with
    IDCs, I doubt that any proposal for their encoding would get past the UTC
    without pre-existing examples of their usage ... and off-hand I can't think of
    any examples of textual usage of such unencoded ideographic elements.

    I don't know what the 100 or so unencoded ideographic components that my
    informant mentions are, but I can give an example of my own. The ideograph
    U+8CAC ZE2 is composed of an unencoded element above the ideograph U+8C9D BEI4
    (the character's radical). The unencoded element is actually an evolution of the
    ideograph U+673F CI4, which acts as the character's phonetic [see Karlgren's
    Grammata Serica #868]. (In the ideograph U+6BD2 DU2 "poisonous", the same
    unencoded element above the ideograph U+6BCB WU2 "not, without" is probably
    derived from the ideograph U+751F SHENG1 "life", the whole character being a
    rebus for "not life").

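    As a rough sketch of what such a description might look like in practice, the
    following builds above-to-below Ideographic Description Sequences for the two
    characters discussed. It assumes Python; the UNENCODED placeholder ("?") and
    the helper describe_above_below are hypothetical conventions invented for
    this illustration, since the top element has no code point of its own to put
    in the sequence.

```python
import unicodedata

# U+2FF1 IDEOGRAPHIC DESCRIPTION CHARACTER ABOVE TO BELOW
IDC_ABOVE_TO_BELOW = "\u2ff1"

# Hypothetical stand-in for a component that has no code point of its own.
UNENCODED = "?"


def describe_above_below(top, bottom):
    """Build a two-component, above-to-below description sequence."""
    return IDC_ABOVE_TO_BELOW + top + bottom


# U+8CAC ZE2: unencoded element above U+8C9D BEI4 (the radical).
ze = describe_above_below(UNENCODED, "\u8c9d")

# U+6BD2 DU2: the same unencoded element above U+6BCB WU2.
du = describe_above_below(UNENCODED, "\u6bcb")

for target, ids in (("\u8cac", ze), ("\u6bd2", du)):
    parts = " ".join(f"U+{ord(c):04X}" for c in ids)
    print(f"U+{ord(target):04X} described as: {parts}")

print(unicodedata.name(IDC_ABOVE_TO_BELOW))
```

    The placeholder makes the limitation concrete: the sequence records the
    layout and the encoded lower component, but says nothing useful about the
    upper one.
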

