RE: how to sort by stroke (not radical/stroke)

From: Marco Cimarosti
Date: Thu May 15 2003 - 05:43:31 EDT

    John Jenkins wrote:
    > There is a kTotalStrokes field in Unihan.txt, although it
    > doesn't cover every character in Unihan. This would
    > definitely be a good place to start.

    Not so good. What Gary needs is the *sequence* of all strokes composing each
    character. Once he has that data, the total number of strokes in each
    character is simply the length of its sequence.
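
    To make the point concrete, here is a minimal Python sketch of sorting by
    stroke once per-character sequences exist. The sample sequences follow the
    stroke tables given later in this message; the entry for 十 is added only to
    show tie-breaking. The ranking of stroke types in STROKE_RANK and the
    helper names are assumptions for illustration, not an established standard
    cited in this thread.

```python
# Assumed ranking of stroke types, used only to break ties between
# characters with the same number of strokes (illustrative, not normative).
STROKE_RANK = {"heng": 0, "shu": 1, "pie": 2, "na": 3, "zhe": 4}

# Sample data: character -> stroke sequence in writing order.
STROKE_SEQUENCES = {
    "口": ["shu", "zhe", "heng"],
    "人": ["pie", "na"],
    "一": ["heng"],
    "月": ["pie", "zhe", "heng", "heng"],
    "十": ["heng", "shu"],  # added here to demonstrate tie-breaking
}

def stroke_count(ch):
    """The total stroke count is just the length of the sequence."""
    return len(STROKE_SEQUENCES[ch])

def stroke_sort_key(ch):
    """Sort by stroke count first, then stroke by stroke using STROKE_RANK."""
    seq = STROKE_SEQUENCES[ch]
    return (len(seq), [STROKE_RANK[s] for s in seq])

ordered = sorted(STROKE_SEQUENCES, key=stroke_sort_key)
print(ordered)
```

    A plain kTotalStrokes value could only supply the first element of that
    key; the sequence is what makes the tie-break possible.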

    A better starting point would be a database of IDS decompositions of CJK
    ideographs. E.g.:

            (DB#1: IDS decompositions)
            喻 = ⿰ 口 ⿱ ⿱ 人 一 ⿰ 月 刂
            U+55BB = LeftRight(MOUTH, TopBottom(TopBottom(MAN, ONE),
                     LeftRight(MOON, KNIFE)))

    Once you have that, building a strokes database is quite trivial. First, all
    the IDS operators are useless for this purpose and should be stripped off:

            (DB#2: Decompositions in atomic components)
            喻 = 口 人 一 月 刂
            U+55BB = { MOUTH, MAN, ONE, MOON, KNIFE }
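
    The stripping step can be sketched in a few lines of Python. This assumes
    the IDS is stored as a plain string and that the operators are the
    Ideographic Description Characters U+2FF0..U+2FFB; the function name is
    hypothetical.

```python
# Ideographic Description Characters block, U+2FF0..U+2FFB (assumed range).
IDC_FIRST, IDC_LAST = 0x2FF0, 0x2FFB

def strip_ids_operators(ids):
    """Return the components of an IDS in order, dropping operators and spaces."""
    return [
        ch for ch in ids
        if not ch.isspace() and not (IDC_FIRST <= ord(ch) <= IDC_LAST)
    ]

components = strip_ids_operators("⿰ 口 ⿱ ⿱ 人 一 ⿰ 月 刂")
print(components)
```

    Because an IDS lists its components in prefix order, dropping the
    operators keeps the components in the order they appear in the
    description.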

    Then, a database of strokes for all the atomic components is needed. This
    should not be such a huge job, because only a few hundred such components
    are supposed to exist:

            (DB#3: Stroke sequences of atomic components)

            口 = 丨 乙 一
            MOUTH = { shu, zhe, heng }

            人 = 丿 丶
            MAN = { pie, na }

            一 = 一
            ONE = { heng }

            月 = 丿 乙 一 一
            MOON = { pie, zhe, heng, heng }

            刂 = 丨 亅
            KNIFE = { shu, shugou }

    At this point, it is easy to automatically expand the components of DB#2 to
    the corresponding stroke sequences of DB#3:

            (DB#4: CJK stroke sequences)
            喻 = 丨 乙 一 丿 丶 一 丿 乙 一 一 丨 亅
            U+55BB = { shu, zhe, heng, pie, na, heng, pie, zhe, heng, heng,
                       shu, shugou }
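
    The expansion from DB#2 to DB#4 amounts to a dictionary lookup and a
    concatenation. A minimal Python sketch, assuming DB#3 is held as a
    dictionary containing only the five components listed above (the names
    COMPONENT_STROKES and expand_to_strokes are hypothetical):

```python
# DB#3: stroke sequences of atomic components (only the entries shown above).
COMPONENT_STROKES = {
    "口": ["丨", "乙", "一"],
    "人": ["丿", "丶"],
    "一": ["一"],
    "月": ["丿", "乙", "一", "一"],
    "刂": ["丨", "亅"],
}

def expand_to_strokes(components):
    """Concatenate the stroke sequences of the components, in order."""
    strokes = []
    for component in components:
        # A KeyError here means DB#3 has no entry for this component yet.
        strokes.extend(COMPONENT_STROKES[component])
    return strokes

# DB#2 entry for U+55BB, expanded to its DB#4 stroke sequence.
yu_strokes = expand_to_strokes(["口", "人", "一", "月", "刂"])
print(" ".join(yu_strokes), len(yu_strokes))
```

    The length of the result is the total stroke count of the character.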

    DB#1 would be useful for a number of purposes, but building it is a pain in
    the neck! (Just to be 100% clear, I'd like to have it, but I am *not*
    volunteering to do it. :-)

    _ Marco
