RE: how to sort by stroke (not radical/stroke)

From: Andrew C. West (andrewcwest@alumni.princeton.edu)
Date: Thu May 15 2003 - 06:50:37 EDT

    On Thu, 15 May 2003 11:43:31 +0200, Marco Cimarosti wrote:

    > Not so good. What Gary needs is the *sequence* of all strokes composing each
    > character. Once he has that data, the total number of strokes from each
    > character is simply the length of each sequence.

    I'm not sure that's what he wants either. In dictionaries that give a Stroke
    Order index, characters with the same stroke count are usually sub-sorted by the
    stroke category of their first one or two strokes. Whilst you can get this
    information from a sequence of all strokes, that is more than is needed.
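
    As a rough illustration of that ordering, here is a minimal Python sketch. It
    assumes a lookup table holding only the total stroke count and the categories
    of the first two strokes. The table name, the function names and the digit
    codes (1 = horizontal, 2 = vertical, 3 = left-falling, 4 = dot/right-falling,
    5 = turning) are assumptions made for the example, not data from any
    particular dictionary or database.

```python
# Hypothetical index: character -> (total stroke count, first two stroke categories).
# Values are illustrative; a real index would come from a curated database.
STROKE_INDEX = {
    "女": (3, "53"),
    "三": (3, "11"),
    "人": (2, "34"),
    "大": (3, "13"),
    "十": (2, "12"),
    "子": (3, "52"),
    "土": (3, "12"),
}


def stroke_sort_key(ch, index=STROKE_INDEX):
    """Return (stroke count, leading stroke categories) for one character."""
    count, leading = index[ch]
    return (count, leading)


def sort_by_stroke(chars, index=STROKE_INDEX):
    """Sort by total stroke count, then by the first one or two stroke categories."""
    return sorted(chars, key=lambda ch: stroke_sort_key(ch, index))


if __name__ == "__main__":
    print("".join(sort_by_stroke("女三人大十子土")))
```

    Characters whose count and leading categories both match keep their input
    order here, since Python's sort is stable; a real index would need a further
    tie-break rule.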

    > A better starting point would be a database of IDS decompositions of CJK
    > ideographs.

    ...

    > DB#1 would be useful for a number of purposes, but building it is a pain in
    > the neck! (Just to be 100% clear, I'd like having it, but I am *not*
    > volunteering to do it. :-)

    Coincidentally I've recently been in contact with someone who has spent the last
    ten years creating a database of CJK ideographs, similar in scope to the Unihan
    database, but (according to him) more systematic and accurate. His database does
    include ideographic decompositions (as well as stroke categorization of the
    first two strokes of each character). The main problem with ideographic
    decompositions is that not all discrete ideographic components are [currently]
    encoded within Unicode - there are about 100 unencoded ideographic components
    according to this person. Of course you could work around this by breaking the
    unencoded components down directly into their strokes, but this would be an
    inelegant solution.
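
    To make the idea concrete, the following Python sketch derives a stroke
    sequence from IDS decompositions. Both tables are hypothetical and hold only
    a few illustrative entries, and the stroke codes follow an assumed
    five-category scheme. The sketch also assumes that components are written in
    the order they appear in the IDS, which does not hold for every character
    (enclosing shapes, for instance, are often closed last).

```python
# Ideographic Description Characters occupy U+2FF0..U+2FFB in Unicode 3.x.
IDC = {chr(cp) for cp in range(0x2FF0, 0x2FFC)}

# Hypothetical table: character -> IDS decomposition (illustrative entries only).
IDS_TABLE = {
    "好": "⿰女子",
    "安": "⿱宀女",
}

# Hypothetical table: base component -> stroke categories, one digit per stroke.
# A component with no IDS entry (including one that is not encoded and is
# represented by some placeholder) would have to be listed here directly.
COMPONENT_STROKES = {
    "女": "531",
    "子": "521",
    "宀": "445",
}


def stroke_sequence(ch):
    """Expand a character into its stroke categories via its IDS, recursively."""
    if ch in COMPONENT_STROKES:
        return COMPONENT_STROKES[ch]
    if ch not in IDS_TABLE:
        raise KeyError(f"no decomposition or stroke data for {ch!r}")
    # Skip the layout operators and concatenate the components in IDS order.
    return "".join(stroke_sequence(c) for c in IDS_TABLE[ch] if c not in IDC)


if __name__ == "__main__":
    seq = stroke_sequence("好")
    print(seq, len(seq))  # the stroke count is just the length of the sequence
```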

    Andrew

    P.S. If anyone is interested in cooperating with this person, please contact me
    off-list.


