Re: how to sort by stroke (not radical/stroke)

From: John Jenkins (jenkins@apple.com)
Date: Tue May 13 2003 - 10:48:42 EDT

    On Tuesday, May 13, 2003, at 07:52 AM, Gary P. Grosso wrote:

    >
    > Our radical/stroke sort relies on the fact that unicode order is the
    > same as radical/stroke order.

    Actually, this is not quite true. Apart from the fact that the Han
    ideographs are spread out over three blocks, there are ambiguities in
    stroke counting which can result in disagreements.

    The basic order of ideographs within a block is determined by the
    four-dictionary sorting algorithm, which closely approximates
    radical-stroke order, but its stroke counts can differ from those
    generally used for traditional Chinese, simplified Chinese, Japanese,
    and Korean.

    The intent of the default order within Unihan is most emphatically
    *NOT* to provide an adequate or correct sort order for ideographs, but
    to provide a consistent, algorithmic way of assigning code points.
    Actual, real-life collation should use additional data, some of which
    can be found within Unihan.txt.

    > Stroke order, then, is something
    > different. Seems like we would need order entries in the config data
    > for every character, which would be totally unmanageable.
    >
    > I didn't have any luck searching the Unicode web site for information
    > about sorting by stroke.
    >

    There is a kTotalStrokes field in Unihan.txt, although it doesn't cover
    every character in Unihan. This would definitely be a good place to
    start.
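
    As a rough illustration of using that field, here is a minimal Python
    sketch. It assumes the usual tab-separated Unihan line layout
    (code point, field name, value) and takes the first value if a line
    lists several. The function names and the four-line SAMPLE are made up
    for the example; real data would be read from Unihan.txt itself.
    Characters with no kTotalStrokes entry are sorted last, which is an
    arbitrary choice.

```python
import io

def parse_total_strokes(lines):
    """Return {code_point: stroke_count} from Unihan-style lines.

    Assumes tab-separated lines such as "U+4E00<TAB>kTotalStrokes<TAB>1".
    """
    strokes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 3 or fields[1] != "kTotalStrokes":
            continue  # some other field, or a malformed line
        code_point = int(fields[0][2:], 16)  # drop the "U+" prefix
        # Assumption: if several values are listed, use the first one.
        strokes[code_point] = int(fields[2].split()[0])
    return strokes

def stroke_sort_key(ch, strokes):
    """Sort by stroke count, then code point; unknown characters go last."""
    cp = ord(ch)
    return (cp not in strokes, strokes.get(cp, 0), cp)

# Illustrative sample only; a real run would open Unihan.txt instead.
SAMPLE = (
    "U+4E00\tkTotalStrokes\t1\n"
    "U+4E09\tkTotalStrokes\t3\n"
    "U+4E8C\tkTotalStrokes\t2\n"
    "U+5929\tkTotalStrokes\t4\n"
)

if __name__ == "__main__":
    strokes = parse_total_strokes(io.StringIO(SAMPLE))
    text = "\u4e09\u5929\u4e8c\u4e00"
    print(sorted(text, key=lambda c: stroke_sort_key(c, strokes)))
```

    Note that U+4E09 (three strokes) precedes U+4E8C (two strokes) in code
    point order, so even this tiny sample sorts differently by stroke
    count than by code point.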

    Since characters with the same radical-stroke combination are (usually)
    found together in a contiguous run, and since the radicals (more often
    than not) have a consistent stroke count, it probably wouldn't be
    difficult to use some sort of data structure to hold this information
    more compactly than just using a straight table, but I haven't worked
    on the problem myself.
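
    One possible shape for such a structure, offered only as a sketch and
    not as a tested design, is a run-length table: consecutive code points
    that share a stroke count collapse into a single (start, end, count)
    entry, and lookup is a binary search over the run starts. The names
    compress and lookup are hypothetical, and the demo values below are
    placeholders, not real Unihan data.

```python
from bisect import bisect_right

def compress(strokes):
    """Collapse {code_point: count} into sorted (start, end, count) runs."""
    runs = []
    for cp in sorted(strokes):
        count = strokes[cp]
        if runs and runs[-1][1] == cp - 1 and runs[-1][2] == count:
            # Adjacent code point with the same count: extend the last run.
            runs[-1] = (runs[-1][0], cp, count)
        else:
            runs.append((cp, cp, count))
    return runs

def lookup(runs, starts, cp):
    """Return the stroke count for cp, or None if no run covers it."""
    i = bisect_right(starts, cp) - 1
    if i >= 0 and runs[i][0] <= cp <= runs[i][1]:
        return runs[i][2]
    return None

if __name__ == "__main__":
    # Placeholder values chosen to show runs and a gap; not real data.
    demo = {100: 5, 101: 5, 102: 5, 103: 6, 105: 6}
    runs = compress(demo)
    starts = [start for start, _, _ in runs]
    print(runs)
    print(lookup(runs, starts, 101), lookup(runs, starts, 104))
```

    How much this saves depends entirely on how long the runs are in the
    real data, which would need to be measured against Unihan.txt.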

    ==========
    John H. Jenkins
    jenkins@apple.com
    jhjenkins@mac.com
    http://www.tejat.net/


