Re: how to sort by stroke (not radical/stroke)

From: Allen Haaheim (haaheima@interchange.ubc.ca)
Date: Thu May 15 2003 - 18:36:49 EDT


    Andrew wrote:

    > In dictionaries that give a Stroke Order index, strokes are usually
    > sub-sorted by the stroke category of the first one or two strokes of the
    > character.

    In indexes ordered by stroke count, the sub-sort is more often by radical
    than first stroke(s). The only dictionary I have at home that sub-sorts by
    first stroke(s) is _Cihai_.
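    To make the two sub-sorts concrete, here is a minimal Python sketch that
    orders the same few characters by stroke count and then either by radical
    or by the categories of the first two strokes. The `Entry` record, the
    helper names, and the sample values are hypothetical and hand-entered for
    illustration (not drawn from the Unihan database). The stroke categories
    assume the common five-way order 横竖撇点折 (horizontal, vertical,
    left-falling, dot, turning); individual dictionaries may differ.

```python
from typing import NamedTuple

# Stroke categories in the common 横竖撇点折 order (an assumption; dictionaries vary).
H, S, P, D, Z = 1, 2, 3, 4, 5  # horizontal, vertical, left-falling, dot, turning


class Entry(NamedTuple):
    char: str
    total_strokes: int
    radical: int           # Kangxi radical number
    residual_strokes: int  # strokes outside the radical
    first_strokes: tuple   # categories of the first two strokes


# Hypothetical, hand-entered sample data for illustration only.
ENTRIES = [
    Entry("江", 6, 85, 3, (D, D)),
    Entry("地", 6, 32, 3, (H, S)),
    Entry("休", 6, 9, 4, (P, S)),
    Entry("汁", 5, 85, 2, (D, D)),
    Entry("他", 5, 9, 3, (P, S)),
]


def stroke_then_radical(e):
    """Usual stroke-count index: total strokes, then radical, then residual strokes."""
    return (e.total_strokes, e.radical, e.residual_strokes)


def stroke_then_first_strokes(e):
    """First-stroke sub-sort: total strokes, then the first two stroke categories."""
    return (e.total_strokes, e.first_strokes)


def order(key):
    """Return the sample characters concatenated in the order given by `key`."""
    return "".join(e.char for e in sorted(ENTRIES, key=key))


print(order(stroke_then_radical))        # 他汁休地江
print(order(stroke_then_first_strokes))  # 他汁地休江
```

    Both keys agree on the primary stroke count; they only disagree on how
    the 6-stroke characters are arranged among themselves.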

    Marco wrote:

    > So, considering only the first two strokes of each character, would result
    > in big groups of characters being sorted randomly

    These groups need not be random. The 1989 _Cihai_'s "First-two-strokes"
    index is sub-sorted by radical, yielding sub-groups of only five to ten
    characters on average, so the resulting groups are actually more tightly
    organized (though not marked with headings) than those of radical/stroke
    tables. Even if one stops at the groups listed under the first-two-strokes
    headings, those groups are no larger than the truly randomly ordered
    groups of a radical/stroke index. For example, compare the 1989 _Cihai_'s
    first-two-strokes/radical table with the radical/stroke table of a
    dictionary of comparable size, _Hanyu da zidian_.
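    The arrangement described above (stroke count, then a first-two-strokes
    heading, then radical within each heading) can be sketched as a single
    composite sort key followed by grouping. This is a minimal illustration,
    not a reconstruction of the _Cihai_ index itself: the `Entry` record,
    `first_two_strokes_index`, and the sample values are hypothetical and
    hand-entered, the 横竖撇点折 category order is assumed, and the final
    code-point tiebreak is arbitrary.

```python
from itertools import groupby
from typing import NamedTuple

# Stroke categories in the common 横竖撇点折 order (an assumption; dictionaries vary).
H, S, P, D, Z = 1, 2, 3, 4, 5
NAMES = {H: "横", S: "竖", P: "撇", D: "点", Z: "折"}


class Entry(NamedTuple):
    char: str
    total_strokes: int
    first_strokes: tuple   # categories of the first two strokes
    radical: int           # Kangxi radical number
    residual_strokes: int  # strokes outside the radical


# Hypothetical, hand-entered sample data for illustration only.
ENTRIES = [
    Entry("江", 6, (D, D), 85, 3),
    Entry("地", 6, (H, S), 32, 3),
    Entry("池", 6, (D, D), 85, 3),
    Entry("朴", 6, (H, S), 75, 2),
    Entry("字", 6, (D, D), 39, 3),
    Entry("吉", 6, (H, S), 30, 3),
]


def first_two_strokes_index(entries):
    """Group by (stroke count, first two strokes); sub-sort each group by radical."""
    def key(e):
        # e.char (code point order) is an arbitrary final tiebreak for determinism.
        return (e.total_strokes, e.first_strokes, e.radical, e.residual_strokes, e.char)

    def heading(e):
        return (e.total_strokes, e.first_strokes)

    return [(h, "".join(e.char for e in group))
            for h, group in groupby(sorted(entries, key=key), key=heading)]


for (total, (a, b)), chars in first_two_strokes_index(ENTRIES):
    print(f"{total} strokes, {NAMES[a]}{NAMES[b]}: {chars}")
```

    With this sample the loop prints "6 strokes, 横竖: 吉地朴" and
    "6 strokes, 点点: 字江池". Within each heading the radical still decides
    placement, so 字 (filed under 子, radical 39) precedes 江 and 池 (水,
    radical 85) even though all three begin with the same two strokes.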

    That said, I do not doubt that radical/stroke is (for the initiated) the
    fastest, most convenient, most commonly found and most commonly used
    method, while stroke/radical (not stroke alone) is the usual fallback when
    radical/stroke fails to yield the character (usually because the radical
    is unclear or has been guessed wrong).
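    The lookup habit described here can be modelled as two indexes built from
    the same data: a radical/stroke index consulted first, and a stroke/radical
    index used as the fallback when the guessed radical bucket does not contain
    the character. This is a hypothetical simulation of a reader, not any
    dictionary's actual procedure; `build_indexes`, `look_up`, and the sample
    values are illustrative assumptions.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    char: str
    total_strokes: int
    radical: int           # Kangxi radical number
    residual_strokes: int  # strokes outside the radical


# Hypothetical, hand-entered sample data for illustration only.
ENTRIES = [
    Entry("他", 5, 9, 3),
    Entry("休", 6, 9, 4),
    Entry("吉", 6, 30, 3),
    Entry("地", 6, 32, 3),
    Entry("江", 6, 85, 3),
]


def build_indexes(entries):
    """Build a radical/stroke index and a stroke/radical fallback index."""
    by_radical = defaultdict(list)  # (radical, residual strokes) -> chars
    by_strokes = defaultdict(list)  # total strokes -> chars, kept in radical order
    for e in sorted(entries, key=lambda e: (e.radical, e.residual_strokes, e.char)):
        by_radical[(e.radical, e.residual_strokes)].append(e.char)
        by_strokes[e.total_strokes].append(e.char)
    return by_radical, by_strokes


BY_RADICAL, BY_STROKES = build_indexes(ENTRIES)


def look_up(target, guessed_radical, guessed_residual, total_strokes):
    """Simulate a reader: try radical/stroke first, fall back to stroke/radical."""
    bucket = BY_RADICAL.get((guessed_radical, guessed_residual), [])
    if target in bucket:
        return "radical/stroke", bucket
    return "stroke/radical", BY_STROKES.get(total_strokes, [])


print(look_up("江", 85, 3, 6))  # radical guessed correctly
print(look_up("吉", 33, 3, 6))  # 吉 is filed under 口 (30), not 士 (33)
```

    In the second call the wrong radical guess misses, and the reader ends up
    scanning the 6-stroke list 休吉地江, which is still ordered by radical
    rather than left random.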

    Regards,

    Allen Haaheim

    ----- Original Message -----
    From: "Marco Cimarosti" <marco.cimarosti@essetre.it>
    To: "'Andrew C. West'" <andrewcwest@alumni.princeton.edu>; <unicode@unicode.org>
    Sent: Thursday, May 15, 2003 4:54 AM
    Subject: RE: how to sort by stroke (not radical/stroke)

    > Andrew C. West wrote:
    > > [...]
    > > I'm not sure that's what he wants either. In dictionaries
    > > that give a Stroke Order index, strokes are usually
    > > sub-sorted by the stroke category of the first
    > > one or two strokes of the character.
    >
    > I doubt that this would be sufficient in all cases. Radicals are often
    > the first (left, top) component of a character, and most radicals have
    > many more than two strokes. So considering only the first two strokes of
    > each character would result in big groups of characters being sorted
    > randomly (i.e., all those characters whose radical has more than two
    > strokes and whose residual stroke count is the same).
    >
    > > [...]
    > > Coincidentally I've recently been in contact with someone who
    > > has spent the last ten years creating a database of CJK
    > > ideographs,
    > > [...]
    > > first two strokes of each character). The main problem with
    > > ideographic decompositions is that not all discrete
    > > ideographic components are [currently] encoded within Unicode
    > > - there are about 100 unencoded ideographic components
    > > according to this person.
    >
    > Does this also include the (relatively new) "CJK components" block? If
    > so, it might be worth filing a proposal to add those components, in
    > order to complete the IDS sub-system.
    >
    > _ Marco
    >
    >



    This archive was generated by hypermail 2.1.5 : Thu May 15 2003 - 19:10:44 EDT