Re: how to sort by stroke (not radical/stroke)

From: Dan Kogai (dankogai@dan.co.jp)
Date: Tue May 13 2003 - 15:57:53 EDT

Next message: Gary P. Grosso: "Re: how to sort by stroke (not radical/stroke)"

Previous message: Dan Kogai: "Re: how to sort by stroke (not radical/stroke)"
In reply to: Andrew C. West: "Re: how to sort by stroke (not radical/stroke)"
Next in thread: Pim Blokland: "Re: how to sort by stroke (not radical/stroke)"
Reply: Pim Blokland: "Re: how to sort by stroke (not radical/stroke)"

On Wednesday, May 14, 2003, at 01:23 AM, Andrew C. West wrote:
> That's certainly true, but sorting by Unicode code point will be 90% OK
> for the 99.99% of CJK data that is encoded within the basic CJK block
> (and at the radical level it'll probably be 99.9% OK). As a rough and
> ready method of sorting CJK data it's definitely the most cost effective
> way of implementing a CJK sort. Like I said, it all depends on what you
> want it for.

I wrote a small Perl script to check whether that is correct.

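#!/usr/local/bin/perl
use strict;
use Unicode::Unihan; # get one via CPAN
my $uh = Unicode::Unihan->new;
binmode STDOUT => ':utf8';
# print the radical.stroke value of every character that has one
for my $ord (0..65535){ # just check BMP
     my $chr = chr($ord);
     my $rs = $uh->RSUnicode($chr);
     defined $rs or next; # no RSUnicode entry for this character
     # keep the data out of the format string
     printf "%s (U+%04x) => %s\n", $chr, $ord, $rs;
}
__END__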

And here is part of what it prints.

㐀 (U+3400) => 1.4
㐁 (U+3401) => 1.5
㐂 (U+3402) => 1.5
㐃 (U+3403) => 2.2
[snip]
䶵 (U+4db5) => 214.10
一 (U+4e00) => 1.0
丁 (U+4e01) => 1.1

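For U+3400 - U+4DB5 you are roughly right, but at U+4E00, "One", the
simplest of all ideographs, rewinds the "stroke counter": Extension A
runs from radical 1 up to radical 214 before the basic block starts
over at radical 1. So I have to say that sorting by Unicode code point
is a poor approximation of radical/stroke sorting once the data spans
both blocks.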
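
To make the rewind concrete, here is a minimal sketch (not the script
used above) that sorts the seven characters from the printed output in
both ways. The %rs table is copied by hand from that output instead of
being looked up through Unicode::Unihan, and the helper names rs_key
and by_radical_stroke are made up for illustration. It also assumes
every value is a single plain "radical.strokes" pair, which is true of
these seven entries but not necessarily of every RSUnicode value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Radical.stroke values copied by hand from the output shown above.
my %rs = (
    0x3400 => '1.4',
    0x3401 => '1.5',
    0x3402 => '1.5',
    0x3403 => '2.2',
    0x4DB5 => '214.10',
    0x4E00 => '1.0',
    0x4E01 => '1.1',
);

# Split "radical.strokes" into two integers (hypothetical helper).
sub rs_key {
    my ($ord) = @_;
    my ($radical, $strokes) = split /\./, $rs{$ord};
    return ($radical, $strokes);
}

# Order by radical, then residual strokes, then code point as tie-breaker.
sub by_radical_stroke {
    my ($ra, $sa) = rs_key($a);
    my ($rb, $sb) = rs_key($b);
    return $ra <=> $rb || $sa <=> $sb || $a <=> $b;
}

my @by_code_point     = sort { $a <=> $b } keys %rs;
my @by_radical_stroke = sort by_radical_stroke keys %rs;

printf "code point:     %s\n",
    join ' ', map { sprintf 'U+%04X', $_ } @by_code_point;
printf "radical/stroke: %s\n",
    join ' ', map { sprintf 'U+%04X', $_ } @by_radical_stroke;
```

In the radical/stroke line U+4E00 and U+4E01 move to the front, ahead
of U+3400, while the code point line leaves them at the very end.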

Sorting by code point to yield dictionary order seems a luxury only
ASCII enjoys. Even ISO-8859-1 fails miserably since all accented
letters are at \x80 and above, so they sort after "z".
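
The ISO-8859-1 case can be sketched the same way. The word list below
is made up for illustration, and the sketch relies on Perl's default
sort comparing strings code point by code point when "use locale" is
not in effect.

```perl
#!/usr/bin/perl
use strict;
use warnings;
binmode STDOUT, ':utf8';

# Hypothetical word list; "\x{E9}" is e-acute (U+00E9, \xE9 in ISO-8859-1).
my @words = ('zebra', "\x{E9}cole", 'eclair');

# Plain sort: code point order, so U+00E9 lands after every ASCII letter.
my @sorted = sort @words;

print "$_\n" for @sorted;
```

The accented word comes out last, after "zebra", where a dictionary
would put it next to "eclair".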

Dan the Unsorted Man

This archive was generated by hypermail 2.1.5 : Tue May 13 2003 - 17:02:17 EDT