Re: The mother of all collation schemes

From: Keld Jørn Simonsen (keld@dkuug.dk)
Date: Thu Jun 15 2000 - 18:00:28 EDT


There is a new ISO standard coming out for a default collation,
namely ISO 14651, and a Unicode technical report too, which
should be equivalent technically. This should also be applicable
to subsets of 10646, like the one you are indicating (which I
read as 8859-1-ish). Nowadays I would recommend 10646/Unicode
for new implementations, especially if you also want to address
East-Asian markets.
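
As a rough illustration of the multi-level comparison that these
standards describe (base letters first, then accents, then case), here
is a minimal Python sketch. It is only an approximation built on
Unicode decomposition and code point order, not the ISO 14651 or
Unicode collation tables, and the name sort_key is made up for this
example.

```python
import unicodedata


def sort_key(title):
    """Simplified three-level sort key (an illustration, not ISO 14651)."""
    # Split accented letters into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", title)
    # Level 1: base characters only, with case differences folded away.
    base = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    ).casefold()
    # Level 2: accents break ties between titles with equal base letters.
    with_accents = decomposed.casefold()
    # Level 3: case breaks any remaining ties.
    return (base, with_accents, decomposed)


titles = ["Zebra", "Émile", "eel", "Elan", "98", "97"]
ordered = sorted(titles, key=sort_key)
```

With this key, accented and unaccented letters interleave instead of
the accented ones sorting after "Z", and digits still come before
letters. A real implementation would take its weights from the
standard tables rather than from code point order.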

Keld

On Thu, Jun 15, 2000 at 12:11:14PM -0800, rampshot@usa.net wrote:
> I am trying to think of a collation scheme for the purpose of ordering a set
> of CDs. Let's say you have CD titles you want to order. They are in different
> languages, with a few accented letters, and even some non-Roman letters.
>
> 1) Romanise all non-Roman names. For Japanese, I'd use "fu" and "chi" and
> "shi" and "tsu" and DEFINITELY indicate long vowels (so Tokyo would come out
> as "Toukyou").
> 2) My alphabetical order: (digits are treated as letters):
> [sp] [other punc.] 0 1 2 3 4 5 6 7 8 9 A ? ? ? B C ? D E ? ? ? F G H ? ? ? J K
> L M N ? O ? ? ? P Q R S T U ? ? ? V W X Y ?(why couldn't I find this in
> uppercase?) Z
> The reason digits are treated as letters is so "97" will come before "98".
> I'm not sure how to treat names like "Ranma 1/2". Any ideas? Also, this system
> is very sensitive to things such as misspelling "DJ" as "D.J."
> Does anyone have any ideas for ordering punctuation?
> Of course, if it was just anime CDs the order would be 0 1 2 3 4 5 6 7 8 9 a i
> u e o ka ki ku ke ko sa shi su, etc.
>
>


