Re: The mother of all collation schemes

From: Edward Cherlin (edward.cherlin.sy.67@aya.yale.edu)
Date: Sun Jun 18 2000 - 23:45:06 EDT


At 12:48 PM -0800 6/15/00, Tex Texin wrote:
>However, 10 comes before 2...

There is a wonderful passage in the classic fantasy novel "Little,
Big" in which a worker at the telephone company is trying to restore
various over-abbreviated addresses. The classic problem is, of course,
'St.', which is the abbreviation for both 'Saint' and 'Street'. The
worker is confounded by 'Fifth St. Bar & Grill' and 'Church of all
Sts.' It turns out that they are the Fifth Saint Bar & Grill and the
Church of All Streets.

See also Knuth, The Art of Computer Programming, First Edition,
Vol. 3, Sorting and Searching, pp. 7-9, Exercise 16 [33]. There he
gives a subset of one of the rule sets used for filing multilingual,
transliterated titles in a library card catalog, with an example
illustrating each rule.
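
Tex's point that 10 comes before 2 is what you get whenever digits are
compared one character at a time, and treating digits as letters (as
proposed below) does not change that. A common workaround, not something
proposed in this thread, is a "natural" sort key that compares each run
of digits by its numeric value. A minimal Python sketch; the name
natural_key is made up for illustration, and the text parts are only
case-folded, not properly collated:

```python
import re

# Capturing group, so re.split keeps the digit runs:
# "Ranma 1/2" -> ["Ranma ", "1", "/", "2", ""]
_DIGIT_RUN = re.compile(r"(\d+)")


def natural_key(title: str) -> list:
    """Hypothetical sort key: digit runs compare as numbers, text by code point.

    re.split with one capturing group always alternates text, digits, text,
    so even positions hold strings and odd positions hold ints, and two keys
    never compare a str against an int.
    """
    key = []
    for i, part in enumerate(_DIGIT_RUN.split(title)):
        key.append(int(part) if i % 2 else part.casefold())
    return key
```

With this key, sorted(["Track 10", "Track 2", "Track 1"], key=natural_key)
gives Track 1, Track 2, Track 10, whereas plain sorted() puts "Track 10"
before "Track 2". It does nothing clever for "Ranma 1/2": that title is
simply the text "ranma ", the number 1, a slash and the number 2.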

>rampshot@usa.net wrote:
> >
> > I am trying to think of a collation scheme for the purpose of ordering a set
> > of CDs. Let's say you have CD titles you want to order. They are in different
> > languages, with a few accented letters, and even some non-Roman letters.
> >
> > 1) Romanise all non-roman names. For Japanese, I'd use "fu" and "chi" and
> > "shi" and "tsu" and DEFINITELY indicate long vowels (so Tokyo would come out
> > as "Toukyou").
> > 2) My alphabetical order: (digits are treated as letters):
> > [sp] [other punc.] 0 1 2 3 4 5 6 7 8 9 A Á Ä À B C Ç D E É Ë È F G H Í Ï Ì J K
> > L M N Ñ O Ó Ö Ò P Q R S T U Ú Ü Ù V W X Y ÿ (why couldn't I find this in
> > uppercase?) Z
> > The reason digits are treated as letters is so "97" will come before "98".
> > I'm not sure how to treat names like "Ranma 1/2". Any ideas? Also, this system
> > is very sensitive to things such as misspelling "DJ" as "D.J."
> > Does anyone have any ideas for ordering punctuation?
> > Of course, if it was just anime CDs the order would be 0 1 2 3 4 5 6 7 8 9 a i
> > u e o ka ki ku ke ko sa shi su, etc.
> >
>
>--
>------------------------------------------------------------------------------------------------
>Tex Texin Director, International Products
>
>Progress Software Corp. +1-781-280-4271
>14 Oak Park +1-781-280-4655 (Fax)
>Bedford, MA 01730 USA texin@bedford.progress.com
>
>http://www.progress.com The #1 Embedded Database
>http://www.SonicMQ.com JMS Compliant Messaging- Best Middleware Award
>http://www.aspconnections.com Leading provider in the ASP marketplace
>
>Progress Globalization Program (New URL)
>http://www.progress.com/partners/globalization.htm
>------------------------------------------------------------------------------------------------
>Come to the Panel on Open Source Approaches to Unicode Libraries at
>the Sept. Unicode Conference
>http://www.unicode.org/iuc/iuc17
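
For anyone who wants to try the ordering proposed in the quoted message,
here is a minimal Python sketch of it as a sort key. Several assumptions
are mine, not the original poster's: the names ALPHABET, char_rank and
proposed_key are made up; a plain "I" is added before "Í", since the
quoted list appears to have dropped it; all punctuation and symbols share
one rank; characters outside the list (un-romanised kana, say) sort after
Z; and, as one possible answer to the "DJ" versus "D.J." question,
punctuation is ignored on a first pass and only used to break ties.

```python
import unicodedata

# Letter order from the quoted proposal, uppercase only (input is uppercased).
# Assumption: a plain "I" is added before "Í"; the quoted list seems to omit it.
# "\u0178" is Ÿ, the uppercase form of ÿ.
ALPHABET = (
    "0123456789"
    "AÁÄÀBCÇDEÉËÈFGHIÍÏÌJKLMNÑOÓÖÒPQRSTUÚÜÙVWXY\u0178Z"
)
RANK = {ch: i for i, ch in enumerate(ALPHABET)}

SPACE, PUNCT, LETTER_BASE = 0, 1, 2    # space < other punctuation < digits < letters
UNKNOWN = LETTER_BASE + len(ALPHABET)  # anything not listed sorts after Z


def char_rank(ch: str) -> int:
    """Rank of one (already uppercased) character in the proposed order."""
    if ch.isspace():
        return SPACE
    if ch in RANK:
        return LETTER_BASE + RANK[ch]
    if unicodedata.category(ch)[0] in "PS":  # punctuation or symbol
        return PUNCT
    return UNKNOWN


def proposed_key(title: str) -> tuple:
    """Hypothetical two-pass key: compare without punctuation, then with it."""
    # NFC so that "E" + combining acute becomes the single letter "É";
    # upper() maps ÿ to Ÿ.
    s = unicodedata.normalize("NFC", title).upper()
    full = [char_rank(ch) for ch in s]
    primary = [r for r in full if r != PUNCT]
    return (primary, full)
```

With this key "97" still sorts before "98", but, as Tex pointed out, "10"
sorts before "2", because digits are still compared one at a time.
Sorting ["DJ Mix", "Daft Punk", "D.J. Mix"] gives Daft Punk, D.J. Mix,
DJ Mix, so the two spellings end up side by side. Because each accented
letter counts as a separate letter, "Élan" sorts after "Ezra". Ignoring
punctuation on the first pass is loosely similar in spirit to the
Unicode Collation Algorithm option that makes punctuation ignorable at
the primary level.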

Edward Cherlin
Generalist
"A knot!" exclaimed Alice. "Oh, do let me help to undo it."
Alice in Wonderland
