Mixing alphabets (was: sorting my CD collection)

From: 11digitboy@bolt.com
Date: Thu Aug 10 2000 - 16:06:28 EDT


You have a good point: .... does nu-alpha-tau-alpha-sigma-alpha
spell "Natasa" or "Natasha"? The Greek letters given
are obviously an attempt to write "Natasha" in Greek,
but they romanize to "Natasa".

And a, b, c, d, e, f, g, h, ... HATES a, i, u, e,
o, ka, ki, ku, ...

Maybe I should just capitalize everything (except
Georgian? ... not that I have any Georgian CDs, or
am likely to... I bet few things would be rarer than,
say, a Georgian female rap CD in the US!!) and from
there, just sort by codepoint number... no good,
"Á" would come after "Z"...
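
To make the problem concrete, here is a minimal Python sketch of the "capitalize everything, then sort by codepoint" idea. The titles are made-up placeholders, not real CDs. Python compares strings by code point, so this is the ordering being described.

```python
# Made-up example titles (placeholders for illustration only).
titles = ["Zebra", "Ábaco", "apple", "Mango"]

# Uppercase each title, then compare by raw code point
# (Python's default string comparison).
by_codepoint = sorted(titles, key=str.upper)

print(by_codepoint)
# "Á" is U+00C1, which is greater than "Z" (U+005A),
# so the accented title lands after "Zebra".
```

Capitalizing removes the upper/lower split, but does nothing for accented letters, which sit above the whole basic Latin block.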

Would somebody PLEASE tell me, IN THE DEFAULT UNICODE
COLLATION ALGORITHM, WHAT COMES AFTER WHAT?! I could
use a list of Unicode characters in proper collation
order, with "ties" labeled!!
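
For what it is worth, the sketch below is NOT the default Unicode Collation Algorithm, which works from a table of per-character weights. It is only a rough stand-in using Python's standard `unicodedata` module, to show the idea of "ties": strings that are equal once accents and case are ignored, and are then separated by a later comparison. The function name `sketch_key` and the titles are made up for illustration.

```python
import unicodedata

def sketch_key(s):
    """Rough two-level sort key (an approximation, not the real algorithm)."""
    # Split accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", s)
    # First level: base letters only, with case folded away.
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Second level: the decomposed string itself, compared by code point,
    # used only to break first-level ties.
    return (base.casefold(), decomposed)

# Made-up example titles (placeholders for illustration only).
titles = ["Zebra", "Ábaco", "apple", "Mango", "abaco"]
approx_order = sorted(titles, key=sketch_key)
print(approx_order)
```

Here "Ábaco" and "abaco" tie at the first level and sort next to each other, ahead of "apple", instead of "Ábaco" being pushed past "Zebra".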

--
Robert Lozyniak
Accusplit pedometer manufacturers can go suck eggs
My page: http://walk.to/11
11digitboy@bolt.com - email
(917) 421-3909 x1133 - voicemail/fax

---- Antoine Leca <Antoine.Leca@renault.fr> wrote:
> Robert Lozyniak wrote:
> >
> > How do you sort text with some in Roman and some
> > in non-Roman alphabets?
>
> I never sort texts, only lists of items (words,
> names, titles, whatever).
>
> Depending on the ratios, I see two main solutions:
>
> - if Latin is the most common, _and_ only other Greek-derived
> scripts are used, _and_ the intended audience
> is proficient enough, I may intersperse the non-Roman
> letters as if all the Greek-derived alphabets shared
> a common order (so Greek alpha sorts just after Latin a,
> Cyrillic ve after Cyrillic be which follows Greek beta
> which follows Latin b, Greek xi after the o's and before
> the p's, etc.)
>
> - in other cases, I sort the scripts separately.
>
> >
> > Currently, I'm just romanizing
> > everything but I don't know if that is that good.
>
> Hmmm. I won't do that. It would take me much too long
> to find something that begins with beta in the V section,
> while something that begins with mu+pi is in the B section...
> For Cyrillic, I expect U+0427 to romanize as tcha,
> and U+0429 as chtcha, and I am not sure you will (or
> vice-versa).
>
> Things are different if you actually transliterate,
> i.e. if the items are presented in Latin script.
>
> >
> > It is probably bad to kanize digits, because they
> > would sort 1, 9, 5, and so on, or some other mixed-up
> > order.
>
> It is always a problem to sort the digits, anyway.
> Since there are usually only a few of them, I believe the
> best place is the foremost, so the search does not take
> too long. But if there are more than a bunch, that is
> pretty much always brain damage.
>
>
> Antoine
>
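
A minimal Python sketch of the second approach quoted above (sort each script separately, with digits put foremost) might look like the following. The script order, the helper names `script_rank` and `separate_scripts_key`, and the sample items are all assumptions made for illustration. Guessing the script from the first word of the Unicode character name is a shortcut, not a robust script detector.

```python
import unicodedata

# Hypothetical ordering of groups: digits foremost, then each script in turn.
SCRIPT_ORDER = ["DIGIT", "LATIN", "GREEK", "CYRILLIC"]

def script_rank(s):
    """Rank an item by the script of its first character (rough heuristic)."""
    if not s:
        return len(SCRIPT_ORDER)
    # Character names look like "GREEK CAPITAL LETTER NU" or "DIGIT ONE".
    first_word = unicodedata.name(s[0], "").split(" ")[0]
    if first_word in SCRIPT_ORDER:
        return SCRIPT_ORDER.index(first_word)
    return len(SCRIPT_ORDER)  # anything unrecognized goes last

def separate_scripts_key(s):
    # Group by script first, then compare case-insensitively within a group.
    return (script_rank(s), s.casefold())

# Made-up sample items (placeholders for illustration only).
items = ["Наташа", "Natasha", "Νατασα", "1999", "Abba"]
grouped = sorted(items, key=separate_scripts_key)
print(grouped)
```

Within each group this still falls back to code point order, so accented Latin letters would need extra handling.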




This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:06 EDT