Please look at the technical report, already referenced.
> You have a good point: .... does nu-alpha-tau-alpha-sigma-alpha
> spell "Natasa" or "Natasha"? The Greek letters given
> are obviously an attempt to write "Natasha" in Greek,
> but they romanize to "Natasa".
> And a, b, c, d, e, f, g, h, ... HATES a, i, u, e,
> o, ka, ki, ku, ...
> Maybe I should just capitalize everything (except
> Georgian? ... not that I have any Georgian CDs, or
> am likely to... I bet few things would be rarer than,
> say, a Georgian female rap CD in the US!!) and from
> there, just sort by codepoint number... no good,
> "Á" would come after "Z"...
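A small sketch of the problem just described, assuming Python and a made-up list of titles (none of them come from the original message). It uppercases everything and sorts by raw codepoint, which is what a plain string comparison does:

```python
# Hypothetical titles, for illustration only.
# "\u00c1" is the precomposed capital A with acute.
titles = ["Zebra", "\u00c1lbum", "apple", "Banana"]

# Uppercase first, then compare by raw codepoint (Python's default for str).
by_codepoint = sorted(t.upper() for t in titles)

# U+00C1 is greater than U+005A ("Z"), so the accented title lands last.
print(by_codepoint)
```

The accented title ends up after "ZEBRA", which is exactly the objection raised above.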
> Would somebody PLEASE tell me, IN THE DEFAULT UNICODE
> COLLATION ALGORITHM, WHAT COMES AFTER WHAT?! I could
> use a list of Unicode characters in proper collation
> order, with "ties" labeled!!
> Robert Lozyniak
> Accusplit pedometer manufacturers can go suck eggs
> My page: http://walk.to/11
> email@example.com - email
> (917) 421-3909 x1133 - voicemail/fax
> ---- Antoine Leca <Antoine.Leca@renault.fr> wrote:
> > Robert Lozyniak wrote:
> > >
> > > How do you sort text with some in Roman and some
> > > in non-Roman alphabets?
> > I never sort texts, only lists of items (words,
> > names, titles, whatever).
> > Depending on the ratios, I see two main solutions:
> > - if Latin is the most common, _and_ only other Greek-derived
> > scripts are used, _and_ the intended audience is proficient
> > enough, I may intersperse the non-Roman letters as if all the
> > Greek-derived alphabets shared a common order (so Greek alpha
> > sorts just after Latin a, Cyrillic ve after Cyrillic be which
> > follows Greek beta which follows Latin b, Greek xi after the
> > o's and before the p's, etc.)
> > - in other cases, I sort the scripts separately.
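A rough sketch of the first solution above (one shared order across the Greek-derived scripts), assuming Python. The table is hand-built and covers only the letters named in the message; treating "the o's" and "the p's" as the Latin, Greek and Cyrillic look-alikes is my own reading, and the names MERGED_ORDER, merged_key and sort_merged are hypothetical:

```python
# Partial merged alphabet, limited to the letters mentioned above.
# A real table would need every letter of each script (assumption).
MERGED_ORDER = [
    "a", "\u03b1",                      # Latin a, then Greek alpha
    "b", "\u03b2", "\u0431", "\u0432",  # Latin b, Greek beta, Cyrillic be, Cyrillic ve
    "o", "\u03bf", "\u043e",            # the o's: Latin, Greek omicron, Cyrillic o
    "\u03be",                           # Greek xi, after the o's
    "p", "\u03c0", "\u043f",            # the p's: Latin, Greek pi, Cyrillic pe
]
RANK = {ch: i for i, ch in enumerate(MERGED_ORDER)}


def merged_key(item):
    """Rank known letters by the merged table; unknown characters sort
    after all known ones, by codepoint."""
    return [(0, RANK[ch]) if ch in RANK else (1, ord(ch)) for ch in item.lower()]


def sort_merged(items):
    """Sort items using the shared cross-script order."""
    return sorted(items, key=merged_key)


print(sort_merged(["b", "\u0432", "\u03b1", "a", "\u03b2", "\u0431"]))
```

The second solution (sorting the scripts separately) needs no table at all, only a per-item script rank placed in front of whatever key each script already uses.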
> > > Currently, I'm just romanizing
> > > everything but I don't know if that is that good.
> > Hmmm. I wouldn't do that. It would take me much too long to
> > find something that begins with beta in the V section, or
> > something that begins with mu+pi in the B section...
> > For Cyrillic, I expect U+0427 to romanize as tcha, and U+0429
> > as chtcha, and I am not sure you will (or vice-versa).
> > Things are different if you actually transliterate, i.e. if
> > the items are presented in Latin script.
> > > It is probably bad to kanize digits, because they
> > > would sort 1, 9, 5, and so on, or some other mixed-up
> > > order.
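To make the "1, 9, 5" remark concrete, here is a small sketch, assuming Python, one common hiragana reading per digit (other readings exist, and zero is left out), and plain codepoint order as a stand-in for kana order. The READINGS table and its name are my own, not part of the original message:

```python
# One assumed hiragana reading per digit; alternatives such as
# "yon" for 4 or "nana" for 7 would change the result.
READINGS = {
    1: "\u3044\u3061",        # ichi
    2: "\u306b",              # ni
    3: "\u3055\u3093",        # san
    4: "\u3057",              # shi
    5: "\u3054",              # go
    6: "\u308d\u304f",        # roku
    7: "\u3057\u3061",        # shichi
    8: "\u306f\u3061",        # hachi
    9: "\u304d\u3085\u3046",  # kyuu
}

# Sort the digits by their kana spelling, comparing by codepoint.
kana_sorted = sorted(READINGS, key=READINGS.get)

print(kana_sorted)
```

With these readings the list begins 1, 9, 5, which is why spelling digits out before sorting scrambles their numeric order.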
> > It is always a problem to sort the digits, anyway.
> > Since there are usually only a few of them, I believe the
> > best place is at the very front, so the search does not take
> > too long. But if there are more than a handful, that is
> > pretty much always brain damage.
> > Antoine
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:07 EDT