Re: accented Latin characters sort order, non-language dependant

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Mon Jul 10 2006 - 07:23:08 CDT

Next message: Andreas Prilop: "Re: accented Latin characters sort order, non-language dependant"

Previous message: Cristian Secară: "accented Latin characters sort order, non-language dependant"
In reply to: Cristian Secară: "accented Latin characters sort order, non-language dependant"
Next in thread: Andreas Prilop: "Re: accented Latin characters sort order, non-language dependant"

On Mon, 10 Jul 2006, Cristian Secară wrote:

> I have to make a spreadsheet with a few accented characters and their
> coverage for a few languages. How do I sort them alphabetically?

Using the Unicode Collation Algorithm (UTS #10),
http://www.unicode.org/reports/tr10/
would appear to be suitable here, since the context is multilingual.

In practice, if you have just a few characters, you could check their
mutual order from
http://www.unicode.org/charts/collation/
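
To give a feeling for what a multi-level comparison does, here is a rough
Python sketch. It is not the Unicode Collation Algorithm: the function name
approx_key is made up for illustration, accents are ranked by code point
rather than by the weights in the default collation table, and letters with
no canonical decomposition (such as "ø") are not handled sensibly. It only
shows the idea of comparing base letters first, accents second, case third.

```python
import unicodedata

def approx_key(s):
    """Rough three-level key: base letters, then accents, then case.

    An approximation for illustration only, not the real UCA weights.
    """
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    marks = "".join(c for c in decomposed if unicodedata.combining(c))
    primary = base.casefold()     # level 1: letters, ignoring case and accents
    secondary = marks             # level 2: accents (by code point, an assumption)
    tertiary = base.swapcase()    # level 3: lowercase before uppercase
    return (primary, secondary, tertiary)

chars = ["z", "é", "a", "E", "ă", "e", "â", "b"]
print(sorted(chars, key=approx_key))
# ['a', 'â', 'ă', 'b', 'e', 'E', 'é', 'z']
```

The relative order of the accented variants of one letter (here "â" before
"ă") comes from the code point shortcut, so for the real order check the
charts above.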

> I know that this is highly language dependent, but I also remember that
> I was once told about a (Unicode?) document with an abstract sort
> order of many (Latin?) characters. I cannot remember what document that
> was - is this something [well] known?

The algorithm is a separate standard issued by the Unicode Consortium. It
can be used either as such (typically in multilingual contexts) or as the
lowest-level algorithm, with one or more layers of locale-specific rules
(tailorings) above it. If you have data in one particular language, with a
few words containing foreign characters, you could use the sorting rules
of that language as the higher-level algorithm, falling back to the
Unicode Collation Algorithm for characters those rules do not cover.
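
The layering could be sketched roughly as follows in Python. Everything
here is an assumption made for illustration: the names ROMANIAN, RANK,
char_key and layered_key are hypothetical, the language table covers only
the lowercase Romanian alphabet, and the fallback is a crude stand-in
(base letter plus accents by code point) rather than the real Unicode
Collation Algorithm.

```python
import unicodedata

# Higher level: letters the language treats as distinct, in its own order.
ROMANIAN = unicodedata.normalize("NFC", "aăâbcdefghiîjklmnopqrsștțuvwxyz")
RANK = {ch: i for i, ch in enumerate(ROMANIAN)}

def char_key(ch):
    """Return (primary, secondary) for one character."""
    if ch in RANK:
        return (RANK[ch], "")             # covered by the language rules
    decomposed = unicodedata.normalize("NFD", ch)
    base, marks = decomposed[0], decomposed[1:]
    if base in RANK:
        return (RANK[base], marks)        # fallback: accent is secondary only
    return (len(RANK), ch)                # unknown script: after the alphabet

def layered_key(word):
    text = unicodedata.normalize("NFC", word).lower()
    keys = [char_key(ch) for ch in text]
    # Compare all primary weights first, then all secondary weights.
    return ([k[0] for k in keys], [k[1] for k in keys])

words = ["țară", "tare", "école", "eu", "în", "ins", "apă", "apa", "ac"]
print(sorted(words, key=layered_key))
# ['ac', 'apa', 'apă', 'école', 'eu', 'ins', 'în', 'tare', 'țară']
```

Here "ă", "î" and "ț" sort as separate letters because the language rules
say so, while the foreign "é" is treated as an "e" with a secondary
difference, which is the kind of treatment the fallback level would give it.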

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/


This archive was generated by hypermail 2.1.5 : Mon Jul 10 2006 - 07:29:31 CDT