Re: Linguistic precedence [was: (TC304.2313) AND/OR: antediluvian

From: John Cowan (jcowan@reutershealth.com)
Date: Thu Jun 15 2000 - 12:42:30 EDT


jarkko.hietaniemi@nokia.com wrote:

> But believing that there is a collation order that works across all the
> European (Latin script, let's not even go to Cyrillic and Greek) languages
> is a very hopeless fallacy:

Quite true. But there is a *default* collation that works *fairly* well,
plus machinery for tailoring it to particular cases: see
http://www.unicode.org/unicode/reports/tr10/

Note that collation is user-locale-specific, not language-specific:
as an anglophone browsing a list of Swedish personal names, I want them
collated in English order (ignore accents), not Swedish order.
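
A rough sketch of that difference, using only the Python standard library. This is not the TR10 algorithm: the two key functions and the name list are made up for illustration, and the "English" key simply strips combining marks while the "Swedish" key ranks letters by the Swedish alphabet, where å, ä, ö come after z.

```python
import unicodedata

def english_key(name):
    # Assumption: an anglophone reader ignores accents at the first level.
    # Decompose, drop combining marks, compare the base letters;
    # the original string is kept only as a tiebreaker.
    decomposed = unicodedata.normalize("NFD", name)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), name)

# Swedish treats å, ä, ö as separate letters sorted after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {c: i for i, c in enumerate(SWEDISH_ALPHABET)}

def swedish_key(name):
    # Compose first so that each accented letter is a single code point.
    folded = unicodedata.normalize("NFC", name).casefold()
    # Characters outside the alphabet sort after everything else.
    return [SWEDISH_RANK.get(c, len(SWEDISH_ALPHABET)) for c in folded]

# Hypothetical sample data.
names = ["Östberg", "Olsson", "Zetterberg", "Åberg", "Andersson", "Ärlig"]

english_order = sorted(names, key=english_key)
swedish_order = sorted(names, key=swedish_key)

print(english_order)
print(swedish_order)
```

The same list comes out in two different orders depending on whose locale is in force, which is the point: the data does not change, only the reader does.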

> the things conflicting are the 'accented' characters (like
> a-diaereses and o-diaereses in German versus in Swedish/Finnish), and special
> 'ligature'-like cases like the 'll' and 'ch' of Spanish, and pairs like v/w and
> i/j being sorted "to the same place", and so on.

This conflates two separate issues: tailoring the order for a particular
locale, and treating a sequence of several characters as a single unit for
sorting. Both are well handled by the collation TR.
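
To make the second issue concrete, here is a toy sketch in Python of sorting with multi-character units, using the traditional Spanish 'ch' and 'll' mentioned in the quoted text. It is an illustration only, not the TR10 machinery: the element table, the function name make_key, and the word list are all assumptions of mine, and a real implementation would carry several levels of weights rather than one rank per element.

```python
# Hypothetical tailoring table: traditional Spanish order, in which
# "ch" sorts as a unit after "c" and "ll" as a unit after "l".
SPANISH_TRADITIONAL = [
    "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
    "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
]

def make_key(elements):
    """Build a sort key function from an ordered list of sorting units."""
    rank = {e: i for i, e in enumerate(elements)}
    longest = max(len(e) for e in elements)

    def key(word):
        w = word.casefold()
        out = []
        i = 0
        while i < len(w):
            # Try the longest possible unit first, so "ch" wins over "c".
            for size in range(longest, 0, -1):
                chunk = w[i:i + size]
                if chunk in rank:
                    out.append(rank[chunk])
                    i += len(chunk)
                    break
            else:
                # Unknown character: sort it after every known unit.
                out.append(len(rank))
                i += 1
        return out

    return key

# Hypothetical sample data.
words = ["cuna", "chico", "dama", "llama", "luz", "casa"]

plain_order = sorted(words)
traditional_order = sorted(words, key=make_key(SPANISH_TRADITIONAL))

print(plain_order)
print(traditional_order)
```

Swapping in a different element table is the "tailoring" half; the longest-match loop is the "multiple characters as one" half. The two are independent, which is why they are better discussed separately.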

-- 

Schlingt dreifach einen Kreis um dies! || John Cowan <jcowan@reutershealth.com>
Schliesst euer Aug vor heiliger Schau, || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,          || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.      -- Coleridge (tr. Politzer)


