Re: Sorting words in latin based languages

From: Alain LaBonté (alb@sct.gouv.qc.ca)
Date: Fri Jan 08 1999 - 09:59:50 EST


At 04:00 99-01-08 -0800, William Overington wrote:
>The alphabet used by the Esperanto language has 28 letters, namely, in
>order, a, b, c, c circumflex, d, e, f, g, g circumflex, h, h circumflex, i,
>j, j circumflex, k, l, m, n, o, p, r, s, s circumflex, t, u, u breve, v, z.
>
>All of these characters can be encoded in unicode so as to produce
>displayable text.
>
>However, the numerical order of the numerical values of the code elements is
>not the same order as the order of the characters in the alphabet.

[Alain] Sorting, ordering, string comparison and searching should *never*
be blindly tied to character coding; it never works. No code is suitable for
sorting any language, including English (just think about the separation
between upper and lower case). This is the realm of sort specifications,
which transform a coded character string to be sorted into a
numerically sortable object. Unicode has produced specs to that effect and
so has ISO (ISO/IEC 14651, at the second-FCD stage [standardese dialect!]
of international standard development). Happily, the two are converging,
more or less [imho more than less, end-user-wise]. Delta declarations
(deviations, generally slight, relative to the proposed international
template, which is derived mainly from Unicode data) will be mandatory for
conformance to the above-mentioned international standard. This project is
currently under international ballot until April. Esperanto is imho very
well served by the template, and my Esperanto-French lexicon order will
fit within its implementation.
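
To make the idea of "transforming a coded character string into a
numerically sortable object" concrete, here is a minimal sketch in Python.
It is only an illustration: it is not the ISO/IEC 14651 or Unicode
algorithm, and the names ESPERANTO_ALPHABET, RANK and esperanto_sort_key
are hypothetical ones made up for this example. It assumes the 28-letter
alphabet order quoted above, with letter case used only as a tie-breaker.

```python
import unicodedata

# Esperanto alphabet in the order quoted above (28 letters).
# Normalized to NFC so each accented letter is a single code point.
ESPERANTO_ALPHABET = unicodedata.normalize("NFC", "abcĉdefgĝhĥijĵklmnoprsŝtuŭvz")

# Hypothetical rank table: letter -> position in the alphabet.
RANK = {letter: i for i, letter in enumerate(ESPERANTO_ALPHABET)}


def esperanto_sort_key(word):
    """Turn a string into a tuple of numbers that sorts in alphabet order."""
    normalized = unicodedata.normalize("NFC", word)
    # Primary level: alphabet position, ignoring case.
    # Characters outside the alphabet sort after all 28 letters (an assumption).
    primary = tuple(RANK.get(ch, len(RANK) + ord(ch)) for ch in normalized.lower())
    # Secondary level: lower case before upper case, used only to break ties.
    secondary = tuple(ch.isupper() for ch in normalized)
    return (primary, secondary)


words = ["ĉevalo", "dento", "cent", "ĝojo", "hundo", "golfo"]

# Sorting on raw code values pushes the circumflexed letters to the end.
by_code_point = sorted(words)

# Sorting on the derived numeric key follows the alphabet.
by_alphabet = sorted(words, key=esperanto_sort_key)

print(by_code_point)
print(by_alphabet)
```

Sorting on the raw code values places "ĉevalo" and "ĝojo" after "hundo",
whereas the derived key places them right after "cent" and "golfo". A real
sort specification has more levels and covers far more characters than
this two-level toy.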

Saluton.

Alain LaBonté
Kebeco


