From: Alain LaBonté (email@example.com)
Date: Mon Jul 12 2004 - 20:33:52 CDT
Resent with a non-renegade email address... (^8=
At 14:10 2004-07-09, Jony Rosenne wrote:
>I think the problem is with the concept of default in this case. The default
>should be the basis for a specific tailoring, and as a last resort for
>scripts and letters that do not have specific weights, but each
>implementation should have its own weights when it matters. Only rarely is
>the default useful in itself, except possibly for Latin based locales.
[Alain] My two cents in this debate (in full support of this fundamental
statement of Jony's): there is no concept of "default" in ISO/IEC 14651, the
International String Ordering Standard (in contrast to the UCA, and this is a
significant difference), since, in order to be conformant, one *shall*
declare a delta, even if it is only one line.
Adaptation to the world's cultures (at the limit, even to individual
needs) is the key here.
And even for Latin-based locales, the UCA "default" does not make complete
sense for any language written in the Latin script anywhere in the world.
Given that there is no such thing as a default according to the
international standard, the debate is mostly futile in this context. It is
a debate which looks to me like the well known
That said, Peter Kirk raised an important issue (that *could* be solved
by applying a particular delta consistently):
>One Danish participant is Søren Holst and so called in the name field of
>his e-mails, but signs himself "Soren" in messages in English. If I type
>"Soren" into the name search box (in Mozilla 1.7), I get no matches. This
>is not what I expect, because to me, and to Søren himself when thinking in
>English, ø is a variant of o. (But actually Mozilla is inconsistent: when
>sorting it put Søren after Sonny but before Soshie.)
[Alain] Mozilla (and, for that matter, even "Find" in the most popular
Microsoft products, which of course have nothing to do with Mozilla) does
not seem to be smart enough to be *able* to treat accented data "correctly"
and consistently between searching and sorting. Neither Mozilla nor the
Microsoft products do any accent decomposition for searching (and this is
not the expected behaviour in French for my name [LaBonté] either, even if
"é" is but an accented instance of "e", and not a separate letter); they
only fold case (that seems to be the best they care to do).
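To make the difference between case folding alone and accent decomposition concrete, here is a minimal Python sketch. The helper names fold and matches are mine, chosen for illustration only; this is not how Mozilla or any other product works, just one simple way to ignore accents when matching. It relies on the standard unicodedata module.

```python
import unicodedata

def fold(text: str) -> str:
    """Fold case, decompose accented letters, and drop the combining marks.

    Hypothetical helper for illustration; not any product's algorithm.
    """
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def matches(query: str, candidate: str) -> bool:
    """Substring match that ignores case and accents."""
    return fold(query) in fold(candidate)

print(matches("labonte", "Alain LaBonté"))  # accent and case are ignored
print(matches("Soren", "Søren Holst"))      # ø has no canonical decomposition
```

Note that "ø" is left untouched by this approach, since it has no canonical decomposition into "o" plus a combining mark. Handling a case like Søren/Soren therefore needs an explicit tailoring rather than decomposition alone.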
It would be much better to make sorting, matching and searching
consistent with tailored tables of either the UCA or ISO/IEC 14651.
Unfortunately that is not what happens in most products, except in some
good search engines (Google, Altavista and the like, which are smart enough
for this -- but are not tailorable, to my knowledge -- and there are slight
differences in behaviour between Google and Altavista, although both are very
much better than Mozilla or MS products in all cases).
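As a rough illustration of what "consistent" could mean in practice, the Python sketch below derives sorting and searching from one shared key function, with a small tailoring applied on top. Everything here is an assumption made for the example: the names DELTA, primary_key, sort_names and search are hypothetical, the key has a single level only, and it is far simpler than the multi-level tables of the UCA or ISO/IEC 14651.

```python
import unicodedata

# Hypothetical one-line tailoring ("delta"): treat ø as a variant of o.
DELTA = {"ø": "o"}

def primary_key(text, delta=DELTA):
    """Single-level comparison key: case folded, accents dropped, delta applied."""
    folded = unicodedata.normalize("NFD", text.casefold())
    base = "".join(ch for ch in folded if not unicodedata.combining(ch))
    return "".join(delta.get(ch, ch) for ch in base)

def sort_names(names):
    """Sort by the shared key; ties are broken by the original string."""
    return sorted(names, key=lambda n: (primary_key(n), n))

def search(names, query):
    """Search with the same key that sorting uses."""
    q = primary_key(query)
    return [n for n in names if q in primary_key(n)]

names = ["Soshie", "Søren", "Sonny"]
print(sort_names(names))       # Søren lands between Sonny and Soshie
print(search(names, "Soren"))  # and is also found when searching for "Soren"
```

Because both operations go through the same key, a name that sorts as if it were spelled "Soren" is also found by typing "Soren", which is the consistency Peter Kirk expected.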
There is probably a need for an international standard for searching
that would just say that: "searching should be consistent with sorting".
Sometimes international standards do not need to be complicated. Simple
ideas are great, but they seem intellectually so obvious that one would
have to write them 1,000,000 times in one's homework book to get them applied and
fully understood (i.e. not only intellectually but in human-made tools as