RE: Changing UCA primar[l]y weights (bad idea)

From: Alain LaBonté (alb@sct1.gouv.qc.ca)
Date: Mon Jul 12 2004 - 20:33:52 CDT


    Resent with a non-renegade email address... (^8=

    At 14:10 2004-07-09, Jony Rosenne wrote:
    >I think the problem is with the concept of default in this case. The default
    >should be the basis for a specific tailoring, and as a last resort for
    >scripts and letters that do not have specific weights, but each
    >implementation should have its own weights when it matters. Only rarely is
    >the default useful in itself, except possibly for Latin based locales.

    [Alain] My two cents in this debate (in full support of this fundamental
    statement of Jony's): there is no concept of "default" in ISO/IEC 14651,
    the International String Ordering Standard. This is a significant
    difference from the UCA: to be conformant, one * s h a l l * declare a
    delta, even if it is only one line.

        Adaptation to the world's cultures (at the limit, even to individual
    needs) is the key here.

        And even for Latin-based locales, the UCA "default" does not make
    complete sense for any language written in the Latin script.

        Given that there is no such thing as a default according to the
    international standard, the debate is mostly futile in this context. It is
    a debate which looks to me like the well-known
    "my-father-is-stronger-than-yours" debate.

        That said, Peter Kirk raised an important issue (that *could* be solved
    by applying a particular delta consistently):
    >One Danish participant is Søren Holst and so called in the name field of
    >his e-mails, but signs himself "Soren" in messages in English. If I type
    >"Soren" into the name search box (in Mozilla 1.7), I get no matches. This
    >is not what I expect, because to me, and to Søren himself when thinking in
    >English, ø is a variant of o. (But actually Mozilla is inconsistent: when
    >sorting it put Søren after Sonny but before Soshie.)

    [Alain] Mozilla (and, for that matter, even "Find" in the most popular
    Microsoft products, which of course have nothing to do with Mozilla) does
    not seem to be smart enough to be *able* to treat accented data
    "correctly", that is, consistently between searching and sorting. Neither
    Mozilla nor the Microsoft products do any accent decomposition for
    searching; they only fold case (that seems to be the best they care to
    do). This is not the expected behaviour in French for my name [LaBonté]
    either, even though "é" is just an accented form of "e" and not a
    separate letter.
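
    As an illustration of the accent decomposition that such search boxes
    skip, here is a minimal Python sketch. It is not how Mozilla or any
    Microsoft product works, and it is not the UCA or ISO/IEC 14651
    algorithm; it only assumes NFD decomposition from Python's standard
    unicodedata module. The names EXTRA_FOLDS, search_key and matches are
    hypothetical. Note that "ø" has no canonical decomposition, so treating
    it as a variant of "o" needs an explicit, tailorable entry.

```python
import unicodedata

# Hypothetical tailoring: letters without a canonical decomposition that a
# given audience may still want to treat as variants of a base letter.
EXTRA_FOLDS = {"ø": "o", "Ø": "O"}

def search_key(text, extra_folds=EXTRA_FOLDS):
    """Fold accents and case so that 'LaBonté' and 'labonte' compare equal."""
    # Apply the tailored folds first (NFD leaves 'ø' untouched).
    text = "".join(extra_folds.get(ch, ch) for ch in text)
    # Split accented letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, then fold case.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()

def matches(query, candidate, extra_folds=EXTRA_FOLDS):
    """Substring match on the folded forms of both strings."""
    return search_key(query, extra_folds) in search_key(candidate, extra_folds)
```

    With these assumptions, matches("Soren", "Søren Holst") succeeds, while
    passing an empty extra_folds table reproduces the "no matches" behaviour
    Peter Kirk describes.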

        It would be much better to make sorting, matching and searching
    consistent with tailored tables of either the UCA or ISO/IEC 14651.
    Unfortunately that is not what happens in most products, except in some
    good search engines (Google, Altavista and the like). These are smart
    enough for this, although to my knowledge they are not tailorable, and
    there are slight differences in behaviour between Google and Altavista.
    In all cases, though, both do very much better than Mozilla or MS
    products.
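
    One way to picture "consistent" is to derive both the sort order and the
    search match from the same key. The Python sketch below is a simplified
    stand-in for a tailored UCA or ISO/IEC 14651 table, not an implementation
    of either: PRIMARY_FOLDS plays the role of a one-line delta, and
    primary_key, sort_key and search are hypothetical names.

```python
import unicodedata

# Hypothetical delta: treat ø as a variant of o at the primary level.
PRIMARY_FOLDS = {"ø": "o", "Ø": "O"}

def primary_key(text):
    """Base letters only: tailored folds, accents stripped, case folded."""
    folded = "".join(PRIMARY_FOLDS.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFD", folded)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()

def sort_key(text):
    # Primary level decides first; the original string only breaks ties.
    return (primary_key(text), text)

def search(query, names):
    """Return the names whose primary key starts with the query's key."""
    wanted = primary_key(query)
    return [n for n in sorted(names, key=sort_key)
            if primary_key(n).startswith(wanted)]

names = ["Soshie", "Søren", "Sonny", "Soren"]
ordered = sorted(names, key=sort_key)   # Sonny, Soren, Søren, Soshie
hits = search("Soren", names)           # Soren, Søren
```

    Because both functions share primary_key, a name that sorts next to
    "Soren" is also found when searching for "Soren"; changing the delta
    changes both behaviours together.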

        There is probably a need for an international standard for searching
    that would just say: "searching should be consistent with sorting".
    Sometimes international standards do not need to be complicated. Simple
    ideas are great, but they seem so intellectually obvious that one would
    have to write them 1000000 times in one's homework book to get them
    applied and fully understood (i.e. not only intellectually but in
    human-made tools as well).

    Alain LaBonté
    Québec



    This archive was generated by hypermail 2.1.5 : Mon Jul 12 2004 - 20:37:10 CDT