Re: UTS#10 (collation) : French backwards level 2, and word-breakers.

From: Mark Davis ☕ (mark@macchiato.com)
Date: Fri Jul 30 2010 - 13:39:07 CDT


    A few items on the UTN that I didn't notice previously, and one for UCA.

    A. 2.3. Topic List, Order 3

    It is not just ICU; CLDR/LDML sets the default for alternate handling to
    *non-ignorable*, which means that most implementations of UCA will probably
    be non-ignorable. That is only the out-of-the-box behavior: those
    implementations can reset the default globally, or for a given locale, or
    for a given tailoring of a locale, to *shifted*.

    So I'd suggest changing:

    First, let's consider the default settings for the UCA implementation by
    the International Components for Unicode library [ICU]. That library
    does a full UCA multi-level collation. Its default settings differ from the
    defaults for UCA per se, in that ICU does not default to the "shifted"
    option for weighting. That means that the so-called variable elements
    (e.g., punctuation and symbols) are given primary weights, instead of being
    shifted to a weighting significance at the fourth level. Given the ICU
    default settings, the list would order as follows.

    to

    First, let's consider the default settings for the UCA implementation by
    the International Components for Unicode library [ICU]. That library
    does a full UCA multi-level collation, using the LDML locale tailorings. The
    default settings for LDML differ from the defaults for UCA per se, in that
    LDML defaults to the "non-ignorable" option, not "shifted". Implementations
    can, however, reset the default globally, or for a given locale, or for a
    given tailoring of a locale, to *shifted*. That means that the so-called
    variable elements (e.g., punctuation and symbols) are given primary weights,
    instead of being shifted to a weighting significance at the fourth level.
    Given the ICU default settings for the root locale, the list would order as
    follows.

    B. I also noticed a significant typo in
    http://www.unicode.org/reports/tr10/proposed.html.

    Sets alternate handling for variable weights, as described in Section
    3.6.2, Variable Weighting. Note that in [LDML], blanked is not supported,
    and shifted is the default.

    It should be:

    Sets alternate handling for variable weights, as described in Section
    3.6.2, Variable Weighting. Note that in [LDML], *blanked* is not supported,
    and *non-ignorable* is the default. Implementations of LDML can, however,
    reset the default globally, or for a given locale, or for a given tailoring
    of a locale, to *shifted*.

    [There wouldn't be a need to contrast the default for LDML if it were the
    same as UCA.]
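
The difference between the two alternate settings can be illustrated with a toy sort key. This is only a sketch: the weights are invented, the names `key_non_ignorable` and `key_shifted` are hypothetical, and nothing here is the UCA or ICU implementation. The word list uses the fleur de lys variants mentioned in the quoted message below, plus "fleur-de-lis", which is added purely for illustration.

```python
# Toy model of "non-ignorable" vs "shifted" handling of variable characters.
# All weights are invented for illustration; this is not the UCA algorithm.

# Hypothetical primary weights for two variable characters (space < hyphen).
VARIABLE = {" ": 1, "-": 2}

def primary(ch):
    # Toy primary weight for ASCII letters; always above the variable weights.
    return 100 + ord(ch.lower()) - ord("a")

def key_non_ignorable(s):
    # Variable characters keep their own primary weights, so they take part
    # in the first-level comparison like any other character.
    return tuple(VARIABLE[ch] if ch in VARIABLE else primary(ch) for ch in s)

def key_shifted(s):
    # Variable characters are skipped at the primary level and only break
    # ties afterwards (a stand-in for the fourth level).
    letters = tuple(primary(ch) for ch in s if ch not in VARIABLE)
    tie_break = tuple(VARIABLE.get(ch, 0xFFFF) for ch in s)
    return (letters, tie_break)

words = ["fleur-de-lys", "fleur de lis", "fleur-de-lis", "fleur de lys"]

non_ignorable_order = sorted(words, key=key_non_ignorable)
shifted_order = sorted(words, key=key_shifted)

print(non_ignorable_order)
print(shifted_order)
```

In this sketch the non-ignorable key groups the entries by separator first (both space forms before both hyphen forms), while the shifted key groups them by spelling (lis before lys) and uses the separator only to break ties.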

    Mark

    *— Il meglio è l’inimico del bene —*

    2010/7/30 Frédéric Grosshans <frederic.grosshans@m4x.org>

    > On Friday, 30 July 2010 at 08:36 -0700, Kenneth Whistler wrote:
    > > I suspect that many French users would be utterly unable to
    > > tell a "correct" ordering of all the modèle, modelé words
    > > from an "incorrect" one, or would frankly care much in practice,
    > > as long as they could find what they were looking for in the list.
    >
    > I agree with you on this, and I would like to see a real-life example
    > (in Wikipedia or Wiktionary, for example) where it matters.
    >
    > However, there is an order which is "obviously incorrect" for a French
    > speaker, to the point that it sends things to a place where they
    > are unfindable: the binary order, currently used by Wikipedia, where
    > a<e<z<è. For a French speaker (or at least for me), separating e from é
    > and è is as unintuitive as separating e and E.
    >
    > This is a common problem for me (I often struggle to find a file with an
    > accent in its name on my computer, because I tend to forget that z<é),
    > and I think an example that clearly shows it would be nice.
    >
    > If you look at the list
    > http://fr.wikipedia.org/w/index.php?title=Sp%C3%A9cial%3AToutes+les+pages&from=Modele&to=Mod%C3%A8ne&namespace=0
    >
    > you will see an order like:
    >
    >
    > ...
    > Modele atomique de Thomson
    > Modele bio-psycho-social
    > Modele christallerien
    > Modele cognitif
    > Modele conceptuel des traitements
    > ...
    > A very long interval, going through things like
    > Modification
    > ...
    > Modulation
    > ....
    > Module
    > ...
    > Modèle atomique de Thomson
    > Modèle binomial
    > Modèle bio-psycho-social
    > Modèle black-scholes
    > Modèle booléen
    > Modèle christallérien
    > Modèle climatique
    > Modèle cognitif
    >
    >
    > whereas my intuition would put the Modèle and Modele entries together. I
    > guess that is order 2.3 of your technical note (but I'm not sure). I think
    > order 2.2 would still keep e<u<è, which remains strange and close to
    > unusable.
    >
    > Frédéric Grosshans
    >
    > PS: However, I agree that the words fleur de lys, fleur-de-lys, fleur de
    > lis are a particularly nice example to illustrate a topic on French ;-)
    >
    >
    >
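
The ordering problem described in the quoted message can be reproduced with a few of the titles from that list. The sketch below compares plain code point order with a rough two-level key that compares base letters first and uses the accented spelling only as a tie-breaker. The helper names `strip_accents` and `accent_aware_key` are hypothetical, and the key is a simplification for illustration, not an implementation of the UCA or of any French tailoring.

```python
import unicodedata

def strip_accents(s):
    # Decompose to NFD and drop combining marks, leaving the base letters.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def accent_aware_key(s):
    # First level: base letters, case-folded. Tie-break: the original string.
    return (strip_accents(s).casefold(), s)

raw_titles = [
    "Modèle binomial",
    "Module",
    "Modele cognitif",
    "Modification",
    "Modèle cognitif",
    "Modulation",
]

# Normalize to NFC so that "è" is a single code point (U+00E8) in every title.
titles = [unicodedata.normalize("NFC", t) for t in raw_titles]

# Python compares strings by code point, which matches the "binary order".
binary_order = sorted(titles)
accent_aware_order = sorted(titles, key=accent_aware_key)

print(binary_order)
print(accent_aware_order)
```

With the code point comparison, the Modèle entries land after Module because è (U+00E8) is greater than every ASCII letter. With the accent-aware key, the Modele and Modèle entries end up next to each other, ahead of Modification, Modulation and Module.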


