Re: Changing UCA primary weights (bad idea)

From: Mark Davis (mark.davis@jtcsv.com)
Date: Fri Jul 09 2004 - 19:34:26 CDT


    I'll try to pick out the relevant points.

    > Please do. Do you really want all those letters
    > between "e" and "f" interfiled with "e"? I surely
    > do not.

    You seem to have a misperception of what I think we should be looking at.
    What I think we should be examining is which of the items that are not
    interfiled (to use your phrasing) should be, if any. I don't think
    everything should be. In particular, I think John's list is the list we
    should be focusing on.

    > John's list?

    That was in my original mail, which you were commenting on when you changed
    the subject line, but which you apparently didn't bother to actually
    read. Here is the text:

    >> If you look at John's suggested file for diacritic
    >> folding (http://www.ccil.org/~cowan/DiacriticFolding.txt), there are quite a
    >> number that are not reflected in the UCA.

    > My point is made here. It is really only in
    > initial position where this is likely to be
    > noticed.

    This is incorrect. It will make a difference in other positions. Sorting
    "Søren" after "Sozar" in a long list, if someone isn't expecting it, will
    cause problems. They look for it after "Soret", don't see it on the page,
    and assume it isn't there, fooled by the fact that it is on a completely
    different page.
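A minimal sketch of why the page break happens, using a toy two-level key in which all primary weights are compared first and secondary weights only break ties, as in the UCA. The weights and the names `BASE`, `weights`, `sort_key` and `collate` are hypothetical and are not real DUCET values; the only point is the difference between giving ø its own primary weight and giving it o's primary weight plus a secondary difference.

```python
# Toy two-level collation keys; the weights are illustrative, not real DUCET values.
from string import ascii_lowercase

# Hypothetical primaries with gaps: a=10, b=20, ..., z=260.
BASE = {c: (i + 1) * 10 for i, c in enumerate(ascii_lowercase)}


def weights(ch, interfile):
    """Return (primary, secondary) for one lowercase character."""
    if ch == "ø":
        # Interfiled: o's primary, differing only at the secondary level.
        # Separate: its own primary, after every o and before p.
        return (BASE["o"], 1) if interfile else (BASE["o"] + 5, 0)
    return (BASE[ch], 0)


def sort_key(word, interfile):
    pairs = [weights(ch, interfile) for ch in word.lower()]
    # All primaries are compared first; secondaries only break ties.
    return ([p for p, _ in pairs], [s for _, s in pairs])


def collate(words, interfile):
    return sorted(words, key=lambda w: sort_key(w, interfile))


names = ["Sozar", "Søren", "Soret"]
print(collate(names, interfile=False))  # ['Soret', 'Sozar', 'Søren']
print(collate(names, interfile=True))   # ['Søren', 'Soret', 'Sozar']
```

With a separate primary, "Søren" sorts after every word with a plain o in second position, including "Sozar"; interfiled, it sorts right next to "Soret".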

    Remember that the collation sequence is used for language-sensitive
    matching as well as for sorting.
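For matching, diacritic folding gives a rough approximation of a primary-strength comparison. The sketch below uses Python's standard `unicodedata` module: canonical decomposition strips the umlaut from Ö, but ø has no decomposition, so it only matches o if an extra folding table says so. `EXTRA_FOLDS` is a hypothetical stand-in for a list like John's, not its actual contents, and real collation-based matching would compare collation weights rather than folded strings.

```python
import unicodedata

# Hypothetical extra folds for letters with no canonical decomposition;
# illustrative only, not copied from John Cowan's DiacriticFolding.txt.
EXTRA_FOLDS = {"ø": "o", "đ": "d", "ł": "l", "ħ": "h"}


def fold(text, extra=None):
    """Case-fold, strip combining marks after NFD, then apply optional extra folds."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if extra:
        stripped = "".join(extra.get(ch, ch) for ch in stripped)
    return stripped


def loose_match(query, candidate, extra=None):
    return fold(query, extra) == fold(candidate, extra)


print(loose_match("osterreich", "Österreich"))      # True: Ö decomposes to O + U+0308
print(loose_match("soren", "Søren"))                # False: ø has no decomposition
print(loose_match("soren", "Søren", EXTRA_FOLDS))   # True, via the extra table
```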

    > What I want is the status quo, however.
    > Leave the template and its principles alone.

    Stability is important, and we want to consider that very carefully before
    making any change. However, I believe that the current way we handle a few
    characters in UCA is distinctly suboptimal, and worth considering.

    Mark

    ----- Original Message -----
    From: "Michael Everson" <everson@evertype.com>
    To: <unicode@unicode.org>
    Sent: Friday, July 09, 2004 13:25
    Subject: Re: Changing UCA primary weights (bad idea)

    > Mark, your examples are all of the
    > run-of-the-mill Scandinavian variety. Trotting
    > out Polish and Danish doesn't address the issue.
    > The issue is all the phonetic characters, and
    > all the African ones (for instance).
    >
    > > > 1) it destabilizes the default tailorable template of ISO/IEC 14651
    > > > and the UCA which has been published for some time. Anyone who *has*
    > > > tailored it would have to do all that work all over again.
    > >
    > >You are certainly right that this is not a slam-dunk;
    >
    > This noun must have been on TV a lot in the US
    > recently; I have seen it a lot but it remains
    > obscure, apart from being a basketball reference.
    > What does it mean? That I am right that the
    > proposal is not a shoo-in? Or, indeed, that I am
    > right that it is not a foregone conclusion that
    > the proposal will be accepted?
    >
    > >there are reasons for
    > >and against it. And it may well be that the committee decides against it.
    >
    > There are two templates, which are synchronized,
    > and decided about by two committees.
    >
    > >What we actually did was to put similar letters
    > >near other letters, *and if their decompositions
    > >were the same* we interfiled them.
    >
    > I remember. I was on the committee that helped to decide these things.
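The decomposition rule can be checked with Python's standard `unicodedata` module. In the UCA, canonically equivalent strings collate identically, so a precomposed letter with a canonical decomposition ends up with its base letter's primary weight and is interfiled automatically; letters without one need weights of their own. The helper `base_letter` is a hypothetical name; this sketch only inspects decompositions and does not compute collation weights.

```python
import unicodedata


def base_letter(ch):
    """Return the first character of ch's full canonical decomposition, or None if it has none."""
    nfd = unicodedata.normalize("NFD", ch)
    return nfd[0] if nfd != ch else None


for ch in "ÅÑÔÖØŊƏ":
    # Å, Ñ, Ô, Ö map to A, N, O, O; Ø, Ŋ, Ə have no canonical decomposition.
    print(ch, base_letter(ch))
```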
    >
    > >There is, however, little principled difference
    > >between Å, ¸ , ¼ , Ñ, Ø, ?, and Ô that would
    > >cause a user to think that some should be
    > >interfiled and some should not. In some
    > >languages these would be seen as "separate
    > >letters" (e.g. with different primary weights)
    > >and in others not; but that does not line up in
    > >any particular way with what is in the UCA. (see
    > >also comment below).
    >
    > Those aren't the ones I'm worried about, and they
    > are not much of a problem. We had principles for
    > determining "basic letters" and those are what we
    > used; what I see now is a proposal to change that.
    >
    > >See http://www.unicode.org/charts/collation/chart_Latin.html for many other
    > >cases.
    >
    > Please do. Do you really want all those letters
    > between "e" and "f" interfiled with "e"? I surely
    > do not.
    >
    > > > 3) in discussions elsewhere, Mark has talked about what "most users"
    > >> "expect" and I found his suggestion to be anglocentric and
    > >> unsubstantiated.
    > >
    > >And I will refrain from saying what I think of your reasoning ability in
    > >general, although circularity seems to be a particular specialty.
    >
    > Sweet of you to say.
    >
    > >I suggest that we stick to the facts instead of ad hominem attacks.
    >
    > Calling a thing "ad hominem" doesn't make it ad
    > hominem. It is your suggestion which I
    > criticized, because it seems very A-to-Z and
    > alien to the principles which have been in the
    > template until now.
    >
    > >For user expectations, check out how foreign words with unusual accents are
    > >sorted in a variety of languages. I have seen no reason to believe that
    > >Germans or French or others behave much differently when faced with a letter
    > >like ø that is not one that they use. The key is whether they would expect
    > >to see:
    > >
    > >a) Interleaved:
    > >..oa..
    > >..øb..
    > >..oz..
    >
    > You can tailor for this now.
    >
    > >b) Separate but near:
    > >..oz..
    > >..øb..
    > >..pa..
    >
    > This is what we have now.
    >
    > >c) Like a particular language (Danish)
    > >..yb..
    > >..øb..
    >
    > You can tailor for this now.
    >
    > My point is made here. It is really only in
    > initial position where this is likely to be
    > noticed. What I want is the status quo, however.
    > Leave the template and its principles alone.
    >
    > >a) Interleaved:
    > >..oa..
    > >..öb..
    > >..oz..
    >
    > This is what we have now.
    >
    > >b) Separate but near:
    > >..oz..
    > >..öb..
    > >..pa..
    >
    > You can tailor for this now.
    >
    > >c) Like a particular language (Swedish or Phonebook German)
    > >..yb..
    > >..öb..
    > >
    > >..od..
    > >..öz..
    > >..of..
    >
    > You can tailor for this now.
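The orderings above, and the tailorings that the reply says are already possible, can be illustrated with a toy two-level key in which only the weights for ö change. The strategy names, the weights and the `collate` helper are hypothetical illustrations rather than DUCET or CLDR data; "after_z" stands in for a Swedish-style ordering and "expand_oe" for the phonebook German rule that sorts ö as oe.

```python
# Toy tailorings of "ö" on a two-level key; the weights are illustrative only.
from string import ascii_lowercase

# Hypothetical primaries with gaps: a=10, b=20, ..., z=260.
BASE = {c: (i + 1) * 10 for i, c in enumerate(ascii_lowercase)}


def expand(ch, strategy):
    """Return the (primary, secondary) weights for one lowercase character."""
    if ch != "ö":
        return [(BASE[ch], 0)]
    if strategy == "interleaved":  # a) o's primary plus a secondary difference
        return [(BASE["o"], 1)]
    if strategy == "separate":     # b) own primary between o and p
        return [(BASE["o"] + 5, 0)]
    if strategy == "after_z":      # c) Swedish-style: own primary after z
        return [(BASE["z"] + 10, 0)]
    if strategy == "expand_oe":    # c) phonebook-German-style: sorts as "oe"
        return [(BASE["o"], 1), (BASE["e"], 0)]
    raise ValueError(f"unknown strategy: {strategy}")


def collate(words, strategy):
    def key(word):
        pairs = [w for ch in word.lower() for w in expand(ch, strategy)]
        # All primaries are compared first; secondaries only break ties.
        return ([p for p, _ in pairs], [s for _, s in pairs])
    return sorted(words, key=key)


sample = ["oa", "öb", "oz", "pa", "yb", "od", "öz", "of"]
for name in ("interleaved", "separate", "after_z", "expand_oe"):
    print(name, collate(sample, name))
```

In this toy model only "interleaved" matches the current default for ö, and "expand_oe" reproduces the od, öz, of sequence from example c).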
    >
    > >More accurately, you believe that the correct behavior occurs.
    >
    > It is correct for most of the letters which would
    > be affected by the change you propose. The
    > overwhelming majority of the
    > letters-without-diacritics which occur between
    > the "main A-Z letters" are correctly filed that
    > way, and would be incorrectly filed if interfiled
    > with the "main" letters. Is there a discomfort in
    > what happens between Ø/Ö? Well, that's an
    > anomaly, right enough, but it is well known and
    > can easily be tailored for anyone worried about
    > it. Lumping all the Engs with N or all the Schwas
    > with E, however, would have only the effect of
    > making a working template cease to work for the
    > people who really need those letters: linguists,
    > speakers of African languages, and so on. The
    > only people who use the sideways "o" and the top-
    > and bottom-half "o" are Uralic linguists, and the
    > template works correctly for them, at least for
    > those letters.
    >
    > > > 5) if Mark wants to make a tailoring to interfile all these letters
    > >> (which can only result in what I describe as "visual seasickness" to
    > >> any poor users who have to actually read such wordlists).
    > >
    > >Again, no evidence.
    >
    > It was argued years ago in TC304 and WG20. I'm
    > disheartened to have to reopen the arguments now,
    > particularly as it affects stability and you
    > yourself have been a champion for stability.
    >
    > >Let's look at a particular example, letters based on
    > >"O". UCA *already* interleaves the list below (UCA O List). Adding John's
    > >list to that would add only the two elements:
    >
    > John's list?
    >
    > > > 6) the Latin alphabet has a lot more than 26 letters in it. In this
    > >> age of the Universal Character Set, "most users" would do better to
    > >> get used to this than to be hobbled by older concepts.
    > >
    > >I agree with the general principle, but it has
    > >no bearing on the topic at hand.
    >
    > It is the key to the principles which are in the template now.
    > --
    > Michael Everson * * Everson Typography * * http://www.evertype.com



    This archive was generated by hypermail 2.1.5 : Fri Jul 09 2004 - 19:35:12 CDT