RE: interleaved ordering (was RE: Phoenician)

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri May 14 2004 - 15:56:59 CDT


    Dean,

    > >> > One normalization script could be used any number of times. Clip,
    > >> >normalize, sort - repeat as necessary.
    > >>
    > >> Multiply that times the number of independent researchers and separate
    > >> projects...
    > >
    > >... and you get a thousand different requirements, each of which
    > >should be addressed with appropriate levels of programming tools.
    >
    > ... that are solved now by a single default process requiring no end user
    > fiddling.

    No, they are *not* "solved now by a single default process" -- you
    don't get a thousand different sort orders out of a single
    default process.

    > >What gives you the slightest hope that *every* researcher's
    > >particular needs for searching and sorting can be baked into
    > >some *default* collation element weighting table? The whole point
    > >is to provide a mechanism for people to *tailor* it as they choose
    > >to meet *different* requirements.
    >
    > No, that is not the whole point -

    Yes it *is* the whole point -- of the Unicode Collation Algorithm.
    Read the document. It is set up the way it is for a reason, and
    it is to provide a mechanism for people to *tailor* the default
    table to meet different requirements.

    > there is also the point that 90% of our
    > work, which is done now by simple, default processes, would, all of a
    > sudden, require custom tailoring.

    If sorting your data in binary order by code point is sufficient
    for your work -- since that is what the "simple, default processes"
    actually do -- then more power to you. Transliterate all your
    data into Hebrew, using Unicode or ISO 8859-8 or Windows CP 1255
    or MacHebrew -- it won't matter, since they all use the same
    alphabetic order for the 22 letters, anyway. Then sort binary
    and you're done.
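
    To make that concrete, here is a minimal sketch in Python, assuming
    a few made-up sample words (they are not from anyone's actual data).
    It sorts the same strings by code point and by their encoded bytes
    in UTF-8, ISO 8859-8 and Windows CP 1255, and checks that all four
    orders agree. MacHebrew is left out only because I am not assuming
    a codec for it is available.

```python
# Hypothetical sample data: four Hebrew-script words, deliberately out of order.
words = [
    "\u05e9\u05dc\u05dd",  # shin, lamed, final mem
    "\u05d0\u05d1",        # alef, bet
    "\u05de\u05dc\u05da",  # mem, lamed, final kaf
    "\u05d1\u05d9\u05ea",  # bet, yod, tav
]

# Python compares str values code point by code point, i.e. a binary sort.
by_code_point = sorted(words)

# The same binary comparison, done on the encoded bytes instead.
by_utf8 = sorted(words, key=lambda w: w.encode("utf-8"))
by_8859_8 = sorted(words, key=lambda w: w.encode("iso8859_8"))
by_cp1255 = sorted(words, key=lambda w: w.encode("cp1255"))

# True when every encoding yields the same order as the code point sort.
print(by_code_point == by_utf8 == by_8859_8 == by_cp1255)
```

    Note that this is plain binary ordering and nothing more: final
    forms and pointed text are compared as whatever code points they
    happen to be.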

    If you want to do anything *sophisticated* with your data, then
    you are going to get involved with normalization and custom
    tailoring of collations. You're also going to get involved with
    *other* kinds of manipulations of the data, including lemmatizing
    and transliterations, in order to get like to sort with like.
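
    As a rough illustration of the "normalize, then tailor" step, here
    is a toy sketch in Python. It is *not* the Unicode Collation
    Algorithm, and the names FINAL_TO_BASE and sort_key are made up for
    this example. It assumes one particular requirement: pointed and
    unpointed spellings should sort together, and final forms should
    sort with their base letters. A different project would pick
    different rules.

```python
import unicodedata

# Assumed tailoring for illustration: fold final forms onto their base letters.
FINAL_TO_BASE = str.maketrans({
    "\u05da": "\u05db",  # final kaf   -> kaf
    "\u05dd": "\u05de",  # final mem   -> mem
    "\u05df": "\u05e0",  # final nun   -> nun
    "\u05e3": "\u05e4",  # final pe    -> pe
    "\u05e5": "\u05e6",  # final tsadi -> tsadi
})

def sort_key(text):
    """Hypothetical two-level key: folded letters first, full text as tie-breaker."""
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks (e.g. Hebrew points) for the first-level comparison.
    letters = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (letters.translate(FINAL_TO_BASE), decomposed)

# Made-up sample data.
words = [
    "\u05de\u05b6\u05dc\u05b6\u05da",  # mem, segol, lamed, segol, final kaf (pointed)
    "\u05d0\u05d1",                    # alef, bet
    "\u05de\u05dc\u05da",              # mem, lamed, final kaf (unpointed)
]
ordered = sorted(words, key=sort_key)
```

    With this key the pointed and unpointed spellings get the same
    first-level value and so end up adjacent, which a binary sort does
    not guarantee.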

    > >Nobody plans to take away your rights and ability to continue
    > >doing what you now do, if it works very well for you. Please,
    > >sir, continue doing what you are doing with your current data. :-)
    >
    > It's incredible to me that you and others keep repeating this mantra,
    > ignoring the fact (repeated for the nth time) that we will all be forced,
    > in our separate research projects, to deal with MULTIPLE, COMPETING encodings.

    You will not be "forced" to do anything other than what you are
    doing currently. I keep repeating it because it apparently
    bears repeating.

    --Ken


