RE: interleaved ordering (was RE: Phoenician)

From: jameskass@att.net
Date: Thu May 13 2004 - 17:09:52 CDT


    Dean A. Snyder asks,

    > Why make something we do all the time more difficult and non-standard,
    > when what we do now works very well?
    >

    Please, one thing to remember about default collation is that
    it's default. It's only there when no other instructions exist.

    Another thing to remember about collation is that it's best
    when tailorable.

    Anyone wishing to sort anything will want to impose their
    own rules on the sort, and anyone who has done this in the
    past has already worked out a method for such imposition.
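    As a rough illustration of imposing one's own rules on a sort, here
    is a minimal Python sketch. The names FILING_OVERRIDES and sort_key,
    and the two policies, are hypothetical and not from any particular
    library; a real system would more likely tailor a collation library
    than compare plain strings.

```python
# Hypothetical per-collection tailoring: how a title that starts with
# digits should be filed. All names here are illustrative assumptions.
FILING_OVERRIDES = {
    "digits": {},                                # file "1984" under "1"
    "spoken": {"1984": "nineteen eighty-four"},  # file "1984" under "N"
}


def sort_key(title, policy="digits"):
    """Return the string a title is filed under for the given policy."""
    filed_as = FILING_OVERRIDES[policy].get(title, title)
    return filed_as.casefold()


titles = ["Emma", "1984", "Ulysses"]
by_digits = sorted(titles, key=lambda t: sort_key(t, "digits"))
by_spoken = sorted(titles, key=lambda t: sort_key(t, "spoken"))
print(by_digits)  # ['1984', 'Emma', 'Ulysses']
print(by_spoken)  # ['Emma', '1984', 'Ulysses']
```

    The default order is whatever the plain key gives; the override
    table is where the owner of the data states a preference.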

    If you're making a library database, do you want "1984" to
    sort under the digit "1", would you prefer that it be sorted
    under "O" for "one", or would it be better if it sorted under
    "N" for "nineteen"? If the database is for biblios rather than
    books, you might prefer that the book title be sorted under
    "M".

    If someone keys in "nineteen eighty four" to a search box,
    and you want them to be able to find "1984" in your database,
    you will program for it.

    If you want "Richard III" to match with "Richard the third",
    a bit of extra work is required.
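    The "extra work" for searching might look something like the sketch
    below: both the stored title and the query are reduced to a common
    search key before comparison. SEARCH_EQUIVALENTS and search_key are
    hypothetical names, and the two-entry table is only an assumption
    for the sake of the example; a real application would need a far
    larger, user-editable table.

```python
import re

# Hypothetical equivalence table (spelled-out phrase -> canonical form).
SEARCH_EQUIVALENTS = {
    "nineteen eighty four": "1984",
    "the third": "iii",
}


def search_key(text):
    """Reduce a title or a query to a comparable form."""
    key = text.casefold()
    key = re.sub(r"[^\w\s]", " ", key)  # punctuation becomes a space
    key = " ".join(key.split())         # collapse runs of whitespace
    for phrase, canonical in SEARCH_EQUIVALENTS.items():
        key = re.sub(rf"\b{re.escape(phrase)}\b", canonical, key)
    return key


print(search_key("Nineteen Eighty-Four") == search_key("1984"))        # True
print(search_key("Richard the third") == search_key("Richard III"))    # True
```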

    If it's your purpose to set up a Hebrew script/Hebrew language
    database of Hebrew inscriptions, and the original script used
    in the inscription is irrelevant for your purposes, and you are
    importing data from multiple sources that may use different
    encodings, you will 'normalize' the data upon import. In this
    case 'normalize' would include converting the character set
    if necessary, transliterating/transcribing to Hebrew characters
    if necessary, stripping off points if they're present and not
    wanted, and so on.
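    A minimal sketch of two of those import steps, character set
    conversion and point stripping, using only the Python standard
    library. The function names import_record and strip_points are
    hypothetical, and the sketch assumes that every nonspacing mark in
    the Hebrew block (vowel points and cantillation marks alike) is
    unwanted; transliteration from other scripts is not attempted here.

```python
import unicodedata


def strip_points(text):
    """Remove Hebrew points and accents, keeping letters and punctuation."""
    # NFD also splits presentation forms such as U+FB2A into letter + point.
    decomposed = unicodedata.normalize("NFD", text)
    kept = [
        ch for ch in decomposed
        # Assumption: all nonspacing marks in U+0591..U+05C7 are unwanted.
        if not ("\u0591" <= ch <= "\u05c7" and unicodedata.category(ch) == "Mn")
    ]
    return unicodedata.normalize("NFC", "".join(kept))


def import_record(raw, encoding="utf-8", keep_points=False):
    """Decode bytes from a source's encoding, then optionally strip points."""
    text = raw.decode(encoding)
    return text if keep_points else strip_points(text)


pointed = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"  # shalom, with points
print(strip_points(pointed) == "\u05e9\u05dc\u05d5\u05dd")  # True
```

    A source delivered in a legacy code page would be imported with, for
    example, import_record(data, encoding="cp1255").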

    If you're importing data into a DSS Unicode database, and your
    source is using Web Hebrew or another ASCII-masquerade, then
    you're already performing normalization.

    If you're importing data originally entered in visual order rather
    than logical order, you're already normalizing.

    If your database includes a field to indicate the original script,
    here presuming that the original script is of some interest, and
    you want to export something, you'll either export it as Hebrew
    text, or you'll 'normalize' it back into the original script on export.
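    For illustration only, converting back on export could be a simple
    table lookup. The sketch below assumes the Phoenician block at
    U+10900..U+10915, which was added to Unicode after this message was
    written, and assumes that Hebrew final forms fold to their ordinary
    letters (the 22-letter script has no finals). The function name
    to_phoenician is hypothetical.

```python
# The 22 non-final Hebrew letters, alef through tav, in alphabetical order.
HEBREW_LETTERS = [
    0x05D0, 0x05D1, 0x05D2, 0x05D3, 0x05D4, 0x05D5, 0x05D6, 0x05D7,
    0x05D8, 0x05D9, 0x05DB, 0x05DC, 0x05DE, 0x05E0, 0x05E1, 0x05E2,
    0x05E4, 0x05E6, 0x05E7, 0x05E8, 0x05E9, 0x05EA,
]
# Final forms fold to their ordinary letters before mapping.
FINAL_FORMS = {0x05DA: 0x05DB, 0x05DD: 0x05DE, 0x05DF: 0x05E0,
               0x05E3: 0x05E4, 0x05E5: 0x05E6}

# Assumed target: Phoenician letters occupy U+10900..U+10915 in the same order.
TO_PHOENICIAN = {h: 0x10900 + i for i, h in enumerate(HEBREW_LETTERS)}
for final, ordinary in FINAL_FORMS.items():
    TO_PHOENICIAN[final] = TO_PHOENICIAN[ordinary]


def to_phoenician(hebrew_text):
    """Map Hebrew letters to Phoenician; other characters pass through."""
    return hebrew_text.translate(TO_PHOENICIAN)


exported = to_phoenician("\u05de\u05dc\u05da")  # mem, lamed, final kaf
print(exported == "\U0001090C\U0001090B\U0001090A")  # True
```

    The reverse direction is not one-to-one, since final forms have to
    be restored by position in the word rather than by lookup.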

    Either way, it's about as hard to program for as allowing for
    differences in case, like "TROLL" vs. "troll". And, in either case,
    it should be handled by the tools and trivial for the users, although
    any application which doesn't allow the user to set preferences
    and make rules in such an instance is next to worthless.
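    The case comparison, with the user left in control, can be sketched
    in a few lines. MatchPreferences and matches are hypothetical names;
    the only assumption is that a single on/off preference is enough to
    make the point.

```python
from dataclasses import dataclass


@dataclass
class MatchPreferences:
    """Hypothetical user-settable rules for comparing two strings."""
    ignore_case: bool = True


def matches(a, b, prefs=None):
    """Compare two strings according to the user's preferences."""
    prefs = prefs or MatchPreferences()
    if prefs.ignore_case:
        # casefold() is the standard-library caseless form, stronger than lower()
        return a.casefold() == b.casefold()
    return a == b


print(matches("TROLL", "troll"))                                        # True
print(matches("TROLL", "troll", MatchPreferences(ignore_case=False)))   # False
```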

    Best regards,

    James Kass


