RE: interleaved ordering (was RE: Phoenician)

From: Language Analysis Systems, Inc. Unicode list reader (
Date: Thu May 13 2004 - 10:41:14 CDT

    (John Cowan made this basic point, but I can't help making it more

    This whole discussion of interleaved sorting has veered off into the
    ditch. Now maybe I haven't been paying close enough attention (quite
    possible, as I pretty much lost patience with the whole Phoenician
    thread a LONG time ago), but I'm pretty sure the whole suggestion to
    interleave Phoenician and Hebrew in the sort order, with equivalent
    letters having the same primary weights, was originally floated as a way
    to bridge the gap between those who say their lives will be made easier
    by assigning a new set of codes to the Phoenician letters and those who
    say their lives will be made harder by doing this.

    If you've got a group of people who say "We don't want this script
    encoded" (call them A) and another group saying "We do want this script
    encoded" (call them B), you (generally) have to favor the user community
    that wants the new encoding (group B). After all, group A can simply
    choose to ignore the new code range and keep doing things the way they
    were doing them. The only way group A is hurt by the new encoding is if
    they have to deal with documents that use the new encoding (say, the
    rule to use the old encoding isn't universally followed by members of
    group A, or they occasionally have to work with documents produced by
    members of group B). If members of group A have to work with documents
    using both conventions, searching for a particular word won't
    necessarily work-- if you search using group A's convention, it won't
    find words encoded using group B's convention. The way around this
    problem is to use a tailored collation order that treats both
    conventions as equivalent. (Actually, you might also want to design
    Phoenician fonts whose cmap tables map both the new Phoenician range and
    the existing Hebrew range to the same set of glyphs, so that both group
    A and group B can use the same fonts.)
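
    The folding described above can be sketched in a few lines of Python.
    This is only an illustration, not a real UCA tailoring. It assumes the
    Phoenician letters sit at U+10900..U+10915 (the range Unicode later
    assigned; nothing was final when this thread ran) and follow the same
    alphabetic order as the Hebrew letters. The names FOLD, primary_key,
    matches and interleaved_sorted are made up for the example. Hebrew
    final forms are folded onto their base letters as well, on the
    assumption that they should compare equal at the primary level.

```python
# Illustrative sketch only: fold Phoenician letters onto Hebrew letters so
# that both encoding conventions compare equal at the "primary" level.

# Hebrew final forms; each one is immediately followed by its base letter.
HEBREW_FINALS = {0x05DA, 0x05DD, 0x05DF, 0x05E3, 0x05E5}

# Hebrew base letters alef..tav (U+05D0..U+05EA) without the final forms.
HEBREW_BASE = [cp for cp in range(0x05D0, 0x05EB) if cp not in HEBREW_FINALS]

# ASSUMPTION: the 22 Phoenician letters occupy U+10900..U+10915, in the
# same alphabetic order as the Hebrew base letters.
PHOENICIAN_BASE = list(range(0x10900, 0x10916))

# Phoenician letter -> equivalent Hebrew base letter.
FOLD = dict(zip(PHOENICIAN_BASE, HEBREW_BASE))
# Hebrew final form -> Hebrew base letter (e.g. final kaf -> kaf).
FOLD.update({cp: cp + 1 for cp in HEBREW_FINALS})


def primary_key(word: str) -> tuple:
    """Return the word as a tuple of folded code points."""
    return tuple(FOLD.get(ord(ch), ord(ch)) for ch in word)


def matches(a: str, b: str) -> bool:
    """True if two words are the same letters under either convention."""
    return primary_key(a) == primary_key(b)


def interleaved_sorted(words):
    """Sort by folded letters; break ties on the raw string so the
    result is deterministic when both encodings of a word are present."""
    return sorted(words, key=lambda w: (primary_key(w), w))


if __name__ == "__main__":
    hebrew_mlk = "\u05DE\u05DC\u05DA"                # mem, lamed, final kaf
    phoenician_mlk = "\U0001090C\U0001090B\U0001090A"  # mem, lamd, kaf
    print(matches(hebrew_mlk, phoenician_mlk))
```

    Sorting on the pair (folded key, original string) keeps equivalent
    words next to each other while still giving a stable order between
    the two encodings of the same word.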

    That's how we got here. The effect it has on sorted lists of words
    seems pretty uninteresting to me. I can think of two use cases:

    1. A sorted list of Phoenician words (or words using the Phoenician
    script range, in whatever language or script) that mixes encoding
    conventions-- some words use the Phoenician script range and some use
    the existing Hebrew range. Same letters, same glyphs, different
    underlying encoding. You want to hide the difference in underlying
    encoding from the end user.

    2. A sorted list of Hebrew words, some in modern Hebrew script and some
    in Paleo-Hebrew (or some other script that uses the Phoenician range).
    Same language, different glyphs.

    Both are justification for an interleaved sort order, but really, how
    often will either use case come up? Do you really expect-- in EITHER
    case-- to have long lists of words that need to be mechanically sorted?
    Do you expect it to happen often enough that hacking together a Perl
    script to do it once isn't going to get the job done? Why is this a
    burning issue that has to be enshrined in the default UCA sort order?

    Of course, I could also ask the reverse question: Given that it's a very
    tiny community of users that's going to give a dang about the Phoenician
    characters in the first place, would it hurt anyone to put this in the
    default UCA ordering? [Not Is It The Right Thing To Do, which I've seen
    a lot of in this discussion, but Who Does It Hurt?]

    --Rich Gillam
      Language Analysis Systems, Inc.
