From: Language Analysis Systems, Inc. Unicode list reader (Unicode-mail@las-inc.com)
Date: Thu May 13 2004 - 10:41:14 CDT
(John Cowan made this basic point, but I can't help making it more
forcefully.)
This whole discussion of interleaved sorting has veered off into the
ditch. Now maybe I haven't been paying close enough attention (quite
possible, as I pretty much lost patience with the whole Phoenician
thread a LONG time ago), but I'm pretty sure the whole suggestion to
interleave Phoenician and Hebrew in the sort order, with equivalent
letters having the same primary weights, was originally floated as a way
to bridge the gap between those who say their lives will be made easier
by assigning a new set of codes to the Phoenician letters and those who
say their lives will be made harder by doing this.
If you've got a group of people who say "We don't want this script
encoded" (call them A) and another group saying "We do want this script
encoded" (call them B), you (generally) have to favor the user community
that wants the new encoding (group B). After all, group A can simply
choose to ignore the new code range and keep doing things the way they
were doing them. The only way group A is hurt by the new encoding is if
they have to deal with documents that use the new encoding (say, the
rule to use the old encoding isn't universally followed by members of
group A, or they occasionally have to work with documents produced by
members of group B). If members of group A have to work with documents
using both conventions, searching for a particular word won't
necessarily work-- if you search using group A's convention, it won't
find words encoded using group B's convention. The way around this
problem is to use a tailored collation order that treats both
conventions as equivalent. (Actually, you might also want to design
Phoenician fonts whose CMAP tables map both the new Phoenician range and
the existing Hebrew range to the same set of glyphs, so that both group
A and group B can use the same fonts.)
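To make "treats both conventions as equivalent" concrete, here is a rough Python sketch of the idea. It is not a real UCA tailoring, just a key-folding stand-in. It assumes the Phoenician letters sit at U+10900..U+10915 in alphabet order (that range is an assumption, not something stated in this message), and the names primary_key, matches, and sort_key are made up for illustration. Hebrew final forms are folded onto their base letters, which is a design choice of this sketch.

```python
# Assumption: 22 Phoenician letters at U+10900..U+10915, in alphabet order.
PHOENICIAN_FIRST = 0x10900

# Hebrew letters in alphabet order; a final form shares its letter's slot.
HEBREW_LETTERS = [
    "\u05D0", "\u05D1", "\u05D2", "\u05D3", "\u05D4",
    "\u05D5", "\u05D6", "\u05D7", "\u05D8", "\u05D9",
    "\u05DB\u05DA",  # kaf, final kaf
    "\u05DC",
    "\u05DE\u05DD",  # mem, final mem
    "\u05E0\u05DF",  # nun, final nun
    "\u05E1", "\u05E2",
    "\u05E4\u05E3",  # pe, final pe
    "\u05E6\u05E5",  # tsadi, final tsadi
    "\u05E7", "\u05E8", "\u05E9", "\u05EA",
]

# Give equivalent Hebrew and Phoenician letters the same primary weight.
PRIMARY = {}
for index, forms in enumerate(HEBREW_LETTERS):
    for ch in forms:
        PRIMARY[ch] = (0, index)
    PRIMARY[chr(PHOENICIAN_FIRST + index)] = (0, index)


def primary_key(word):
    """Fold a word to primary weights; unknown characters sort after letters."""
    return tuple(PRIMARY.get(ch, (1, ord(ch))) for ch in word)


def matches(needle, word):
    """Whole-word match that ignores which encoding convention was used."""
    return primary_key(needle) == primary_key(word)


def sort_key(word):
    """Interleaved order; ties are broken by raw code points."""
    return (primary_key(word), word)


hebrew_mlk = "\u05DE\u05DC\u05DA"                   # mem, lamed, final kaf
phoenician_mlk = "\U0001090C\U0001090B\U0001090A"   # mem, lamd, kaf
print(matches(hebrew_mlk, phoenician_mlk))
```

With this, a search typed in group A's convention finds words stored in group B's convention, and sorted(words, key=sort_key) interleaves the two instead of putting everything in the Hebrew range first.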
That's how we got here. The effect it has on sorted lists of words
seems pretty uninteresting to me. I can think of two use cases:
1. A sorted list of Phoenician words (or words using the Phoenician
script range, in whatever language or script) that mixes encoding
conventions-- some words use the Phoenician script range and some use
the existing Hebrew range. Same letters, same glyphs, different
underlying encoding. You want to hide the difference in underlying
encoding from the end user.
2. A sorted list of Hebrew words, some in modern Hebrew script and some
in Paleo-Hebrew (or some other script that uses the Phoenician range).
Same language, different glyphs.
Both are justification for an interleaved sort order, but really, how
often will either use case come up? Do you really expect-- in EITHER
case-- to have long lists of words that need to be mechanically sorted?
Do you expect it to happen often enough that hacking together a Perl
script to do it once isn't going to get the job done? Why is this a
burning issue that has to be enshrined in the default UCA sort order?
Of course, I could also ask the reverse question: Given that it's a very
tiny community of users that's going to give a dang about the Phoenician
characters in the first place, would it hurt anyone to put this in the
default UCA ordering? [Not Is It The Right Thing To Do, which I've seen
a lot of in this discussion, but Who Does It Hurt?]
--Rich Gillam
Language Analysis Systems, Inc.