RE: interleaved ordering (was RE: Phoenician)

From: Mike Ayers (mike.ayers@tumbleweed.com)
Date: Thu May 13 2004 - 15:04:14 CDT

Next message: Peter Constable: "RE: Multiple Directions (was: Re: Coptic/Greek (Re: Phoenician))"

Previous message: Chris Jacobs: "Re: Interleaved collation of related scripts"
Maybe in reply to: Kent Karlsson: "interleaved ordering (was RE: Phoenician)"
Next in thread: Dean Snyder: "RE: interleaved ordering (was RE: Phoenician)"
Reply: Dean Snyder: "RE: interleaved ordering (was RE: Phoenician)"

> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]On
> Behalf Of Dean Snyder
> Sent: Thursday, May 13, 2004 10:36 AM

> Rich Gillam of Language Analysis Systems, Inc. Unicode list reader wrote
> at 11:41 AM on Thursday, May 13, 2004:

> ...
> >That's how we got here. The effect it has on sorted lists of words
> >seems pretty uninteresting to me. I can think of two use cases:
> >
> >1. A sorted list of Phoenician words (or words using the Phoenician
> >script range, in whatever language or script) that mixes encoding
> >conventions-- some words use the Phoenician script range and some use
> >the existing Hebrew range. Same letters, same glyphs, different
> >underlying encoding. You want to hide the difference in underlying
> >encoding from the end user.
> >
> >2. A sorted list of Hebrew words, some in modern Hebrew script and some
> >in Paleo-Hebrew (or some other script that uses the Phoenician range).
> >Same language, different glyphs.
> >
> >Both are justification for an interleaved sort order,

No. Both are situations where the data should be normalized before
sorting. In the first case, convert the data into a single encoding
convention. In the second case, convert all the non-Hebrew data to Hebrew.
Then sort away.
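
A minimal sketch of "normalize, then sort", in Python. It assumes the Phoenician letters sit at U+10900..U+10915 (the range later assigned in Unicode, not something settled in this thread) and that each one maps to the corresponding non-final Hebrew letter. The names TO_HEBREW, normalize and sort_normalized are made up for illustration, and the ordering is plain code-point order on the normalized form, not UCA collation.

```python
# Hypothetical folding table: 22 Phoenician letters -> 22 non-final Hebrew letters.
# Assumption: Phoenician letters occupy U+10900..U+10915 in alphabetical order.
PHOENICIAN = [chr(cp) for cp in range(0x10900, 0x10916)]

# Hebrew alef..tav, skipping the final forms (U+05DA, U+05DD, U+05DF, U+05E3, U+05E5).
HEBREW = [chr(cp) for cp in (
    *range(0x05D0, 0x05DA),          # alef .. yod
    0x05DB, 0x05DC, 0x05DE, 0x05E0,  # kaf, lamed, mem, nun
    0x05E1, 0x05E2, 0x05E4, 0x05E6,  # samekh, ayin, pe, tsadi
    0x05E7, 0x05E8, 0x05E9, 0x05EA,  # qof, resh, shin, tav
)]

TO_HEBREW = str.maketrans(dict(zip(PHOENICIAN, HEBREW)))


def normalize(word: str) -> str:
    """Fold Phoenician-range letters to Hebrew; leave everything else alone."""
    return word.translate(TO_HEBREW)


def sort_normalized(words):
    """Sort by the normalized form while returning the original strings."""
    return sorted(words, key=normalize)
```

Using the normalized form only as a sort key leaves the original encoding of each word intact in the output; converting the data itself, as suggested above, would just be `sorted(map(normalize, words))`.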

> > but really, how
> >often will either use case come up?
>
> Well, for just one case, if you're a Dead Sea scroll scholar (one of the
> more populated sub-disciplines in Semitic scholarship) all the time and
> every day.

You create daily sorts on the same data? Since I doubt that you are
expecting new words to show up in there, I think that this must mean that
you are sorting different sets of the existing data, yes? For such a case,
just re-sort the pre-normalized data.
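
As a rough sketch of re-sorting pre-normalized data (Python again): compute each word's normalized key once, keep it, and sort whatever subset is needed against the stored keys. The helper names and the two-letter toy table are hypothetical stand-ins; a real table would cover the whole script.

```python
from typing import Callable, Dict, Iterable, List


def prenormalize(words: Iterable[str],
                 normalize: Callable[[str], str]) -> Dict[str, str]:
    """Run the (possibly slow) normalization once per word and keep the result."""
    return {word: normalize(word) for word in words}


def sort_subset(subset: Iterable[str], keys: Dict[str, str]) -> List[str]:
    """Sort any subset of the corpus using the stored keys; no re-normalization."""
    return sorted(subset, key=keys.__getitem__)


# Toy table for illustration only: folds two Phoenician-range letters
# (assumed at U+10900 and U+10901) to Hebrew alef and bet.
TOY_TABLE = str.maketrans({"\U00010900": "\u05D0", "\U00010901": "\u05D1"})


def toy_normalize(word: str) -> str:
    return word.translate(TOY_TABLE)
```

Typical use would be `keys = prenormalize(corpus, toy_normalize)` once, then `sort_subset(selection, keys)` for each new selection.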

> >Do you really expect-- in EITHER case-- to have long lists of words that
> >need to be mechanically sorted?
>
> Yes.

Normalization makes for faster sorting than interfiling.

> >Do you expect it to happen often enough that hacking together a Perl
> >script to do it once isn't going to get the job done?
>
> Yes.

One normalization script could be used any number of times. Clip,
normalize, sort - repeat as necessary.

> >Why is this a
> >burning issue that has to be enshrined in the default UCA sort order?
>
> [Or even a separate encoding for that matter?] Because of what lies
> behind the responses to your questions above.

I see no substance in your answers so far. Please clarify.

/|/|ike

This archive was generated by hypermail 2.1.5 : Thu May 13 2004 - 15:05:25 CDT